[PATCH v2] net/memif: fix buffer overflow in zero copy Rx

Ferruh Yigit ferruh.yigit at amd.com
Sun Jul 7 16:05:35 CEST 2024


On 7/7/2024 6:50 AM, Mihai Brodschi wrote:
> Hi Ferruh,
> 
> On 07/07/2024 05:12, Ferruh Yigit wrote:
>> On 6/28/2024 10:01 PM, Mihai Brodschi wrote:
>>> rte_pktmbuf_alloc_bulk is called by the zero-copy receiver to allocate
>>> new mbufs to be provided to the sender. The allocated mbuf pointers
>>> are stored in a ring, but the alloc function doesn't implement index
>>> wrap-around, so it writes past the end of the array. This results in
>>> memory corruption and duplicate mbufs being received.
>>>
>>
>> Hi Mihai,
>>
>> I am not sure writing past the ring actually occurs.
>>
>> As far as I can see, the intent is to keep the ring as full as
>> possible: initially 'head' and 'tail' are both 0, and the first
>> refill fills the whole ring. Later the tail moves and the emptied
>> space is filled again, so after a refill the head (in modulo) is
>> always just behind the tail. On the next run, the refill only covers
>> the part the tail has moved, which is what 'n_slots' calculates. As
>> that is only the size of the gap, starting from 'head' (with modulo)
>> shouldn't run past the ring length.
>>
>> Do you observe this issue practically? If so can you please provide your
>> backtrace and numbers that is showing how to reproduce the issue?
> 
> The alloc function writes starting from the ring's head, but the ring's
> head can be located at the end of the ring's memory buffer (ring_size - 1).
> The correct behavior would be to wrap around to the start of the buffer (0),
> but the alloc function has no awareness of the fact that it's writing to a
> ring, so it writes to ring_size, ring_size + 1, etc.
> 
> Let's look at the existing code:
> We assume the ring size is 256 and we just received 32 packets.
> The previous tail was at index 255, now it's at index 31.
> The head is initially at index 255.
> 
> head = __atomic_load_n(&ring->head, __ATOMIC_RELAXED);	// head = 255
> n_slots = ring_size - head + mq->last_tail;		// n_slots = 32
> 
> if (n_slots < 32)					// not taken
> 	goto no_free_mbufs;
> 
> ret = rte_pktmbuf_alloc_bulk(mq->mempool, &mq->buffers[head & mask], n_slots);
> // This will write 32 mbuf pointers starting at index (head & mask) = 255.
> // The ring size is 256, so apart from the first one all pointers will be
> // written out of bounds (index 256 .. 286, when it should be 0 .. 30).
> 

My expectation is that the numbers would be as follows:

Initially:
 size = 256
 head = 0
 tail = 0

In first refill:
 n_slots = 256
 head = 256
 tail = 0

Subsequent run, after 32 slots are consumed:
 head = 256
 tail = 32
 n_slots = 32
 rte_pktmbuf_alloc_bulk(mq->mempool, &mq->buffers[head & mask], n_slots);
  head & mask = 0
  // So it fills the first 32 elements of the buffer, which is in bounds

This continues as above: refilling only the gap, combined with masking
the head with 'mask', provides the required wrap-around.
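
To make the disagreement concrete, here is a minimal standalone sketch
(plain C, no DPDK calls; ring_size and the counts are taken from the
two examples above) that prints which mq->buffers[] slots a single
refill would touch in each scenario:

#include <stdio.h>

int main(void)
{
	const unsigned int ring_size = 256;
	const unsigned int mask = ring_size - 1;
	unsigned int head, n_slots;

	/* Scenario above: head is a running counter that stays a
	 * multiple of 32, so head & mask is 32-aligned. */
	head = 256; n_slots = 32;
	printf("refill writes slots %u..%u\n",
	       head & mask, (head & mask) + n_slots - 1);   /* 0..31 */

	/* Scenario from the patch description: head & mask = 255 when
	 * a refill of 32 slots starts. */
	head = 255; n_slots = 32;
	printf("refill writes slots %u..%u (array has %u entries)\n",
	       head & mask, (head & mask) + n_slots - 1, ring_size);
	/* prints 255..286, i.e. past the end of a 256-entry array */

	return 0;
}

Whether the second state is reachable in practice is the open question.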


> I can reproduce a crash 100% of the time with my application, but the output
> is not very helpful, since it crashes elsewhere because of mempool corruption.
> Applying this patch fixes the crashes completely.
> 

If this is an always-reproducible crash, it means the existing memif
zero-copy Rx is broken and nobody can use it, but I doubt that is the
case; perhaps something special in your use case is triggering this issue.

@Jakub, can you please confirm that memif Rx zero copy is tested?

>>> Allocate 2x the space for the mbuf ring, so that the alloc function
>>> has a contiguous array to write to, then copy the excess entries
>>> to the start of the array.
>>>
>>
>> Even issue is valid, I am not sure about solution to double to buffer
>> memory, but lets confirm the issue first before discussing the solution.
> 
> Initially, I thought about splitting the call to rte_pktmbuf_alloc_bulk in two,
> but I thought that might be bad for performance if the mempool is being used
> concurrently from multiple threads.
> 
> If we want to use only one call to rte_pktmbuf_alloc_bulk, we need an array
> to store the allocated mbuf pointers. This array must be of length ring_size,
> since that's the maximum number of mbufs that may be allocated in one go.
> We need to copy the pointers from this array to the ring.
> 
> If we instead allocate twice the space for the ring, we can skip copying
> the pointers which were written to the ring, and only copy those that were
> written outside of its bounds.
> 

The first thing that came to my mind was also using two
'rte_pktmbuf_alloc_bulk()' calls.
I can see why you prefer doubling the buffer size, but it comes with
copying overhead.
So both options come with some overhead and I am not sure which one is
better; although I am leaning toward the first solution, we should do
some measurements to decide. A rough sketch of the two-call variant is
below.
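
For reference, this reuses the variable names from eth_memif_rx_zc();
it is an illustration only, not a tested patch, and the error handling
for a partially successful refill is omitted:

	uint16_t slots_to_end = ring_size - (head & mask);

	if (n_slots <= slots_to_end) {
		/* No wrap needed: one contiguous allocation, as today. */
		ret = rte_pktmbuf_alloc_bulk(mq->mempool,
				&mq->buffers[head & mask], n_slots);
	} else {
		/* Fill up to the physical end of the array ... */
		ret = rte_pktmbuf_alloc_bulk(mq->mempool,
				&mq->buffers[head & mask], slots_to_end);
		/* ... then wrap to the start for the remainder.
		 * NOTE: if this second call fails, the mbufs from the
		 * first call would have to be freed before bailing out. */
		if (ret == 0)
			ret = rte_pktmbuf_alloc_bulk(mq->mempool,
					mq->buffers, n_slots - slots_to_end);
	}
	if (unlikely(ret < 0))
		goto no_free_mbufs;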

BUT let's agree on the problem first, before doing more work; I am
still not fully convinced that the original code is wrong.

>>> Fixes: 43b815d88188 ("net/memif: support zero-copy slave")
>>> Cc: stable at dpdk.org
>>> Signed-off-by: Mihai Brodschi <mihai.brodschi at broadcom.com>
>>> ---
>>> v2:
>>>  - fix email formatting
>>>
>>> ---
>>>  drivers/net/memif/rte_eth_memif.c | 10 +++++++++-
>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
>>> index 16da22b5c6..3491c53cf1 100644
>>> --- a/drivers/net/memif/rte_eth_memif.c
>>> +++ b/drivers/net/memif/rte_eth_memif.c
>>> @@ -600,6 +600,10 @@ eth_memif_rx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>>>  	ret = rte_pktmbuf_alloc_bulk(mq->mempool, &mq->buffers[head & mask], n_slots);
>>>  	if (unlikely(ret < 0))
>>>  		goto no_free_mbufs;
>>> +	if (unlikely(n_slots > ring_size - (head & mask))) {
>>> +		rte_memcpy(mq->buffers, &mq->buffers[ring_size],
>>> +			(n_slots + (head & mask) - ring_size) * sizeof(struct rte_mbuf *));
>>> +	}
>>>  
>>>  	while (n_slots--) {
>>>  		s0 = head++ & mask;
>>> @@ -1245,8 +1249,12 @@ memif_init_queues(struct rte_eth_dev *dev)
>>>  		}
>>>  		mq->buffers = NULL;
>>>  		if (pmd->flags & ETH_MEMIF_FLAG_ZERO_COPY) {
>>> +			/*
>>> +			 * Allocate 2x ring_size to reserve a contiguous array for
>>> +			 * rte_pktmbuf_alloc_bulk (to store allocated mbufs).
>>> +			 */
>>>  			mq->buffers = rte_zmalloc("bufs", sizeof(struct rte_mbuf *) *
>>> -						  (1 << mq->log2_ring_size), 0);
>>> +						  (1 << (mq->log2_ring_size + 1)), 0);
>>>  			if (mq->buffers == NULL)
>>>  				return -ENOMEM;
>>>  		}
>>
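
For completeness, plugging the numbers from the walkthrough above into
the new branch (assuming that state is reachable): ring_size = 256,
head & mask = 255, n_slots = 32, so the condition holds (32 > 256 - 255),
and the rte_memcpy() copies n_slots + (head & mask) - ring_size =
32 + 255 - 256 = 31 pointers from mq->buffers[256..286] back to
mq->buffers[0..30].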
> 
> Apologies for sending this multiple times, I'm not familiar with mailing lists.
> 
> 


