[dpdk-dev] Adding API to force freeing consumed buffers in TX ring

Wiles, Keith keith.wiles at intel.com
Tue Nov 22 21:41:29 CET 2016


> On Nov 21, 2016, at 9:25 AM, Richardson, Bruce <bruce.richardson at intel.com> wrote:
> 
> On Mon, Nov 21, 2016 at 04:06:32PM +0100, Olivier Matz wrote:
>> Hi,
>> 
>> On 11/21/2016 03:33 PM, Wiles, Keith wrote:
>>> 
>>>> On Nov 21, 2016, at 4:48 AM, Damjan Marion (damarion) <damarion at cisco.com> wrote:
>>>> 
>>>> 
>>>> Hi,
>>>> 
>>>> Currently in VPP we memcpy the whole packet when we need to do
>>>> replication, as we cannot know whether a specific buffer has already been
>>>> transmitted from the tx ring before we update it again (i.e. l2 header rewrite).
>>>> 
>>>> Unless there is already a way to address this issue in DPDK which I’m not aware
>>>> of, my proposal is that we provide a mechanism for polling the TX ring
>>>> for consumed buffers. This can be either a completely new API or an
>>>> extension of rte_eth_tx_burst (i.e. a special case when nb_pkts=0).
>>>> 
>>>> This would allow us to start polling the tx ring when we expect some
>>>> mbufs back, instead of waiting for the next tx burst (which we don’t know
>>>> when it will happen) and hoping that we will reach the free threshold soon.
>>> 
>>> +1
>>> 
>>> In Pktgen I have the problem of not being able to reclaim all of the TX mbufs to update them for the next set of packets to send. I know this is not a common case, but I do see cases where the application needs its mbufs freed off the TX ring. Currently you need to have at least a TX ring's worth of mbufs on hand to make sure you can send to a TX ring. If you allocate too few, you run into a deadlock because the number of mbufs on a TX ring never hits the flush mark. If you are sending to multiple TX rings on the same NUMA node from a single TX pool, you have to work out the total number of mbufs needed to hit the TX flush threshold on each ring. That is not a clean way to handle the problem, as you may have limited memory or need extra logic to add more mbufs for dynamic ports.
>>> 
>>> Anyway, it would be great to have a way to clean up the TX done ring; using nb_pkts == 0 is the simplest way, but a new API is fine too.
>>>> 
>>>> Any thoughts?
>> 
>> Yes, it looks useful to have such an API.
>> 
>> I would prefer another function instead of diverting the meaning of
>> nb_pkts. Maybe this?
>> 
>>  void rte_eth_tx_free_bufs(uint8_t port_id, uint16_t queue_id);
>> 
> 
> A third parameter as a limit (hint) on the number of bufs to free? If the
> TX ring is big, we might not want to stall other work for a long time
> while we free a huge number of buffers.

To move this along a bit, suppose we create the following API:

int rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id, uint32_t free_cnt);

It returns the number of freed mbufs, or -1 if the operation is not supported or the params are invalid.
A free_cnt of zero means free all possible mbufs; otherwise free at most the number suggested.
The free_cnt could be a uint16_t, but I do not think it matters much.

The rte_eth_tx_done_cleanup() call will return -1 if the PMD does not support the operation or if port_id or queue_id is invalid.

The default for the function pointer in the eth_dev ops structure would be NULL (not supported), so that not all of the drivers have to be updated today. We can then add the support as we go along.
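To illustrate, here is a rough sketch of how the ethdev-level wrapper could dispatch to the PMD. This is only an illustration of the proposal, not committed code: the eth_tx_done_cleanup_t typedef and the tx_done_cleanup member of eth_dev_ops are hypothetical names; the rest follows existing ethdev patterns.

#include <rte_ethdev.h>

/* Hypothetical PMD callback type and ops member for this proposal. */
typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);

int
rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id, uint32_t free_cnt)
{
        struct rte_eth_dev *dev;

        if (!rte_eth_dev_is_valid_port(port_id))
                return -1;

        dev = &rte_eth_devices[port_id];
        if (queue_id >= dev->data->nb_tx_queues)
                return -1;

        /* A NULL callback means the PMD has not implemented the hook yet. */
        if (dev->dev_ops->tx_done_cleanup == NULL)
                return -1;

        /* free_cnt == 0 means "free as many completed mbufs as possible". */
        return dev->dev_ops->tx_done_cleanup(
                dev->data->tx_queues[queue_id], free_cnt);
}

Returning -1 for both "not supported" and "bad port/queue" matches the description above; we could also return -ENOTSUP vs. -EINVAL if we want callers to tell those apart.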

We could also have a feature-query API for tx_done support and PCTYPE, plus others, if we want to go down that path.
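From the application side, usage could look roughly like this (again just a sketch, assuming the proposed rte_eth_tx_done_cleanup() above; the mempool, the "needed" threshold and the helper name are illustrative):

#include <rte_ethdev.h>
#include <rte_mempool.h>

/* Reclaim completed TX mbufs when the pool runs low, instead of waiting
 * for the next tx burst to cross the free threshold. */
static int
reclaim_tx_mbufs(uint8_t port_id, uint16_t queue_id,
                 struct rte_mempool *mp, unsigned int needed)
{
        int freed;

        if (rte_mempool_avail_count(mp) >= needed)
                return 0;

        /* free_cnt == 0: free as many completed mbufs as possible. */
        freed = rte_eth_tx_done_cleanup(port_id, queue_id, 0);
        if (freed < 0) {
                /* PMD does not implement the hook (or bad port/queue);
                 * fall back to the current behavior, e.g. copying packets
                 * before rewriting them. */
                return -1;
        }
        return freed;
}

That would cover both the VPP replication case and the Pktgen case described above without the application having to track per-queue flush thresholds.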

> 
> 	/Bruce

Regards,
Keith


