[dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring

Adrien Mazarguil adrien.mazarguil at 6wind.com
Fri Dec 23 10:45:04 CET 2016


Hi Billy,

On Tue, Dec 20, 2016 at 09:15:50AM -0500, Billy McFall wrote:
> Thank you for your responses, see inline.
> 
> On Tue, Dec 20, 2016 at 7:58 AM, Adrien Mazarguil
> <adrien.mazarguil at 6wind.com> wrote:
> > On Tue, Dec 20, 2016 at 12:17:10PM +0000, Ananyev, Konstantin wrote:
> >>
> >>
> >> > -----Original Message-----
> >> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Adrien Mazarguil
> >> > Sent: Tuesday, December 20, 2016 11:28 AM
> >> > To: Billy McFall <bmcfall at redhat.com>
> >> > Cc: thomas.monjalon at 6wind.com; Lu, Wenzhuo <wenzhuo.lu at intel.com>; dev at dpdk.org; Stephen Hemminger
> >> > <stephen at networkplumber.org>
> >> > Subject: Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
> >> >
> >> > Hi Billy,
> >> >
> >> > On Fri, Dec 16, 2016 at 07:48:49AM -0500, Billy McFall wrote:
> >> > > Add a new API to force free consumed buffers on TX ring. API will return
> >> > > the number of packets freed (0-n) or error code if feature not supported
> >> > > (-ENOTSUP) or input invalid (-ENODEV).
> >> > >
> >> > > Because rte_eth_tx_buffer() may be used, and mbufs may still be held
> >> > > in local buffer, the API also accepts *buffer and *sent. Before
> >> > > attempting to free, rte_eth_tx_buffer_flush() is called to make sure
> >> > > all mbufs are sent to Tx ring. rte_eth_tx_buffer_flush() is called even
> >> > > if threshold is not met.
> >> > >
> >> > > Signed-off-by: Billy McFall <bmcfall at redhat.com>
> >> > > ---
> >> > >  lib/librte_ether/rte_ethdev.h | 56 +++++++++++++++++++++++++++++++++++++++++++
> >> > >  1 file changed, 56 insertions(+)
> >> > >
> >> > > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> >> > > index 9678179..e3f2be4 100644
> >> > > --- a/lib/librte_ether/rte_ethdev.h
> >> > > +++ b/lib/librte_ether/rte_ethdev.h
> >> > > @@ -1150,6 +1150,9 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
> >> > >  typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
> >> > >  /**< @internal Check DD bit of specific RX descriptor */
> >> > >
> >> > > +typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
> >> > > +/**< @internal Force mbufs to be freed from TX ring. */
> >> > > +
> >> > >  typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
> >> > >   uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
> >> > >
> >> > > @@ -1467,6 +1470,7 @@ struct eth_dev_ops {
> >> > >   eth_rx_disable_intr_t      rx_queue_intr_disable;
> >> > >   eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue.*/
> >> > >   eth_queue_release_t        tx_queue_release;/**< Release TX queue.*/
> >> > > + eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
> >> > >   eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
> >> > >   eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
> >> > >   flow_ctrl_get_t            flow_ctrl_get; /**< Get flow control. */
> >> > > @@ -2943,6 +2947,58 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
> >> > >  }
> >> > >
> >> > >  /**
> >> > > + * Request the driver to free mbufs currently cached by the driver. The
> >> > > + * driver will only free the mbuf if it is no longer in use.
> >> > > + *
> >> > > + * @param port_id
> >> > > + *   The port identifier of the Ethernet device.
> >> > > + * @param queue_id
> >> > > + *   The index of the transmit queue through which output packets must be
> >> > > + *   sent.
> >> > > + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> >> > > + *   to rte_eth_dev_configure().
> >> > > + * @param free_cnt
> >> > > + *   Maximum number of packets to free. Use 0 to indicate all possible packets
> >> > > + *   should be freed. Note that a packet may be using multiple mbufs.
> >> > > + * @param buffer
> >> > > + *   Buffer used to collect packets to be sent. If provided, the buffer will
> >> > > + *   be flushed, even if the current length is less than buffer->size. Pass NULL
> >> > > + *   if buffer has already been flushed.
> >> > > + * @param sent
> >> > > + *   Pointer to return number of packets sent if buffer has packets to be sent.
> >> > > + *   If *buffer is supplied, *sent must also be supplied.
> >> > > + * @return
> >> > > + *   Failure: < 0
> >> > > + *     -ENODEV: Invalid interface
> >> > > + *     -ENOTSUP: Driver does not support function
> >> > > + *   Success: >= 0
> >> > > + *     0-n: Number of packets freed. More packets may still remain in ring that
> >> > > + *     are in use.
> >> > > + */
> >> > > +
> >> > > +static inline int
> >> > > +rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt,
> >> > > +         struct rte_eth_dev_tx_buffer *buffer, uint16_t *sent)
> >> > > +{
> >> > > + struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> >> > > +
> >> > > + /* Validate Input Data. Bail if not valid or not supported. */
> >> > > + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> >> > > + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
> >> > > +
> >> > > + /*
> >> > > +  * If transmit buffer is provided and there are still packets to be
> >> > > +  * sent, then send them before attempting to free pending mbufs.
> >> > > +  */
> >> > > + if (buffer && sent)
> >> > > +         *sent = rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
> >> > > +
> >> > > + /* Call driver to free pending mbufs. */
> >> > > + return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
> >> > > +                 free_cnt);
> >> > > +}
> >> > > +
> >> > > +/**
> >> > >   * Configure a callback for buffered packets which cannot be sent
> >> > >   *
> >> > >   * Register a specific callback to be called when an attempt is made to send
> >> >
> 
> I will remove the buffer/sent parameters. It will be the application's
> responsibility to make sure rte_eth_tx_buffer_flush() is called.
> 
> I don't feel strongly about the free_cnt parameter. It was in the original
> request so that if there was a large ring buffer, the API could bail early
> without having to go through the entire ring. It might be a little
> unrealistic for the application to truly know how many mbufs it wants freed.
> Also, as an example, the i40e driver already has an i40e_tx_free_bufs(...)
> function, so by dropping the free_cnt parameter, this function could be
> reused without having to account for free_cnt.
> 
> >> > Just a thought to follow-up on Stephen's comment to further simplify this
> >> > API, how about not adding any new eth_dev_ops but instead defining what
> >> > should happen during an empty TX burst call (tx_burst() with 0 packets).
> >> >
> 
> In the original API request thread, see dpdk-dev mailing list from 11/21/2016
> with subject "Adding API to force freeing consumed buffers in TX ring",
> overloading the existing API with nb_pkts == 0 was suggested and consensus
> was to go with new API. I lean towards a new API since this is a special case
> most applications won't use, but I will go with the community on whether to
> enhance the existing burst functionality or add a new API.

OK, I've just read the original thread.

> >> > Several PMDs already have a check for this scenario and start by cleaning up
> >> > completed packets anyway; they effectively implement part of this
> >> > definition for free already.
> >>
> >> Many PMDs start cleaning up only when the number of free entries
> >> drops below some point.
> 
> True, but the original request for this API was for the scenario where packets
> are being flooded and the application wanted to reuse mbuf to avoid a packet
> copy. So the API was to request the driver to free "done" mbufs outside of any
> threshold.

Understood, so it's more than just a polite suggestion to PMDs that
implement this call. In my opinion it's still better to avoid adding a new
callback for that purpose, since applications cannot rely on a specific
outcome: the call cannot guarantee that any mbuf will be freed, much like
calling tx_burst() with 0 packets.
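
To make the empty-burst idea concrete, here is a rough sketch in plain C. It
is a toy model and not DPDK code: struct toy_txq, toy_hw_complete(),
toy_tx_cleanup() and toy_tx_burst() are all hypothetical names, and the
"hardware" is simulated by a counter. It only shows a burst function that
normally cleans up below a free threshold, but bypasses that threshold when
called with 0 packets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy TX queue, counters only (no real descriptors or mbufs). */
struct toy_txq {
	uint16_t nb_desc;     /* ring size */
	uint16_t nb_free;     /* descriptors available for new packets */
	uint16_t nb_done;     /* sent by "hardware", mbufs not yet released */
	uint16_t free_thresh; /* normal cleanup happens below this level */
};

/* Simulate hardware completing up to n in-flight packets. */
static void
toy_hw_complete(struct toy_txq *q, uint16_t n)
{
	uint16_t in_flight = (uint16_t)(q->nb_desc - q->nb_free - q->nb_done);

	if (n > in_flight)
		n = in_flight;
	q->nb_done = (uint16_t)(q->nb_done + n);
}

/* Release every completed entry; returns how many were released. */
static uint16_t
toy_tx_cleanup(struct toy_txq *q)
{
	uint16_t n = q->nb_done;

	assert(q->nb_free + q->nb_done <= q->nb_desc);
	q->nb_free = (uint16_t)(q->nb_free + n);
	q->nb_done = 0;
	return n;
}

/* Toy burst function; returns the number of packets queued. */
static uint16_t
toy_tx_burst(struct toy_txq *q, uint16_t nb_pkts)
{
	if (nb_pkts == 0) {
		/* Empty burst: clean up regardless of the threshold. */
		toy_tx_cleanup(q);
		return 0;
	}
	/* Normal path: clean up only when running low on free entries. */
	if (q->nb_free < q->free_thresh)
		toy_tx_cleanup(q);
	if (nb_pkts > q->nb_free)
		nb_pkts = q->nb_free;
	q->nb_free = (uint16_t)(q->nb_free - nb_pkts);
	return nb_pkts;
}
```

Note that the empty burst returns 0 either way, so the caller learns nothing
about how many entries were released; it would have to retry its allocation
from the mbuf pool to find out.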

That's a separate discussion, but perhaps making struct eth_dev_ops part
of the public API was not such a good idea after all. We're unable to
maintain ABI compatibility across releases because of it.

New callbacks would be met with less resistance (at least on my side) if
this whole ABI compat thing was not an issue.
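
As a rough illustration of the ABI concern (a simplified sketch, not the real
struct eth_dev_ops; every toy_* name below is hypothetical): inline wrappers
like the one in this patch dereference dev->dev_ops from application code, so
member offsets get compiled into the application. Inserting a callback in the
middle of the structure then shifts every member after it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*toy_op_t)(void);

/* Layout an application was compiled against (hypothetical release N). */
struct toy_ops_v1 {
	toy_op_t tx_queue_setup;
	toy_op_t tx_queue_release;
	toy_op_t dev_led_on;
};

/* Hypothetical release N+1: a callback inserted in the middle. */
struct toy_ops_v2 {
	toy_op_t tx_queue_setup;
	toy_op_t tx_queue_release;
	toy_op_t tx_done_cleanup; /* new member */
	toy_op_t dev_led_on;
};

/* Stand-ins for driver callbacks. */
static void toy_cleanup_stub(void) { }
static void toy_led_stub(void) { }

/* Nonzero if dev_led_on kept its offset between the two layouts. */
static int
toy_led_on_offset_preserved(void)
{
	return offsetof(struct toy_ops_v1, dev_led_on) ==
	       offsetof(struct toy_ops_v2, dev_led_on);
}

/*
 * What a binary built against v1 would fetch as "dev_led_on" when handed
 * a v2 table: whatever pointer sits at the old offset.
 */
static toy_op_t
toy_lookup_with_old_offset(const struct toy_ops_v2 *ops)
{
	toy_op_t fn;
	size_t off = offsetof(struct toy_ops_v1, dev_led_on);

	assert(off + sizeof(fn) <= sizeof(*ops));
	memcpy(&fn, (const char *)ops + off, sizeof(fn));
	return fn;
}
```

In this toy layout the old dev_led_on offset lands on tx_done_cleanup, so an
application built against the older header would call the wrong callback
unless it is recompiled.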

> >> Also in that case the author would have to modify (and test) all existing TX routines.
> >> So I think a separate API call seems more plausible.
> >
> > Not necessarily. As I understand it, this API in its current form only
> > suggests that a PMD should release a few mbufs from a queue if possible,
> > without any guarantee; PMDs are not forced to comply.
> >
> > I think the threshold you mention is a valid reason not to release them, and
> > it wouldn't change a thing to existing tx_burst() implementations in the
> > meantime (only documentation).
> >
> > This threshold could also be bypassed rather painlessly in the
> > "if (unlikely(nb_pkts == 0))" case that all PMDs already check for in
> > one way or another.
> >
> >> Though I agree with the previous comment from Stephen that the last two
> >> parameters are redundant and would just overcomplicate things.
> >> Konstantin
> >>
> >> >
> >> > The main difference with this API would be that you wouldn't know how many
> >> > mbufs were freed and wouldn't collect them into an array. However most
> >> > applications have one mbuf pool and/or know where they come from, so they
> >> > can just query the pool or attempt to re-allocate from it after doing empty
> >> > bursts in case of starvation.
> >> >
> >> > [1] http://dpdk.org/ml/archives/dev/2016-December/052469.html
> >
> > --
> > Adrien Mazarguil
> > 6WIND

-- 
Adrien Mazarguil
6WIND

