[dpdk-dev] dmadev discussion summary
Ananyev, Konstantin
konstantin.ananyev at intel.com
Mon Jun 28 13:14:31 CEST 2021
Hi everyone,
> On Sat, Jun 26, 2021 at 11:59:49AM +0800, fengchengwen wrote:
> > Hi, all
> > I analyzed the current DPDK DMA drivers and drew up this summary in conjunction
> > with the previous discussion, and this will serve as a basis for the V2
> > implementation. Feedback is welcome, thanks.
> >
> Fantastic review and summary, many thanks for the work. Some comments
> inline in API part below, but nothing too major, I hope.
>
> /Bruce
>
> <snip>
> >
> > Summary:
> > 1) The dpaa2/octeontx2/Kunpeng are all ARM SoCs; they may act as endpoints of
> > an x86 host (e.g. a smart NIC), so multiple memory-transfer requirements may
> > exist, e.g. local-to-local/local-to-host..., and from the point of view of
> > API design, I think we should adopt a similar 'channel' or 'virt-queue' concept.
> > 2) Whether to create a separate dmadev for each HW-queue? We previously
> > discussed this, and because HW-queues can be managed independently (as in
> > Kunpeng_dma and Intel DSA), we preferred to create a separate dmadev for
> > each HW-queue. But I'm not sure if that's the case with dpaa. I think
> > that can be left to the specific driver; no restriction is imposed at the
> > framework API layer.
> > 3) I think we could set up the following abstraction at the dmadev device level:
> >   ------------    ------------
> >   |virt-queue|    |virt-queue|
> >   ------------    ------------
> >          \            /
> >           \          /
> >            \        /
> >      ------------   ------------
> >      | HW-queue |   | HW-queue |
> >      ------------   ------------
> >             \          /
> >              \        /
> >               \      /
> >                dmadev
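For illustration, this abstraction could map onto structures roughly like the
sketch below (all names here are hypothetical, not part of the proposal):

    #include <stdint.h>

    /* Hypothetical layout, for illustration only. */
    struct dmadev_hw_queue {
        uint16_t nb_desc;        /* size of the HW descriptor ring */
        void *hw_ring;           /* device-owned descriptor ring */
    };

    struct dmadev_virt_queue {
        uint16_t hw_queue_index; /* HW-queue this virt-queue is bound to */
    };

    struct dmadev {
        struct dmadev_hw_queue *hw_queues;     /* one per HW-queue */
        struct dmadev_virt_queue *virt_queues; /* several may share a HW-queue */
        uint16_t nb_hw_queues;
        uint16_t nb_virt_queues;
    };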
> > 4) The driver's ops design (here we only list the key points):
> >    [dev_info_get]: mainly returns the number of HW-queues
> >    [dev_configure]: nothing important
> >    [queue_setup]: creates one virt-queue, with the following main parameters:
> >        HW-queue-index: the index of the HW-queue to use
> >        nb_desc: the number of HW descriptors
> >        opaque: driver-specific info
> >        Note1: this API returns the virt-queue index, which is used in the
> >               later APIs. If users want to create multiple virt-queues on
> >               the same HW-queue, they can achieve this by calling
> >               queue_setup with the same HW-queue-index.
> >        Note2: I think it's hard to define the queue_setup config parameters,
> >               and since this is a control API, I think it's OK to use an
> >               opaque pointer to implement it.
> I'm not sure opaque pointer will work in practice, so I think we should try
> and standardize the parameters as much as possible. Since it's a control
> plane API, using a struct with a superset of parameters may be workable.
> Let's start with a minimum set and build up from there.
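A minimal superset struct along those lines might look like this (the field
names below are placeholders only, not a concrete proposal):

    #include <stdint.h>

    /* Sketch of a minimal queue_setup config, to be extended over time. */
    struct rte_dmadev_queue_conf {
        uint16_t hw_queue_index; /* HW-queue to bind the new virt-queue to */
        uint16_t nb_desc;        /* number of HW descriptors */
        uint64_t flags;          /* room for future standardized options */
        void *priv;              /* driver-specific escape hatch, if needed */
    };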
>
> > [dma_copy/memset/sg]: all have a vq_id input parameter.
> >      Note: I notice dpaa can't support single and sg operations in one
> >            virt-queue, and I think that may be a software implementation
> >            policy rather than a HW restriction, because virt-queues can
> >            share the same HW-queue.
> Presumably for queues which support sg, the single-enqueue APIs can use a
> single-entry sg list internally?
>
> > Here we use vq_id to handle different scenarios, like local-to-local/
> > local-to-host, etc.
> > 5) And the dmadev public data-plane APIs (prototypes only):
> > dma_cookie_t rte_dmadev_memset(dev, vq_id, pattern, dst, len, flags)
> > -- flags: used as an extended parameter, it could be uint32_t
>
> Suggest uint64_t rather than uint32_t to ensure we have expansion room?
> Otherwise +1
>
> > dma_cookie_t rte_dmadev_memcpy(dev, vq_id, src, dst, len, flags)
> +1
>
> > dma_cookie_t rte_dmadev_memcpy_sg(dev, vq_id, sg, sg_len, flags)
> > -- sg: struct dma_scatterlist array
> I don't think our drivers will be directly implementing this API, but so
> long as SG support is listed as a capability flag I'm fine with this as an
> API. [We can't fudge it as a bunch of single copies, because that would
> cause us to have multiple cookies rather than one]
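For reference, I'd read the sg variant as something like the following (the
struct layout below is my guess, not taken from the proposal; buf_a/buf_b/etc.
stand in for the app's real buffers):

    #include <stdint.h>

    /* Guessed layout of one scatter-gather element. */
    struct dma_scatterlist {
        void *src;
        void *dst;
        uint32_t length;
    };

    uint8_t buf_a[256], buf_b[256], buf_c[256], buf_d[256];

    /* two buffers copied under a single cookie */
    struct dma_scatterlist sg[2] = {
        { .src = buf_a, .dst = buf_b, .length = sizeof(buf_b) },
        { .src = buf_c, .dst = buf_d, .length = sizeof(buf_d) },
    };
    dma_cookie_t cookie = rte_dmadev_memcpy_sg(dev, vq_id, sg, 2, flags);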
>
> > uint16_t rte_dmadev_completed(dev, vq_id, dma_cookie_t *cookie,
> >                               uint16_t nb_cpls, bool *has_error)
> >    -- nb_cpls: indicates the max number of completed operations to process
> >    -- has_error: indicates whether an error was encountered
> >    -- return value: the number of successfully completed operations.
> >    -- examples:
> >       1) If there are already 32 completed ops, the 4th is an error, and
> >          nb_cpls is 32, then the return will be 3 (because the 1st/2nd/3rd
> >          are OK), and has_error will be true.
> >       2) If there are already 32 completed ops, and all completed
> >          successfully, then the return will be min(32, nb_cpls), and
> >          has_error will be false.
> >       3) If there are already 32 completed ops, and all failed, then the
> >          return will be 0, and has_error will be true.
> +1 for this
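To make the intended usage concrete, a completion poll under these semantics
might look like this (just a sketch):

    #include <stdbool.h>

    dma_cookie_t cookie;
    bool has_error = false;
    /* reap up to 32 ops; counting stops at the first error, if any */
    uint16_t n = rte_dmadev_completed(dev, vq_id, &cookie, 32, &has_error);
    /* the first n ops completed successfully; 'cookie' identifies the last
     * good one, so n descriptors can be reclaimed now */
    if (has_error) {
        /* inspect the failures via rte_dmadev_completed_status() */
    }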
>
> > uint16_t rte_dmadev_completed_status(dev, vq_id, dma_cookie_t *cookie,
> >                                      uint16_t nb_status, uint32_t *status)
> >    -- return value: the number of failed completed operations.
> > And here I agree with Morten: we should design APIs which fit DPDK
> > service scenarios. So we don't support things like sound-card DMA, or the
> > 2D memory copy which is mainly used in video scenarios.
>
> Can I suggest a few adjustments here to the semantics of this API. In
> future we may have operations which return a status value, e.g. our
> hardware can support ops like compare equal/not-equal, which means that
> this API would be meaningful even in case of success. Therefore, I suggest
> that the return value be changed to allow success also to be returned in
> the array, and the return value is not the number of failed ops, but the
> number of ops for which status is being returned.
>
> Also for consideration: when trying to implement this in a prototype in our
> driver, it would be easier if we relax the restriction on the "completed"
> API so that we can flag has_error when an error is detected rather than
> guaranteeing to return all elements right up to the error. For example, if
> we have a burst of packets and one is problematic, it may be easier to flag
> the error at the start of the burst and then have a few successful entries
> at the start of the completed_status array. [Separate from this] We should
> also have a "has_error" or "more_errors" flag on this API too, to indicate
> when the user can switch back to using the regular "completed" API. This
> means that apps switch from one API to the other when "has_error" is true,
> and only switch back when it becomes false again.
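So from the app's point of view, the pattern would be roughly as below (a
sketch only, assuming the adjusted semantics suggested above, including the
extra "has_error"/"more_errors" flag on completed_status):

    bool has_error = false;
    uint16_t n;

    n = rte_dmadev_completed(dev, vq_id, &cookie, nb_cpls, &has_error);
    /* ... handle n successful ops ... */
    while (has_error) {
        /* per-op status words, success and error codes alike */
        n = rte_dmadev_completed_status(dev, vq_id, &cookie, nb_status,
                                        status, &has_error);
        /* ... inspect status[0..n-1] ... */
    }
    /* has_error is false again: back on the fast "completed" path */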
>
> > 6) The dma_cookie_t is a signed int type; when < 0 it means error. It is
> >    monotonically increasing per HW-queue (rather than per virt-queue). The
> >    driver needs to guarantee this because the dmadev framework doesn't
> >    manage the dma_cookie's creation.
> +1 to this.
> I think we also should specify that the cookie is guaranteed to wrap at a
> power of 2 value (UINT16_MAX??). This allows it to be used as an
> index into a circular buffer just by masking.
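e.g. with a power-of-2 wrap, the app side reduces to:

    /* RING_SIZE: power of 2, no larger than the cookie wrap value */
    #define RING_SIZE 1024
    static void *op_ctx[RING_SIZE];   /* per-op application context */

    /* the cookie maps straight to a slot; masking replaces modulo */
    void *ctx = op_ctx[cookie & (RING_SIZE - 1)];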
>
> > 7) Because the data-plane APIs are not thread-safe, and the user determines
> >    the virt-queue to HW-queue mapping (at the queue-setup stage), it is the
> >    user's responsibility to ensure thread safety.
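In practice that just means, e.g., one virt-queue per lcore (sketch):

    /* One virt-queue per worker lcore: no queue is ever shared between
     * threads, so no locking is needed on the data path. */
    static int vq_ids[RTE_MAX_LCORE];

    /* setup stage, once per worker lcore */
    vq_ids[lcore_id] = rte_dmadev_queue_setup(dev, &conf);

    /* data path: each lcore touches only its own virt-queue */
    cookie = rte_dmadev_memcpy(dev, vq_ids[rte_lcore_id()], src, dst,
                               len, flags);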
> > 8) One example:
> >      vq_id = rte_dmadev_queue_setup(dev, config.{HW-queue-index=x, opaque});
> >      if (vq_id < 0) {
> >          // creating the virt-queue failed
> >          return;
> >      }
> >      // submit a memcpy task
> >      cookie = rte_dmadev_memcpy(dev, vq_id, src, dst, len, flags);
> >      if (cookie < 0) {
> >          // submit failed
> >          return;
> >      }
> >      // reap the completed task
> >      ret = rte_dmadev_completed(dev, vq_id, &cookie, 1, &has_error);
> >      if (!has_error && ret == 1) {
> >          // the memcpy completed successfully
> >      }
> +1
I have two questions on the proposed API:
1. Would it make sense to split the submission API into two stages:
   a) reserve and prepare
   b) actual submit.
Similar to what the DPDK ioat/idxd PMDs have right now:
    /* reserve and prepare */
    for (i = 0; i < num; i++) {
        cookie = rte_dmadev_memcpy(...);
    }
    /* submit to HW */
    rte_dmadev_issue_pending(...);
For those PMDs that prefer to do the actual submission to HW inside
rte_dmadev_memcpy() itself, issue_pending() will just be a NOP.
As I can see, it will make the API more flexible and will help PMD developers
to choose the most suitable approach for their HW.
As a side note, the Linux DMA framework uses such an approach too.
2. I wonder what the MT-safety requirements for the submit/completion APIs
   would be. I.e., should all PMDs support the case when one thread does
   rte_dmadev_memcpy(...) while another one does rte_dmadev_completed(...)
   on the same queue simultaneously? Or should such a combination be ST only?
   Or might it be a new capability flag per device?
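If it ends up as a capability flag, I'd imagine usage along these lines (the
flag and info-struct names below are placeholders, just to illustrate the
capability-check idea):

    struct rte_dmadev_info info;

    rte_dmadev_info_get(dev, &info);
    if (info.dev_capa & RTE_DMADEV_CAPA_MT_VQ) {
        /* submit and completion may run concurrently on one queue */
    } else {
        /* treat each virt-queue as single-threaded */
    }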
>
> > 9) As octeontx2_dma supports an sg-list which has many valid buffers in
> >    dpi_dma_buf_ptr_s, it could use the rte_dmadev_memcpy_sg API.
> > 10) As for ioat, it could declare support for one HW-queue at the
> >    dev_configure stage, and only support creating one virt-queue.
> +1
>
> > 11) As for dpaa2_qdma, I think it could migrate to the new framework, but
> >    we're still waiting for feedback from the dpaa2_qdma folks.
> > 12) About the prototype src/dst parameters of the rte_dmadev_memcpy API, we
> >    have two candidates, iova and void *; how about introducing a dma_addr_t
> >    type which could be a VA or an IOVA?
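For the dma_addr_t idea, I'd imagine something along these lines (sketch):

    #include <stdint.h>

    /* A single address type wide enough for either a VA or an IOVA,
     * similar in spirit to rte_iova_t. */
    typedef uint64_t dma_addr_t;

    dma_cookie_t rte_dmadev_memcpy(struct rte_dmadev *dev, uint16_t vq_id,
                                   dma_addr_t src, dma_addr_t dst,
                                   uint32_t len, uint64_t flags);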
> >
>
> Many thanks again.