[dpdk-dev] [PATCH v2] dmadev: introduce DMA device library

Jerin Jacob jerinjacobk at gmail.com
Tue Jul 13 10:59:09 CEST 2021


On Mon, Jul 12, 2021 at 10:30 PM Bruce Richardson
<bruce.richardson at intel.com> wrote:
>
> On Mon, Jul 12, 2021 at 10:04:07PM +0530, Jerin Jacob wrote:
> > On Mon, Jul 12, 2021 at 7:02 PM Bruce Richardson
> > <bruce.richardson at intel.com> wrote:
> > >
> > > On Mon, Jul 12, 2021 at 03:29:27PM +0530, Jerin Jacob wrote:
> > > > On Sun, Jul 11, 2021 at 2:59 PM Chengwen Feng <fengchengwen at huawei.com> wrote:
> > > > >
> > > > > This patch introduce 'dmadevice' which is a generic type of DMA
> > > > > device.
> > > > >
> > > > > The APIs of dmadev library exposes some generic operations which can
> > > > > enable configuration and I/O with the DMA devices.
> > > > >
> > > > > Signed-off-by: Chengwen Feng <fengchengwen at huawei.com>
> > > > > ---
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: this API may change without prior notice.
> > > > > + *
> > > > > + * Enqueue a fill operation onto the virtual DMA channel.
> > > > > + *
> > > > > + * This queues up a fill operation to be performed by hardware, but does not
> > > > > + * trigger hardware to begin that operation.
> > > > > + *
> > > > > + * @param dev_id
> > > > > + *   The identifier of the device.
> > > > > + * @param vchan
> > > > > + *   The identifier of virtual DMA channel.
> > > > > + * @param pattern
> > > > > + *   The pattern to populate the destination buffer with.
> > > > > + * @param dst
> > > > > + *   The address of the destination buffer.
> > > > > + * @param length
> > > > > + *   The length of the destination buffer.
> > > > > + * @param flags
> > > > > + *   An flags for this operation.
> > > > > + *
> > > > > + * @return
> > > > > + *   - 0..UINT16_MAX: index of enqueued copy job.
> > > >
> > > > fill job
> > > >
> > > > > + *   - <0: Error code returned by the driver copy function.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +static inline int
> > > > > +rte_dmadev_fill(uint16_t dev_id, uint16_t vchan, uint64_t pattern,
> > > > > +               rte_iova_t dst, uint32_t length, uint64_t flags)
> > > > > +{
> > > > > +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > > > > +#ifdef RTE_DMADEV_DEBUG
> > > > > +       RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> > > > > +       RTE_FUNC_PTR_OR_ERR_RET(*dev->fill, -ENOTSUP);
> > > > > +       if (vchan >= dev->data->dev_conf.max_vchans) {
> > > > > +               RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> > > > > +               return -EINVAL;
> > > > > +       }
> > > > > +#endif
> > > > > +       return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
> > > > > +}
> > > > > +
> > > >
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: this API may change without prior notice.
> > > > > + *
> > > > > + * Returns the number of operations that have been successfully completed.
> > > > > + *
> > > > > + * @param dev_id
> > > > > + *   The identifier of the device.
> > > > > + * @param vchan
> > > > > + *   The identifier of virtual DMA channel.
> > > > > + * @param nb_cpls
> > > > > + *   The maximum number of completed operations that can be processed.
> > > > > + * @param[out] last_idx
> > > > > + *   The last completed operation's index.
> > > > > + *   If not required, NULL can be passed in.
> > > >
> > > > This means the driver will be tracking the last index.
> > > >
> > >
> > > Driver will be doing this anyway, no, since it needs to ensure we don't
> >
> > Yes.
> >
> > > wrap around?
> >
> >
> > >
> > > > Is that mean, the application needs to call this API periodically to
> > > > consume the completion slot.
> > > > I.e up to 64K (UINT16_MAX)  outstanding jobs are possible. If the
> > > > application fails to call this
> > > > >64K outstand job then the subsequence enqueue will fail.
> > >
> > > Well, given that there will be a regular enqueue ring which will probably
> > > be <= 64k in size, the completion call will need to be called frequently
> > > anyway. I don't think we need to document this restriction as it's fairly
> > > understood that you can't go beyond the size of the ring without cleanup.
> >
> >
> > See below.
> >
> > >
> > > >
> > > > If so, we need to document this.
> > > >
> > > > One of the concerns of keeping UINT16_MAX as the limit is the
> > > > completion memory will always not in cache.
> > > > On the other hand, if we make this size programmable. it may introduce
> > > > complexity in the application.
> > > >
> > > > Thoughts?
> > >
> > > The reason for using powers-of-2 sizes, e.g. 0 .. UINT16_MAX, is that the
> > > ring can be any other power-of-2 size and we can index it just by masking.
> > > In the sample app for dmadev, I expect the ring size used to be set the
> > > same as the dmadev enqueue ring size, for simplicity.
> >
> > No question on not using power of 2. Aligned on that.
> >
> > At least in our HW, the size of the ring is rte_dmadev_vchan_conf::nb_desc.
> > But completion happens in _different_ memory space. Currently, we are allocating
> > UINT16_MAX entries to hold that. That's where cache miss aspects of
> > completion aspects
> > came.
>
> Depending on HW, our completions can be written back to a separate memory
> area - a completion ring, if you will - but I've generally found it works
> as well to reuse the enqueue ring for that purpose. However, with a
> separate memory area for completions, why do you need to allocate 64K
> entries for the completions? Would nb_desc entries not be enough? Is that
> to allow the user to have more than nb_desc jobs outstanding before calling
> "get_completions" API?

Yes. That's what I thought. Thats where my question on what is the max number of
outstanding completions. I thought it can be up to 64K. Agree to keep
it implementation-specific and not need to highlight this in the
documentation.


>
> >
> > In your case, Is completion happens in the same ring memory(looks like
> > one bit in job desc represents the job completed or not) ?
> > And when application calls rte_dmadev_completed(), You  are converting
> > UINT16_MAX based index to
> > rte_dmadev_vchan_conf::nb_desc. Right?
>
> Yes, we are masking to do that. Actually, for simplicity and perf we should
> only allow power-of-2 ring sizes. Having to use modulus instead of masking
> could be a problem. [Alternatively, I suppose we can allow drivers to round
> up the ring sizes to the next power of 2, but I prefer just documenting it
> as a limitation].

OK.

>
> /Bruce


More information about the dev mailing list