[dpdk-dev] [PATCH] dmadev: introduce DMA device library

Bruce Richardson bruce.richardson at intel.com
Tue Jul 6 11:27:57 CEST 2021


On Tue, Jul 06, 2021 at 04:20:38PM +0800, fengchengwen wrote:
> On 2021/7/5 18:52, Bruce Richardson wrote:
> > On Sun, Jul 04, 2021 at 03:00:30PM +0530, Jerin Jacob wrote:
> >> On Fri, Jul 2, 2021 at 6:51 PM Chengwen Feng <fengchengwen at huawei.com> wrote:
> 
> [snip]
> 
> >>> + *
> >>> + * If dma_cookie_t is >=0 it's a DMA operation request cookie, <0 it's an
> >>> + * error code.
> >>> + * When using cookies, comply with the following rules:
> >>> + * a) Cookies for each virtual queue are independent.
> >>> + * b) For a virt queue, the cookies are monotonically incremented; when they
> >>> + *    reach INT_MAX, they wrap back to zero.
> > 
> > I disagree with the INT_MAX (or INT32_MAX) value here. If we use that
> > value, it means that we cannot use implicit wrap-around inside the CPU and
> > have to check for the INT_MAX value. Better to:
> > 1. Specify that it wraps at UINT16_MAX which allows us to just use a
> > uint16_t internally and wrap-around automatically, or:
> > 2. Specify that it wraps at a power-of-2 value >= UINT16_MAX, giving
> > drivers the flexibility at what value to wrap around.
> 
> +1 for option 1
> BTW: option 2 seems a little complicated for drivers and applications.
> 

I would tend to agree. I just included option 2 in case a driver explicitly
wanted more than UINT16_MAX cookie values.
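
To make the difference concrete, here is a minimal sketch (the struct and
function names below are purely illustrative, not part of the proposed API):
with a uint16_t cookie the wrap-around falls out of the arithmetic, while an
INT_MAX wrap needs an explicit check on every increment.

	#include <stdint.h>

	/* Hypothetical per-virt-queue state inside a driver. */
	struct vq_state {
		uint16_t next_cookie; /* wraps implicitly at UINT16_MAX */
	};

	/* Option 1: uint16_t cookie - wrap-around comes for free. */
	static inline uint16_t
	issue_cookie_u16(struct vq_state *vq)
	{
		return vq->next_cookie++; /* 0xFFFF + 1 rolls over to 0 */
	}

	/* An INT_MAX wrap (the scheme argued against) needs a branch. */
	static inline int32_t
	issue_cookie_intmax(int32_t *next_cookie)
	{
		int32_t c = *next_cookie;

		*next_cookie = (c == INT32_MAX) ? 0 : c + 1;
		return c;
	}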

> >> When the system is configured as IOVA as VA:
> >> 1) Device supports any VA address, like memory from rte_malloc(),
> >> rte_memzone(), malloc, stack memory
> >> 2) Device supports only VA addresses from rte_malloc(), rte_memzone(), i.e.
> >> memory backed by hugepages and added to the DMA map.
> >>
> >> When the system is configured as IOVA as PA:
> >> 1) Devices support only PA addresses.
> >>
> >> IMO, the above needs to be advertised as a capability and the application
> >> needs to align with that, and I don't think the application requests the
> >> driver to work in any of the modes.
> >>
> >>
> > 
> > I don't think we need this level of detail for addressing capabilities.
> > Unless I'm missing something, the hardware should behave exactly as other
> > hardware does taking in iova's.  If the user wants to check whether virtual
> > addresses to pinned memory can be used directly, the user can call
> > "rte_eal_iova_mode". We can't have a situation where some hardware uses one
> > type of addresses and another hardware the other.
> > 
> > Therefore, the only additional addressing capability we should need to
> > report is that the hardware can use SVM/SVA and use virtual addresses not
> > in hugepage memory.
> > 
> 
> I discussed the addressing capability in a previous thread.
> Indeed, we can reduce it to just one capability.
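
As a rough illustration of the checks involved (only rte_eal_iova_mode() is
existing DPDK API here; DMA_DEV_CAPA_SVA is a made-up bit standing in for
that single SVM/SVA capability):

	#include <stdbool.h>
	#include <stdint.h>
	#include <rte_eal.h>

	#define DMA_DEV_CAPA_SVA (1ULL << 0) /* hypothetical capability flag */

	/* Can hugepage-backed memory (rte_malloc/rte_memzone) be handed to
	 * the device by virtual address? Only when IOVAs are VAs. */
	static bool
	hugepage_va_usable(void)
	{
		return rte_eal_iova_mode() == RTE_IOVA_VA;
	}

	/* Can arbitrary VAs (malloc, stack memory) be handed over?
	 * Only if the device reports SVM/SVA support. */
	static bool
	any_va_usable(uint64_t dev_capa)
	{
		return (dev_capa & DMA_DEV_CAPA_SVA) != 0;
	}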
> 
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change without prior notice.
> >>> + *
> >>> + * Enqueue a fill operation onto the DMA virt queue
> >>> + *
> >>> + * This queues up a fill operation to be performed by hardware, but does not
> >>> + * trigger hardware to begin that operation.
> >>> + *
> >>> + * @param dev_id
> >>> + *   The identifier of the device.
> >>> + * @param vq_id
> >>> + *   The identifier of virt queue.
> >>> + * @param pattern
> >>> + *   The pattern to populate the destination buffer with.
> >>> + * @param dst
> >>> + *   The address of the destination buffer.
> >>> + * @param length
> >>> + *   The length of the destination buffer.
> >>> + * @param flags
> >>> + *   Opaque flags for this operation.
> >>
> >> PLEASE REMOVE opaque stuff from the fastpath; it will be a pain for
> >> application writers as they need to write multiple combinations of
> >> fastpath code. Flags are OK if we have a valid generic flag now to
> >> control the transfer behavior.
> >>
> > 
> > +1. Flags need to be explicitly listed. If we don't have any flags for now,
> > we can specify that the value must be given as zero and it's for future
> > use.
> > 
> 
> +1, I will delete the flags parameter.
> 
> Currently we have fence, which is implemented as an op; if we need more flags
> later, we may have to create new ops each time, which is not a good way to extend.
> 
> So I think we need to change the fence op into an extra_flags op:
> 	rte_dmadev_fence(uint16_t dev_id, uint16_t vq_id)
> to
> 	rte_dmadev_extra_flags(uint16_t dev_id, uint16_t vq_id, uint64_t flags);
> 
> So we could add a fence with: rte_dmadev_extra_flags(dev_id, vq_id, RTE_DMA_FLAG_FENCE);
> 

I don't think this is the way to go. I think we will need the flags
parameter per op in the future, so we should keep it, even if it is always
zero for now. It gives us future expandability options.
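
For example, a fence then becomes just a bit in the per-op flags rather than
a separate op. Rough sketch only - the flag name is made up, and the copy
prototype is assumed to be roughly the one proposed here with a trailing
uint64_t flags argument:

	/* hypothetical flag value; 0 stays reserved as "no special behaviour" */
	#define RTE_DMA_OP_FLAG_FENCE (1ULL << 0)

	static void
	copy_with_fence(uint16_t dev_id, uint16_t vq_id,
			rte_iova_t src, rte_iova_t dst, uint32_t len)
	{
		/* ordinary copy, no flags */
		rte_dmadev_copy(dev_id, vq_id, src, dst, len, 0);
		/* this copy must not pass the one above: set the fence bit */
		rte_dmadev_copy(dev_id, vq_id, src, dst, len,
				RTE_DMA_OP_FLAG_FENCE);
	}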

> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change without prior notice.
> >>> + *
> >>> + * Returns the number of operations that failed to complete.
> >>> + * NOTE: This API is used when rte_dmadev_completed has indicated has_error.
> >>> + *
> >>> + * @param dev_id
> >>> + *   The identifier of the device.
> >>> + * @param vq_id
> >>> + *   The identifier of virt queue.
> >>> + * @param nb_status
> >>> + *   Indicates the size of the status array.
> >>> + * @param[out] status
> >>> + *   The error code of operations that failed to complete.
> >>> + * @param[out] cookie
> >>> + *   The last failed completed operation's cookie.
> >>> + *
> >>> + * @return
> >>> + *   The number of operations that failed to complete.
> >>> + *
> >>> + * NOTE: The caller must ensure that the input parameter is valid and the
> >>> + *       corresponding device supports the operation.
> >>> + */
> >>> +__rte_experimental
> >>> +static inline uint16_t
> >>> +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vq_id,
> >>> +                          const uint16_t nb_status, uint32_t *status,
> >>> +                          dma_cookie_t *cookie)
> >>
> >> IMO, it is better to move cookie/ring_idx to position 3.
> >> Why would it return an array of errors, since it is called after
> >> rte_dmadev_completed() has reported has_error? Is it better to change it to
> >>
> >> rte_dmadev_error_status(uint16_t dev_id, uint16_t vq_id, dma_cookie_t
> >> *cookie, uint32_t *status)
> >>
> >> I also think we may need to make status a bitmask, enumerate all the
> >> combinations of error codes across all the drivers, and return a string
> >> from the driver, like the existing rte_flow_error.
> >>
> >> See
> >> struct rte_flow_error {
> >>         enum rte_flow_error_type type; /**< Cause field and error types. */
> >>         const void *cause; /**< Object responsible for the error. */
> >>         const char *message; /**< Human-readable error message. */
> >> };
> >>
> > 
> > I think we need a multi-return value API here, as we may add operations in
> > future which have non-error status values to return. The obvious case is
> > DMA engines which support "compare" operations. In that case a successful
> 
> Just curious, what is the application scenario for the 'compare' operation?
> 

We are not looking to use this capability just now - but it's a capability
in our hardware that offers some interesting possibilities, so I'd like to
ensure it's possible to integrate it in future. To do so, we just need to
ensure that the function which returns the "error" status - or status
generally - can be used to return bursts of statuses, even if it's slower
than the regular completion return which just assumes all ops succeed.
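
As a rough usage sketch (treat the argument order of the fast-path call as
illustrative only; the burst size of 32 is arbitrary): a copy-only app polls
the fast path and falls back to the status-returning call only when has_error
is set, while an app using compare ops would always use the status call.

	uint16_t n;
	bool has_error = false;
	dma_cookie_t last_cookie;
	uint32_t status[32];

	/* fast path: only reports how many ops finished and whether any failed */
	n = rte_dmadev_completed(dev_id, vq_id, &last_cookie, 32, &has_error);
	if (has_error) {
		/* slower path: one 32-bit status word per completed op */
		n = rte_dmadev_completed_fails(dev_id, vq_id, 32,
					       status, &last_cookie);
		/* inspect status[0..n-1] here */
	}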

> > compare (as in there were no DMA or HW errors) can return "equal" or
> > "not-equal" as statuses. For general "copy" operations, the faster
> > completion op can be used to just return successful values (and only call
> > this status version on error), while apps using those compare ops or a
> > mixture of copy and compare ops, would always use the slower one that
> > returns status values for each and every op.
> 
> In the current design, rte_dmadev_completed_fails applies only to failure
> scenarios. Do you mean that in 'compare' operations the status is always
> non-zero, whether or not the two buffers match?
> 

Yes and no. There are two separate "status" values to be returned for such
operations - the actual HW status, i.e. whether all parameters were valid, and
the actual memcmp result of equal/not-equal. In our completion records these are
called "status" and "result" respectively. "Result" is only valid if
"status" is successful, and is not relevant for copy or fill or similar
ops. Therefore, to support this, we just need some bits in "status"
reserved for that result case, and the completed_status op to return an
array of status values, not just a single one.
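
Purely as an illustration of the kind of encoding I mean (none of these names
or bit positions are in the patch), a 32-bit status word could keep the low
half for the HW status and the top half for the op-specific result:

	#include <stdbool.h>
	#include <stdint.h>

	#define DMA_STATUS_MASK   0x0000FFFFu /* HW status; 0 == success */
	#define DMA_RESULT_SHIFT  16
	#define DMA_RESULT_MASK   0xFFFF0000u /* op-specific result (compare) */
	#define DMA_CMP_EQUAL     0x0
	#define DMA_CMP_NOT_EQUAL 0x1

	static inline bool
	op_succeeded(uint32_t st)
	{
		return (st & DMA_STATUS_MASK) == 0;
	}

	static inline bool
	cmp_was_equal(uint32_t st)
	{
		/* only meaningful when op_succeeded(st) is true */
		return ((st & DMA_RESULT_MASK) >> DMA_RESULT_SHIFT) ==
				DMA_CMP_EQUAL;
	}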

> > 
> > The ioat APIs used 32-bit integer values for this status array so as to
> > allow e.g. 16-bits for error code and 16-bits for future status values. For
> > most operations there should be a fairly small set of things that can go
> > wrong, i.e. bad source address, bad destination address or invalid length.
> > Within that we may have a couple of specifics for why an address is bad,
> > but even so I don't think we need to start having multiple bit
> > combinations.
> > 
> >>> +{
> >>> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> >>> +       return (*dev->completed_fails)(dev, vq_id, nb_status, status, cookie);
> >>> +}
> >>> +
> >>> +struct rte_dmadev_stats {
> >>> +       uint64_t enqueue_fail_count;
> >>> +       /**< Count of all operations which failed to be enqueued */
> >>> +       uint64_t enqueued_count;
> >>> +       /**< Count of all operations which were successfully enqueued */
> >>> +       uint64_t completed_fail_count;
> >>> +       /**< Count of all operations which failed to complete */
> >>> +       uint64_t completed_count;
> >>> +       /**< Count of all operations which successfully completed */
> >>> +};
> >>
> >> We need to have a capability API to tell which items are
> >> updated/supported by the driver.
> >>
> > 
> > I also would remove the enqueue fail counts, since they are better counted
> > by the app. If a driver reports 20,000 failures we have no way of knowing
> > if that is 20,000 unique operations which failed to enqueue or a single
> > operation which failed to enqueue 20,000 times but succeeded on attempt
> > 20,001.
> > 
> 
> This case does exist; the application may just emit a DEBUG trace rather than
> recording it. So I would recommend keeping it, so we at least know if it
> happens after a long run.
> 
I disagree here - the enqueue failure should only be tracked by the app,
because:
1. only the app knows whether a particular enqueue failure is a retry or not,
   and how it should be counted
2. these failures cannot be counted by hardware and must be counted by
   software, adding additional operations to our enqueue path. In the
   retry case, that could be a lot of load-update-stores that would have to
   be done in the driver, while if tracked in the app, the count would
   just be a register increment.

Operation failures can be tracked in driver stats, though, as that is
related to hardware operation.
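
To make the point concrete, a sketch of the app-side accounting (assuming a
copy call roughly like the one proposed here; the retry policy and the
counter are entirely the app's):

	static uint64_t enq_fail_ops; /* app-owned counter */

	static void
	enqueue_with_retry(uint16_t dev_id, uint16_t vq_id,
			   rte_iova_t src, rte_iova_t dst, uint32_t len)
	{
		dma_cookie_t c = rte_dmadev_copy(dev_id, vq_id, src, dst, len, 0);

		if (c >= 0)
			return;
		enq_fail_ops++; /* count the operation once, not every retry */
		do {
			c = rte_dmadev_copy(dev_id, vq_id, src, dst, len, 0);
		} while (c < 0); /* retry until the ring has space */
	}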

/Bruce

