[dpdk-dev] [PATCH] dmadev: introduce DMA device library
Jerin Jacob
jerinjacobk at gmail.com
Sun Jul 4 11:30:30 CEST 2021
On Fri, Jul 2, 2021 at 6:51 PM Chengwen Feng <fengchengwen at huawei.com> wrote:
>
> This patch introduces 'dmadevice' which is a generic type of DMA
> device.
>
> The APIs of dmadev library exposes some generic operations which can
> enable configuration and I/O with the DMA devices.
>
> Signed-off-by: Chengwen Feng <fengchengwen at huawei.com>
Thanks for v1.
I would suggest finalizing lib/dmadev/rte_dmadev.h before doing the
implementation so that you don't need
to waste time on reworking the implementation.
Comments inline.
> ---
> MAINTAINERS | 4 +
> config/rte_config.h | 3 +
> lib/dmadev/meson.build | 6 +
> lib/dmadev/rte_dmadev.c | 438 +++++++++++++++++++++
> lib/dmadev/rte_dmadev.h | 919 +++++++++++++++++++++++++++++++++++++++++++
> lib/dmadev/rte_dmadev_core.h | 98 +++++
> lib/dmadev/rte_dmadev_pmd.h | 210 ++++++++++
> lib/dmadev/version.map | 32 ++
You missed updating the doxygen config. See doc/api/doxy-api.conf.in
Use meson -Denable_docs=true to verify the generated doxygen doc.
> lib/meson.build | 1 +
> 9 files changed, 1711 insertions(+)
> create mode 100644 lib/dmadev/meson.build
> create mode 100644 lib/dmadev/rte_dmadev.c
> create mode 100644 lib/dmadev/rte_dmadev.h
> create mode 100644 lib/dmadev/rte_dmadev_core.h
> create mode 100644 lib/dmadev/rte_dmadev_pmd.h
> create mode 100644 lib/dmadev/version.map
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4347555..2019783 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -496,6 +496,10 @@ F: drivers/raw/skeleton/
> F: app/test/test_rawdev.c
> F: doc/guides/prog_guide/rawdev.rst
>
Add EXPERIMENTAL
> +Dma device API
> +M: Chengwen Feng <fengchengwen at huawei.com>
> +F: lib/dmadev/
> +
>
> new file mode 100644
> index 0000000..a94e839
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev.c
> @@ -0,0 +1,438 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2021 HiSilicon Limited.
> + */
> +
> +#include <ctype.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <stdint.h>
> +
> +#include <rte_log.h>
> +#include <rte_debug.h>
> +#include <rte_dev.h>
> +#include <rte_memory.h>
> +#include <rte_memzone.h>
> +#include <rte_malloc.h>
> +#include <rte_errno.h>
> +#include <rte_string_fns.h>
Sort in alphabetical order.
> +
> +#include "rte_dmadev.h"
> +#include "rte_dmadev_pmd.h"
> +
> +struct rte_dmadev rte_dmadevices[RTE_DMADEV_MAX_DEVS];
# Please check whether you have missed any multiprocess angle.
lib/regexdev/rte_regexdev.c is the latest device class implemented in DPDK;
please check the *rte_regexdev_shared_data scheme.
# A dynamic log type is missing for this library.
> diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
> new file mode 100644
> index 0000000..f74fc6a
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev.h
> @@ -0,0 +1,919 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2021 HiSilicon Limited.
It would be nice to add other companies' names who have contributed to
the specification.
> + */
> +
> +#ifndef _RTE_DMADEV_H_
> +#define _RTE_DMADEV_H_
> +
> +/**
> + * @file rte_dmadev.h
> + *
> + * RTE DMA (Direct Memory Access) device APIs.
> + *
> + * The generic DMA device diagram:
> + *
> + * ------------ ------------
> + * | HW-queue | | HW-queue |
> + * ------------ ------------
> + * \ /
> + * \ /
> + * \ /
> + * ----------------
> + * |dma-controller|
> + * ----------------
> + *
> + * The DMA could have multiple HW-queues, each HW-queue could have multiple
> + * capabilities, e.g. whether to support fill operation, supported DMA
> + * transfter direction and etc.
typo
> + *
> + * The DMA framework is built on the following abstraction model:
> + *
> + * ------------ ------------
> + * |virt-queue| |virt-queue|
> + * ------------ ------------
> + * \ /
> + * \ /
> + * \ /
> + * ------------ ------------
> + * | HW-queue | | HW-queue |
> + * ------------ ------------
> + * \ /
> + * \ /
> + * \ /
> + * ----------
> + * | dmadev |
> + * ----------
Continuing the discussion with @Morten Brørup, I think we need to
finalize the model.
> + * a) The DMA operation request must be submitted to the virt queue, virt
> + * queues must be created based on HW queues, the DMA device could have
> + * multiple HW queues.
> + * b) The virt queues on the same HW-queue could represent different contexts,
> + * e.g. user could create virt-queue-0 on HW-queue-0 for mem-to-mem
> + * transfer scenario, and create virt-queue-1 on the same HW-queue for
> + * mem-to-dev transfer scenario.
> + * NOTE: user could also create multiple virt queues for mem-to-mem transfer
> + * scenario as long as the corresponding driver supports.
> + *
> + * The control plane APIs include configure/queue_setup/queue_release/start/
> + * stop/reset/close, in order to start device work, the call sequence must be
> + * as follows:
> + * - rte_dmadev_configure()
> + * - rte_dmadev_queue_setup()
> + * - rte_dmadev_start()
Please add the reconfigure behaviour etc. Please check the
lib/regexdev/rte_regexdev.h
introduction; I have added similar notes there, so you could reuse as much as possible.
> + * The dataplane APIs include two parts:
> + * a) The first part is the submission of operation requests:
> + * - rte_dmadev_copy()
> + * - rte_dmadev_copy_sg() - scatter-gather form of copy
> + * - rte_dmadev_fill()
> + * - rte_dmadev_fill_sg() - scatter-gather form of fill
> + * - rte_dmadev_fence() - add a fence force ordering between operations
> + * - rte_dmadev_perform() - issue doorbell to hardware
> + * These APIs could work with different virt queues which have different
> + * contexts.
> + * The first four APIs are used to submit the operation request to the virt
> + * queue, if the submission is successful, a cookie (as type
> + * 'dma_cookie_t') is returned, otherwise a negative number is returned.
> + * b) The second part is to obtain the result of requests:
> + * - rte_dmadev_completed()
> + * - return the number of operation requests completed successfully.
> + * - rte_dmadev_completed_fails()
> + * - return the number of operation requests failed to complete.
> + *
> + * The misc APIs include info_get/queue_info_get/stats/xstats/selftest, provide
> + * information query and self-test capabilities.
> + *
> + * About the dataplane APIs MT-safe, there are two dimensions:
> + * a) For one virt queue, the submit/completion API could be MT-safe,
> + * e.g. one thread do submit operation, another thread do completion
> + * operation.
> + * If driver support it, then declare RTE_DMA_DEV_CAPA_MT_VQ.
> + * If driver don't support it, it's up to the application to guarantee
> + * MT-safe.
> + * b) For multiple virt queues on the same HW queue, e.g. one thread do
> + * operation on virt-queue-0, another thread do operation on virt-queue-1.
> + * If driver support it, then declare RTE_DMA_DEV_CAPA_MT_MVQ.
> + * If driver don't support it, it's up to the application to guarantee
> + * MT-safe.
From an application PoV this may make it hard to write portable
applications. Please check the
latest thread with @Morten Brørup.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <rte_common.h>
> +#include <rte_memory.h>
> +#include <rte_errno.h>
> +#include <rte_compat.h>
Sort in alphabetical order.
> +
> +/**
> + * dma_cookie_t - an opaque DMA cookie
Since we are defining the behaviour, it is not opaque any more.
I think it is better to call it ring_idx or so.
> +#define RTE_DMA_DEV_CAPA_MT_MVQ (1ull << 11) /**< Support MT-safe of multiple virt queues */
Please add plenty of @see references for each symbol, covering all the
places it is used, so that one can understand the full scope of the
symbol. See the example below.
#define RTE_REGEXDEV_CAPA_RUNTIME_COMPILATION_F (1ULL << 0)
/**< RegEx device does support compiling the rules at runtime unlike
* loading only the pre-built rule database using
* struct rte_regexdev_config::rule_db in rte_regexdev_configure()
*
* @see struct rte_regexdev_config::rule_db, rte_regexdev_configure()
* @see struct rte_regexdev_info::regexdev_capa
*/
> + *
> + * If dma_cookie_t is >=0 it's a DMA operation request cookie, <0 it's a error
> + * code.
> + * When using cookies, comply with the following rules:
> + * a) Cookies for each virtual queue are independent.
> + * b) For a virt queue, the cookie are monotonically incremented, when it reach
> + * the INT_MAX, it wraps back to zero.
> + * c) The initial cookie of a virt queue is zero, after the device is stopped or
> + * reset, the virt queue's cookie needs to be reset to zero.
> + * Example:
> + * step-1: start one dmadev
> + * step-2: enqueue a copy operation, the cookie return is 0
> + * step-3: enqueue a copy operation again, the cookie return is 1
> + * ...
> + * step-101: stop the dmadev
> + * step-102: start the dmadev
> + * step-103: enqueue a copy operation, the cookie return is 0
> + * ...
> + */
Good explanation.
> +typedef int32_t dma_cookie_t;
> +
> +/**
> + * dma_scatterlist - can hold scatter DMA operation request
> + */
> +struct dma_scatterlist {
I prefer to change scatterlist -> sg
i.e rte_dma_sg
> + void *src;
> + void *dst;
> + uint32_t length;
> +};
> +
> +
> +/**
> + * A structure used to retrieve the contextual information of
> + * an DMA device
> + */
> +struct rte_dmadev_info {
> + /**
> + * Fields filled by framewok
typo.
> + */
> + struct rte_device *device; /**< Generic Device information */
> + const char *driver_name; /**< Device driver name */
> + int socket_id; /**< Socket ID where memory is allocated */
> +
> + /**
> + * Specification fields filled by driver
> + */
> + uint64_t dev_capa; /**< Device capabilities (RTE_DMA_DEV_CAPA_) */
> + uint16_t max_hw_queues; /**< Maximum number of HW queues. */
> + uint16_t max_vqs_per_hw_queue;
> + /**< Maximum number of virt queues to allocate per HW queue */
> + uint16_t max_desc;
> + /**< Maximum allowed number of virt queue descriptors */
> + uint16_t min_desc;
> + /**< Minimum allowed number of virt queue descriptors */
Please add max_nb_segs, i.e. the maximum number of segments supported.
> +
> + /**
> + * Status fields filled by driver
> + */
> + uint16_t nb_hw_queues; /**< Number of HW queues configured */
> + uint16_t nb_vqs; /**< Number of virt queues configured */
> +};
> +
> +
> +/**
> + * dma_address_type
> + */
> +enum dma_address_type {
> + DMA_ADDRESS_TYPE_IOVA, /**< Use IOVA as dma address */
> + DMA_ADDRESS_TYPE_VA, /**< Use VA as dma address */
> +};
> +
> +/**
> + * A structure used to configure a DMA device.
> + */
> +struct rte_dmadev_conf {
> + enum dma_address_type addr_type; /**< Address type to used */
I think there are 3 kinds of limitations/capabilities.
When the system is configured as IOVA as VA:
1) The device supports any VA address, like memory from rte_malloc(),
rte_memzone(), malloc, or stack memory.
2) The device supports only VA addresses from rte_malloc() or rte_memzone(), i.e.
memory backed by hugepages and added to the DMA map.
When the system is configured as IOVA as PA:
1) The device supports only PA addresses.
IMO, the above needs to be advertised as capabilities, and the application
needs to align with them;
I don't think the application should request the driver to work in any of these modes.
> + uint16_t nb_hw_queues; /**< Number of HW-queues enable to use */
> + uint16_t max_vqs; /**< Maximum number of virt queues to use */
You need to document the maximum value allowed, i.e. that it is based on
info_get(), and mention the corresponding field
in the info structure.
> +
> +/**
> + * dma_transfer_direction
> + */
> +enum dma_transfer_direction {
rte_dma_transfer_direction
> + DMA_MEM_TO_MEM,
> + DMA_MEM_TO_DEV,
> + DMA_DEV_TO_MEM,
> + DMA_DEV_TO_DEV,
> +};
> +
> +/**
> + * A structure used to configure a DMA virt queue.
> + */
> +struct rte_dmadev_queue_conf {
> + enum dma_transfer_direction direction;
> + /**< Associated transfer direction */
> + uint16_t hw_queue_id; /**< The HW queue on which to create virt queue */
> + uint16_t nb_desc; /**< Number of descriptor for this virt queue */
> + uint64_t dev_flags; /**< Device specific flags */
What is the use of this? It needs more comments.
Since it is in the slowpath, we can have non-opaque names here based on
each driver's capabilities.
> + void *dev_ctx; /**< Device specific context */
What is the use of this? It needs more comments.
Please add a good amount of reserved bits and an API to init this
structure for future ABI stability, say rte_dmadev_queue_config_init()
or so.
> +
> +/**
> + * A structure used to retrieve information of a DMA virt queue.
> + */
> +struct rte_dmadev_queue_info {
> + enum dma_transfer_direction direction;
A queue may support all directions so I think it should be a bitfield.
> + /**< Associated transfer direction */
> + uint16_t hw_queue_id; /**< The HW queue on which to create virt queue */
> + uint16_t nb_desc; /**< Number of descriptor for this virt queue */
> + uint64_t dev_flags; /**< Device specific flags */
> +};
> +
> +__rte_experimental
> +static inline dma_cookie_t
> +rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vq_id,
> + const struct dma_scatterlist *sg,
> + uint32_t sg_len, uint64_t flags)
I would like to change this to:
rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vq_id, const struct
rte_dma_sg *src, uint32_t nb_src,
const struct rte_dma_sg *dst, uint32_t nb_dst) or so, to allow use cases like a
30 MB src copy being split and written as 30 x 1 MB dst segments.
> +{
> + struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> + return (*dev->copy_sg)(dev, vq_id, sg, sg_len, flags);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a fill operation onto the DMA virt queue
> + *
> + * This queues up a fill operation to be performed by hardware, but does not
> + * trigger hardware to begin that operation.
> + *
> + * @param dev_id
> + * The identifier of the device.
> + * @param vq_id
> + * The identifier of virt queue.
> + * @param pattern
> + * The pattern to populate the destination buffer with.
> + * @param dst
> + * The address of the destination buffer.
> + * @param length
> + * The length of the destination buffer.
> + * @param flags
> + * An opaque flags for this operation.
PLEASE REMOVE opaque stuff from the fastpath; it will be a pain for
application writers, as
they would need to write multiple fastpath combinations. The flags are OK if
we have a valid
generic flag now to control the transfer behavior.
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Add a fence to force ordering between operations
> + *
> + * This adds a fence to a sequence of operations to enforce ordering, such that
> + * all operations enqueued before the fence must be completed before operations
> + * after the fence.
> + * NOTE: Since this fence may be added as a flag to the last operation enqueued,
> + * this API may not function correctly when called immediately after an
> + * "rte_dmadev_perform" call i.e. before any new operations are enqueued.
> + *
> + * @param dev_id
> + * The identifier of the device.
> + * @param vq_id
> + * The identifier of virt queue.
> + *
> + * @return
> + * - =0: Successful add fence.
> + * - <0: Failure to add fence.
> + *
> + * NOTE: The caller must ensure that the input parameter is valid and the
> + * corresponding device supports the operation.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_fence(uint16_t dev_id, uint16_t vq_id)
> +{
> + struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> + return (*dev->fence)(dev, vq_id);
> +}
Since HW submission goes through a queue (FIFO), the ordering is always
maintained. Right?
Could you share more details and the use case of fence() from a
driver/application PoV?
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Trigger hardware to begin performing enqueued operations
> + *
> + * This API is used to write the "doorbell" to the hardware to trigger it
> + * to begin the operations previously enqueued by rte_dmadev_copy/fill()
> + *
> + * @param dev_id
> + * The identifier of the device.
> + * @param vq_id
> + * The identifier of virt queue.
> + *
> + * @return
> + * - =0: Successful trigger hardware.
> + * - <0: Failure to trigger hardware.
> + *
> + * NOTE: The caller must ensure that the input parameter is valid and the
> + * corresponding device supports the operation.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_perform(uint16_t dev_id, uint16_t vq_id)
> +{
> + struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> + return (*dev->perform)(dev, vq_id);
> +}
Since this scheme adds function call overhead in all
applications, I would like to understand
the benefit of doing it this way vs. having the enqueue issue the doorbell
implicitly, from a driver/application PoV.
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Returns the number of operations that have been successful completed.
> + *
> + * @param dev_id
> + * The identifier of the device.
> + * @param vq_id
> + * The identifier of virt queue.
> + * @param nb_cpls
> + * The maximum number of completed operations that can be processed.
> + * @param[out] cookie
> + * The last completed operation's cookie.
> + * @param[out] has_error
> + * Indicates if there are transfer error.
> + *
> + * @return
> + * The number of operations that successful completed.
successfully
> + *
> + * NOTE: The caller must ensure that the input parameter is valid and the
> + * corresponding device supports the operation.
> + */
> +__rte_experimental
> +static inline uint16_t
> +rte_dmadev_completed(uint16_t dev_id, uint16_t vq_id, const uint16_t nb_cpls,
> + dma_cookie_t *cookie, bool *has_error)
> +{
> + struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> + has_error = false;
> + return (*dev->completed)(dev, vq_id, nb_cpls, cookie, has_error);
It may be better to have cookie/ring_idx as the third argument.
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Returns the number of operations that failed to complete.
> + * NOTE: This API was used when rte_dmadev_completed has_error was set.
> + *
> + * @param dev_id
> + * The identifier of the device.
> + * @param vq_id
> + * The identifier of virt queue.
> + * @param nb_status
> + * Indicates the size of status array.
> + * @param[out] status
> + * The error code of operations that failed to complete.
> + * @param[out] cookie
> + * The last failed completed operation's cookie.
> + *
> + * @return
> + * The number of operations that failed to complete.
> + *
> + * NOTE: The caller must ensure that the input parameter is valid and the
> + * corresponding device supports the operation.
> + */
> +__rte_experimental
> +static inline uint16_t
> +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vq_id,
> + const uint16_t nb_status, uint32_t *status,
> + dma_cookie_t *cookie)
IMO, it is better to move cookie/ring_idx to the third argument.
Why would it return an array of errors, since it is called after
rte_dmadev_completed() has set
has_error? Is it better to change it to
rte_dmadev_error_status(uint16_t dev_id, uint16_t vq_id, dma_cookie_t
*cookie, uint32_t *status)?
I also think we may need to make status a bitmask, enumerate all
the combinations of error codes
across drivers, and return a string from the driver like the existing rte_flow_error.
See
struct rte_flow_error {
enum rte_flow_error_type type; /**< Cause field and error types. */
const void *cause; /**< Object responsible for the error. */
const char *message; /**< Human-readable error message. */
};
> +{
> + struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> + return (*dev->completed_fails)(dev, vq_id, nb_status, status, cookie);
> +}
> +
> +struct rte_dmadev_stats {
> + uint64_t enqueue_fail_count;
> + /**< Conut of all operations which failed enqueued */
> + uint64_t enqueued_count;
> + /**< Count of all operations which successful enqueued */
> + uint64_t completed_fail_count;
> + /**< Count of all operations which failed to complete */
> + uint64_t completed_count;
> + /**< Count of all operations which successful complete */
> +};
We need a capability API to tell which items are
updated/supported by the driver.
> diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
> new file mode 100644
> index 0000000..a3afea2
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev_core.h
> @@ -0,0 +1,98 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2021 HiSilicon Limited.
> + */
> +
> +#ifndef _RTE_DMADEV_CORE_H_
> +#define _RTE_DMADEV_CORE_H_
> +
> +/**
> + * @file
> + *
> + * RTE DMA Device internal header.
> + *
> + * This header contains internal data types. But they are still part of the
> + * public API because they are used by inline public functions.
> + */
> +
> +struct rte_dmadev;
> +
> +typedef dma_cookie_t (*dmadev_copy_t)(struct rte_dmadev *dev, uint16_t vq_id,
> + void *src, void *dst,
> + uint32_t length, uint64_t flags);
> +/**< @internal Function used to enqueue a copy operation. */
To avoid namespace conflicts (as this is a public API), use the rte_ prefix.
> +
> +/**
> + * The data structure associated with each DMA device.
> + */
> +struct rte_dmadev {
> + /**< Enqueue a copy operation onto the DMA device. */
> + dmadev_copy_t copy;
> + /**< Enqueue a scatter list copy operation onto the DMA device. */
> + dmadev_copy_sg_t copy_sg;
> + /**< Enqueue a fill operation onto the DMA device. */
> + dmadev_fill_t fill;
> + /**< Enqueue a scatter list fill operation onto the DMA device. */
> + dmadev_fill_sg_t fill_sg;
> + /**< Add a fence to force ordering between operations. */
> + dmadev_fence_t fence;
> + /**< Trigger hardware to begin performing enqueued operations. */
> + dmadev_perform_t perform;
> + /**< Returns the number of operations that successful completed. */
> + dmadev_completed_t completed;
> + /**< Returns the number of operations that failed to complete. */
> + dmadev_completed_fails_t completed_fails;
We need to limit the fastpath items to one cache line.
> +
> + void *dev_private; /**< PMD-specific private data */
> + const struct rte_dmadev_ops *dev_ops; /**< Functions exported by PMD */
> +
> + uint16_t dev_id; /**< Device ID for this instance */
> + int socket_id; /**< Socket ID where memory is allocated */
> + struct rte_device *device;
> + /**< Device info. supplied during device initialization */
> + const char *driver_name; /**< Driver info. supplied by probing */
> + char name[RTE_DMADEV_NAME_MAX_LEN]; /**< Device name */
> +
> + RTE_STD_C11
> + uint8_t attached : 1; /**< Flag indicating the device is attached */
> + uint8_t started : 1; /**< Device state: STARTED(1)/STOPPED(0) */
Add a couple of reserved fields for future ABI stability.
> +
> +} __rte_cache_aligned;
> +
> +extern struct rte_dmadev rte_dmadevices[];
> +