[PATCH v3 1/3] dma/ae4dma: introduce AMD AE4DMA DMA PMD

fengchengwen fengchengwen at huawei.com
Sat Jun 27 02:01:41 CEST 2026


On 6/26/2026 2:47 AM, Raghavendra Ningoji wrote:
> Add the skeleton of a new dmadev poll-mode driver for the AMD AE4DMA
> hardware DMA engine, providing only PCI probe/remove and per-queue
> hardware initialisation. An AE4DMA engine exposes 16 hardware command
> queues, each with a 32-entry descriptor ring; the PMD maps each
> hardware channel to its own dmadev with a single virtual channel,
> so a PCI function appears as 16 dmadevs named "<pci-bdf>-ch0" ..
> "<pci-bdf>-ch15".
> 
> This patch only registers the PCI driver, allocates the dmadev
> objects, reserves the per-queue descriptor rings and programs the
> hardware queue base addresses. Control and data path operations are
> added in subsequent patches.
> 
> Signed-off-by: Raghavendra Ningoji <raghavendra.ningoji at amd.com>
> ---
>  .mailmap                               |   1 +
>  MAINTAINERS                            |   5 +
>  doc/guides/dmadevs/ae4dma.rst          |  53 ++++++
>  doc/guides/dmadevs/index.rst           |   1 +
>  doc/guides/rel_notes/release_26_07.rst |   7 +
>  drivers/dma/ae4dma/ae4dma_dmadev.c     | 220 +++++++++++++++++++++++++
>  drivers/dma/ae4dma/ae4dma_hw_defs.h    | 154 +++++++++++++++++
>  drivers/dma/ae4dma/ae4dma_internal.h   |  97 +++++++++++
>  drivers/dma/ae4dma/meson.build         |   7 +
>  drivers/dma/meson.build                |   1 +
>  usertools/dpdk-devbind.py              |   5 +-
>  11 files changed, 550 insertions(+), 1 deletion(-)
>  create mode 100644 doc/guides/dmadevs/ae4dma.rst
>  create mode 100644 drivers/dma/ae4dma/ae4dma_dmadev.c
>  create mode 100644 drivers/dma/ae4dma/ae4dma_hw_defs.h
>  create mode 100644 drivers/dma/ae4dma/ae4dma_internal.h
>  create mode 100644 drivers/dma/ae4dma/meson.build
> 
> diff --git a/.mailmap b/.mailmap
> index 89ba6ffccc..71a62564fa 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -1329,6 +1329,7 @@ Radu Bulie <radu-andrei.bulie at nxp.com>
>  Radu Nicolau <radu.nicolau at intel.com>
>  Rafael Ávila de Espíndola <espindola at scylladb.com>
>  Rafal Kozik <rk at semihalf.com>
> +Raghavendra Ningoji <raghavendra.ningoji at amd.com>
>  Ragothaman Jayaraman <rjayaraman at caviumnetworks.com>
>  Rahul Bhansali <rbhansali at marvell.com>
>  Rahul Gupta <rahul.gupta at broadcom.com>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 9143d028bc..2e27af49f4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1361,6 +1361,11 @@ F: doc/guides/compressdevs/features/zsda.ini
>  DMAdev Drivers
>  --------------
>  
> +AMD AE4DMA
> +M: Bhagyada Modali <bhagyada.modali at amd.com>
> +F: drivers/dma/ae4dma/
> +F: doc/guides/dmadevs/ae4dma.rst
> +
>  Intel IDXD - EXPERIMENTAL
>  M: Bruce Richardson <bruce.richardson at intel.com>
>  M: Kevin Laatz <kevin.laatz at intel.com>
> diff --git a/doc/guides/dmadevs/ae4dma.rst b/doc/guides/dmadevs/ae4dma.rst
> new file mode 100644
> index 0000000000..a85c1d92ca
> --- /dev/null
> +++ b/doc/guides/dmadevs/ae4dma.rst
> @@ -0,0 +1,53 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2025 Advanced Micro Devices, Inc.

2025 -> 2026?

> +
> +.. include:: <isonum.txt>
> +
> +AMD AE4DMA DMA Device Driver
> +============================
> +
> +The ``ae4dma`` dmadev driver is a poll-mode driver (PMD) for the
> +AMD AE4DMA hardware DMA engine. The engine exposes 16 independent
> +hardware command queues, each with a ring of 32 descriptors. The PMD
> +maps each hardware command queue to a separate DPDK dmadev with a
> +single virtual channel, so a single PCI function appears as 16 dmadevs
> +named ``<pci-bdf>-ch0`` through ``<pci-bdf>-ch15``.
> +
> +The driver supports memory-to-memory copy operations only.
> +
> +Hardware Requirements
> +---------------------
> +
> +The ``dpdk-devbind.py`` script can be used to list AE4DMA devices on
> +the system::
> +
> +   dpdk-devbind.py --status-dev dma
> +
> +AE4DMA devices appear with vendor ID ``0x1022`` and device ID
> +``0x149b``.
> +
> +Compilation
> +-----------
> +
> +The driver is built as part of the standard DPDK build on x86 platforms
> +using ``meson`` and ``ninja``; no extra configuration is required.
> +
> +Device Setup
> +------------
> +
> +The AE4DMA device must be bound to a DPDK-compatible kernel module such
> +as ``vfio-pci`` before it can be used::
> +
> +   dpdk-devbind.py -b vfio-pci <pci-bdf>
> +
> +Initialization
> +~~~~~~~~~~~~~~
> +
> +On probe the PMD performs the following steps for each PCI function:
> +
> +* Reads BAR0 and programs the common configuration register with the
> +  number of hardware queues to enable (16).
> +* For each hardware queue it allocates a 32-entry descriptor ring in
> +  IOVA-contiguous memory, programs the queue base address and ring
> +  depth into the per-queue registers, and enables the queue.
> +* Interrupts are masked; completion is polled by the application.
> diff --git a/doc/guides/dmadevs/index.rst b/doc/guides/dmadevs/index.rst
> index 56beb1733f..97399590f6 100644
> --- a/doc/guides/dmadevs/index.rst
> +++ b/doc/guides/dmadevs/index.rst
> @@ -11,6 +11,7 @@ an application through DMA API.
>     :maxdepth: 1
>     :numbered:
>  
> +   ae4dma
>     cnxk
>     dpaa
>     dpaa2
> diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
> index f012d47a4b..9a78a7ef62 100644
> --- a/doc/guides/rel_notes/release_26_07.rst
> +++ b/doc/guides/rel_notes/release_26_07.rst
> @@ -63,6 +63,13 @@ New Features
>      ``rte_eal_init`` and the application is responsible for probing each device,
>    * ``--auto-probing`` enables the initial bus probing, which is the current default behavior.
>  
> +* **Added AMD AE4DMA DMA PMD.**
> +
> +  Added a new ``dma/ae4dma`` driver for the AMD AE4DMA hardware DMA engine.
> +  Each PCI function exposes 16 hardware command queues; the PMD registers one
> +  dmadev per channel with a single virtual channel and supports
> +  memory-to-memory copy operations.
> +
>  
>  Removed Items
>  -------------
> diff --git a/drivers/dma/ae4dma/ae4dma_dmadev.c b/drivers/dma/ae4dma/ae4dma_dmadev.c
> new file mode 100644
> index 0000000000..3d82f86906
> --- /dev/null
> +++ b/drivers/dma/ae4dma/ae4dma_dmadev.c
> @@ -0,0 +1,220 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
> + */
> +
> +#include <errno.h>
> +#include <inttypes.h>
> +#include <stdio.h>
> +#include <string.h>
> +
> +#include <rte_bus_pci.h>
> +#include <bus_pci_driver.h>
> +#include <rte_dmadev_pmd.h>
> +#include <rte_malloc.h>
> +
> +#include "ae4dma_internal.h"
> +
> +/*
> + * One dmadev per AE4DMA hardware channel; each dmadev has exactly one
> + * virtual channel. The HW's per-queue register block must be densely
> + * packed right after the engine-common config register at BAR0+0; the
> + * build-time check below catches an accidental layout change.
> + */
> +static_assert(sizeof(struct ae4dma_hwq_regs) == 32,
> +		"ae4dma_hwq_regs stride changed; per-queue offset math will break");
> +
> +RTE_LOG_REGISTER_DEFAULT(ae4dma_pmd_logtype, INFO);
> +
> +#define AE4DMA_PMD_NAME dmadev_ae4dma
> +
> +static const struct rte_memzone *
> +ae4dma_queue_dma_zone_reserve(const char *queue_name,
> +		uint32_t queue_size, int socket_id)
> +{
> +	const struct rte_memzone *mz;
> +
> +	mz = rte_memzone_lookup(queue_name);
> +	if (mz != NULL) {
> +		if (((size_t)queue_size <= mz->len) &&
> +				((socket_id == SOCKET_ID_ANY) ||
> +				 (socket_id == mz->socket_id))) {
> +			AE4DMA_PMD_INFO("reuse memzone already "
> +					"allocated for %s", queue_name);
> +			return mz;
> +		}
> +		AE4DMA_PMD_ERR("Incompatible memzone already "
> +				"allocated %s, size %u, socket %d. "
> +				"Requested size %u, socket %u",
> +				queue_name, (uint32_t)mz->len,
> +				mz->socket_id, queue_size, socket_id);
> +		return NULL;
> +	}
> +	return rte_memzone_reserve_aligned(queue_name, queue_size,
> +			socket_id, RTE_MEMZONE_IOVA_CONTIG, queue_size);

There is no need to look up and reuse an existing memzone here. This resource
could also be set up in the vchan_setup op, but since each dmadev has at most
32 descriptors and only one vchan, I think it is OK to set it up in probe.

> +}
> +
> +static int
> +ae4dma_add_queue(struct ae4dma_dmadev *dev, struct rte_pci_device *pci,
> +		uint8_t qn, const char *pci_name)
> +{
> +	uint32_t dma_addr_lo, dma_addr_hi;
> +	struct ae4dma_cmd_queue *cmd_q;
> +	const struct rte_memzone *q_mz;
> +
> +	dev->io_regs = pci->mem_resource[AE4DMA_PCIE_BAR].addr;
> +
> +	cmd_q = &dev->cmd_q;
> +	cmd_q->id = qn;
> +	cmd_q->qidx = 0;
> +	cmd_q->qsize = AE4DMA_QUEUE_SIZE(AE4DMA_QUEUE_DESC_SIZE);
> +	cmd_q->hwq_regs = (volatile struct ae4dma_hwq_regs *)dev->io_regs + (qn + 1);
> +
> +	/*
> +	 * Memzone name must be globally unique. Embed PCI BDF so multiple
> +	 * PCI functions probed concurrently don't collide.
> +	 */
> +	snprintf(cmd_q->memz_name, sizeof(cmd_q->memz_name),
> +			"ae4dma_%s_q%u", pci_name, (unsigned int)qn);
> +
> +	q_mz = ae4dma_queue_dma_zone_reserve(cmd_q->memz_name,
> +			cmd_q->qsize, rte_socket_id());
> +	if (q_mz == NULL) {
> +		AE4DMA_PMD_ERR("memzone reserve failed for %s", cmd_q->memz_name);
> +		return -ENOMEM;
> +	}
> +
> +	cmd_q->mz = q_mz;
> +	cmd_q->qbase_addr = q_mz->addr;
> +	cmd_q->qbase_desc = q_mz->addr;
> +	cmd_q->qbase_phys_addr = q_mz->iova;
> +
> +	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->max_idx, AE4DMA_DESCRIPTORS_PER_CMDQ);
> +	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
> +			AE4DMA_CMD_QUEUE_ENABLE);
> +	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->intr_status_reg.intr_status_raw,
> +			AE4DMA_DISABLE_INTR);
> +	cmd_q->next_write = AE4DMA_READ_REG(&cmd_q->hwq_regs->write_idx);
> +	cmd_q->next_read = AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx);
> +	cmd_q->ring_buff_count = 0;
> +
> +	dma_addr_lo = lower_32_bits(cmd_q->qbase_phys_addr);
> +	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->qbase_lo, dma_addr_lo);
> +	dma_addr_hi = upper_32_bits(cmd_q->qbase_phys_addr);
> +	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->qbase_hi, dma_addr_hi);
> +
> +	return 0;
> +}
> +
> +static void
> +ae4dma_channel_dev_name(char *out, size_t outlen, const char *pci_name,
> +		unsigned int ch)
> +{
> +	snprintf(out, outlen, "%s-ch%u", pci_name, ch);
> +}
> +
> +static int
> +ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
> +{
> +	struct rte_dma_dev *dmadev;
> +	struct ae4dma_dmadev *ae4dma;
> +	char hwq_dev_name[RTE_DEV_NAME_MAX_LEN];

Please declare local variables in descending order of declaration length,
with the longest first. I recommend applying this across the entire
driver.

> +
> +	memset(hwq_dev_name, 0, sizeof(hwq_dev_name));

Why not use char hwq_dev_name[RTE_DEV_NAME_MAX_LEN] = {0}; instead?
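
To illustrate the initialiser suggested above, here is a minimal standalone sketch that compiles outside DPDK. DEV_NAME_MAX_LEN is a stand-in for RTE_DEV_NAME_MAX_LEN, channel_dev_name() mirrors ae4dma_channel_dev_name() from the patch, and build_name() is a hypothetical caller that exists only for this example:

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for RTE_DEV_NAME_MAX_LEN; the value is an assumption for this sketch. */
#define DEV_NAME_MAX_LEN 64

/* Mirrors ae4dma_channel_dev_name() from the patch. */
static void
channel_dev_name(char *out, size_t outlen, const char *pci_name, unsigned int ch)
{
	snprintf(out, outlen, "%s-ch%u", pci_name, ch);
}

/* Hypothetical caller: the initialiser replaces the separate memset(). */
static int
build_name(const char *pci_name, unsigned int ch, char *dst, size_t dstlen)
{
	char hwq_dev_name[DEV_NAME_MAX_LEN] = {0};

	channel_dev_name(hwq_dev_name, sizeof(hwq_dev_name), pci_name, ch);
	/* Copy the result out so the caller can inspect it. */
	return snprintf(dst, dstlen, "%s", hwq_dev_name);
}
```

Since snprintf() always NUL-terminates when the buffer size is non-zero, the zeroing is arguably optional as well, but the initialiser is at least shorter than the memset() call.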

> +	ae4dma_channel_dev_name(hwq_dev_name, sizeof(hwq_dev_name), name, qn);
> +
> +	dmadev = rte_dma_pmd_allocate(hwq_dev_name, dev->device.numa_node,
> +			sizeof(struct ae4dma_dmadev));
> +	if (dmadev == NULL) {
> +		AE4DMA_PMD_ERR("Unable to allocate dma device");
> +		return -ENOMEM;
> +	}
> +	dmadev->device = &dev->device;
> +	dmadev->fp_obj->dev_private = dmadev->data->dev_private;
> +
> +	ae4dma = dmadev->data->dev_private;
> +
> +	if (ae4dma_add_queue(ae4dma, dev, qn, name) != 0)
> +		goto init_error;
> +	return 0;
> +
> +init_error:
> +	AE4DMA_PMD_ERR("failed");

Why not add more information to this message, e.g. "Probe failed!"?
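
As a rough sketch of the kind of message I have in mind, the standalone C below formats a string that names the dmadev and carries the error code. format_create_failure() is a hypothetical helper used only so the example compiles outside DPDK; in the driver the same format string could presumably be passed straight to AE4DMA_PMD_ERR():

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, for illustration only: builds the text that would be
 * handed to the driver's error-log macro, naming the dmadev and the error code.
 */
static int
format_create_failure(char *out, size_t outlen, const char *dev_name, int err)
{
	return snprintf(out, outlen, "%s: dmadev create failed, ret=%d",
			dev_name, err);
}
```

With the channel name and return value in the log line, a failure on one of the 16 channels can be told apart from the others.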

> +	rte_dma_pmd_release(hwq_dev_name);
> +	return -ENOMEM;
> +}
> +
> +static int
> +ae4dma_dmadev_probe(struct rte_pci_driver *drv __rte_unused,
> +		struct rte_pci_device *dev)
> +{
> +	char name[32];
> +	char chname[RTE_DEV_NAME_MAX_LEN];
> +	void *mmio_base;
> +	uint32_t q_per_eng;
> +	int ret = 0;
> +	uint8_t i;
> +
> +	rte_pci_device_name(&dev->addr, name, sizeof(name));
> +	AE4DMA_PMD_INFO("Init %s on NUMA node %d", name, dev->device.numa_node);
> +
> +	mmio_base = dev->mem_resource[AE4DMA_PCIE_BAR].addr;
> +	if (mmio_base == NULL) {
> +		AE4DMA_PMD_ERR("%s: BAR%d not mapped", name, AE4DMA_PCIE_BAR);
> +		return -ENODEV;
> +	}
> +
> +	/* Program the per-engine HW queue count once. */
> +	AE4DMA_WRITE_REG_OFFSET(mmio_base, AE4DMA_COMMON_CONFIG_OFFSET,
> +			AE4DMA_MAX_HW_QUEUES);
> +	q_per_eng = AE4DMA_READ_REG_OFFSET(mmio_base, AE4DMA_COMMON_CONFIG_OFFSET);
> +	AE4DMA_PMD_INFO("%s: AE4DMA queues per engine = %u", name, q_per_eng);
> +
> +	for (i = 0; i < AE4DMA_MAX_HW_QUEUES; i++) {
> +		ret = ae4dma_dmadev_create(name, dev, i);
> +		if (ret != 0) {
> +			AE4DMA_PMD_ERR("%s create dmadev %u failed!", name, i);
> +			while (i > 0) {
> +				i--;
> +				ae4dma_channel_dev_name(chname, sizeof(chname), name, i);
> +				rte_dma_pmd_release(chname);
> +			}
> +			break;
> +		}
> +	}
> +	return ret;
> +}
> +
> +static int
> +ae4dma_dmadev_remove(struct rte_pci_device *dev)
> +{
> +	char name[32];
> +	char chname[RTE_DEV_NAME_MAX_LEN];
> +	unsigned int i;
> +
> +	rte_pci_device_name(&dev->addr, name, sizeof(name));
> +
> +	AE4DMA_PMD_INFO("Closing %s on NUMA node %d",
> +			name, dev->device.numa_node);
> +
> +	for (i = 0; i < AE4DMA_MAX_HW_QUEUES; i++) {
> +		ae4dma_channel_dev_name(chname, sizeof(chname), name, i);
> +		rte_dma_pmd_release(chname);
> +	}
> +	return 0;
> +}
> +
> +static const struct rte_pci_id pci_id_ae4dma_map[] = {
> +	{ RTE_PCI_DEVICE(AMD_VENDOR_ID, AE4DMA_DEVICE_ID) },
> +	{ .vendor_id = 0, /* sentinel */ },
> +};
> +
> +static struct rte_pci_driver ae4dma_pmd_drv = {
> +	.id_table = pci_id_ae4dma_map,
> +	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
> +	.probe = ae4dma_dmadev_probe,
> +	.remove = ae4dma_dmadev_remove,
> +};
> +
> +RTE_PMD_REGISTER_PCI(AE4DMA_PMD_NAME, ae4dma_pmd_drv);
> +RTE_PMD_REGISTER_PCI_TABLE(AE4DMA_PMD_NAME, pci_id_ae4dma_map);
> +RTE_PMD_REGISTER_KMOD_DEP(AE4DMA_PMD_NAME, "* igb_uio | uio_pci_generic | vfio-pci");
> diff --git a/drivers/dma/ae4dma/ae4dma_hw_defs.h b/drivers/dma/ae4dma/ae4dma_hw_defs.h
> new file mode 100644
> index 0000000000..e7798be09b
> --- /dev/null
> +++ b/drivers/dma/ae4dma/ae4dma_hw_defs.h
> @@ -0,0 +1,154 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
> + */
> +
> +#ifndef __AE4DMA_HW_DEFS_H__
> +#define __AE4DMA_HW_DEFS_H__
> +
> +#include <stdint.h>
> +
> +#include <rte_bus_pci.h>
> +#include <rte_byteorder.h>
> +#include <rte_io.h>
> +#include <rte_pci.h>
> +#include <rte_memzone.h>

Some of these include files are not needed in this header file.

> +
> +#define AE4DMA_BIT(nr)			(1UL << (nr))
> +
> +/* ae4dma device details */
> +#define AMD_VENDOR_ID	0x1022
> +#define AE4DMA_DEVICE_ID	0x149b
> +#define AE4DMA_PCIE_BAR 0
> +
> +/*
> + * An AE4DMA engine has 16 DMA queues. Each queue supports 32 descriptors.
> + */
> +#define AE4DMA_MAX_HW_QUEUES        16
> +#define AE4DMA_QUEUE_START_INDEX    0
> +#define AE4DMA_CMD_QUEUE_ENABLE		0x1
> +#define AE4DMA_CMD_QUEUE_DISABLE	0x0
> +
> +/* Common to all queues */
> +#define AE4DMA_COMMON_CONFIG_OFFSET 0x00
> +
> +#define AE4DMA_DISABLE_INTR 0x01
> +
> +/* Descriptor status */
> +enum ae4dma_dma_status {
> +	AE4DMA_DMA_DESC_SUBMITTED = 0,
> +	AE4DMA_DMA_DESC_VALIDATED = 1,
> +	AE4DMA_DMA_DESC_PROCESSED = 2,
> +	AE4DMA_DMA_DESC_COMPLETED = 3,
> +	AE4DMA_DMA_DESC_ERROR = 4,
> +};
> +
> +/* Descriptor error-code */
> +enum ae4dma_dma_err {
> +	AE4DMA_DMA_ERR_NO_ERR = 0,
> +	AE4DMA_DMA_ERR_INV_HEADER = 1,
> +	AE4DMA_DMA_ERR_INV_STATUS = 2,
> +	AE4DMA_DMA_ERR_INV_LEN = 3,
> +	AE4DMA_DMA_ERR_INV_SRC = 4,
> +	AE4DMA_DMA_ERR_INV_DST = 5,
> +	AE4DMA_DMA_ERR_INV_ALIGN = 6,
> +	AE4DMA_DMA_ERR_UNKNOWN = 7,
> +};
> +
> +/* HW Queue status */
> +enum ae4dma_hwqueue_status {
> +	AE4DMA_HWQUEUE_EMPTY = 0,
> +	AE4DMA_HWQUEUE_FULL = 1,
> +	AE4DMA_HWQUEUE_NOT_EMPTY = 4,
> +};
> +/*
> + * descriptor for AE4DMA commands
> + * 8 32-bit words:
> + * word 0: source memory type; destination memory type ; control bits
> + * word 1: desc_id; error code; status
> + * word 2: length
> + * word 3: reserved
> + * word 4: upper 32 bits of source pointer
> + * word 5: low 32 bits of source pointer
> + * word 6: upper 32 bits of destination pointer
> + * word 7: low 32 bits of destination pointer
> + */
> +
> +/* AE4DMA Descriptor - DWORD0 - Controls bits: Reserved for future use */
> +#define AE4DMA_DWORD0_STOP_ON_COMPLETION	AE4DMA_BIT(0)
> +#define AE4DMA_DWORD0_INTERRUPT_ON_COMPLETION	AE4DMA_BIT(1)
> +#define AE4DMA_DWORD0_START_OF_MESSAGE		AE4DMA_BIT(3)
> +#define AE4DMA_DWORD0_END_OF_MESSAGE		AE4DMA_BIT(4)
> +#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE	RTE_GENMASK64(5, 4)
> +#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE	RTE_GENMASK64(7, 6)
> +
> +#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE_MEMORY    (0x0)
> +#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE_IOMEMORY  (1<<4)
> +#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE_MEMORY    (0x0)
> +#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE_IOMEMORY  (1<<6)
> +
> +struct ae4dma_desc_dword0 {
> +	uint8_t byte0;
> +	uint8_t byte1;
> +	uint16_t timestamp;
> +};
> +
> +struct ae4dma_desc_dword1 {
> +	uint8_t status;
> +	uint8_t err_code;
> +	uint16_t desc_id;
> +};
> +
> +struct ae4dma_desc {
> +	struct ae4dma_desc_dword0 dw0;
> +	struct ae4dma_desc_dword1 dw1;
> +	uint32_t length;
> +	uint32_t reserved;
> +	uint32_t src_lo;
> +	uint32_t src_hi;
> +	uint32_t dst_lo;
> +	uint32_t dst_hi;
> +};
> +
> +/*
> + * Registers for each queue :4 bytes length
> + * Effective address : offset + reg
> + */
> +struct ae4dma_hwq_regs {
> +	union {
> +		uint32_t control_raw;
> +		struct {
> +			uint32_t queue_enable: 1;
> +			uint32_t reserved_internal: 31;
> +		} control;
> +	} control_reg;
> +
> +	union {
> +		uint32_t status_raw;
> +		struct {
> +			uint32_t reserved0: 1;
> +			/* 0–empty, 1–full, 2–stopped, 3–error , 4–Not Empty */
> +			uint32_t queue_status: 2;
> +			uint32_t reserved1: 21;
> +			uint32_t interrupt_type: 4;
> +			uint32_t reserved2: 4;
> +		} status;
> +	} status_reg;
> +
> +	uint32_t max_idx;
> +	uint32_t read_idx;
> +	uint32_t write_idx;
> +
> +	union {
> +		uint32_t intr_status_raw;
> +		struct {
> +			uint32_t intr_status: 1;
> +			uint32_t reserved: 31;
> +		} intr_status;
> +	} intr_status_reg;
> +
> +	uint32_t qbase_lo;
> +	uint32_t qbase_hi;
> +
> +};
> +
> +#endif /* AE4DMA_HW_DEFS_H */
> diff --git a/drivers/dma/ae4dma/ae4dma_internal.h b/drivers/dma/ae4dma/ae4dma_internal.h
> new file mode 100644
> index 0000000000..7f149c97b5
> --- /dev/null
> +++ b/drivers/dma/ae4dma/ae4dma_internal.h
> @@ -0,0 +1,97 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
> + */
> +
> +#ifndef _AE4DMA_INTERNAL_H_
> +#define _AE4DMA_INTERNAL_H_
> +
> +#include <stdint.h>
> +
> +#include "ae4dma_hw_defs.h"
> +
> +/* Return bits 32-63 of a 64-bit number. */
> +#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))
> +
> +/* Return bits 0-31 of a 64-bit number. */
> +#define lower_32_bits(n) ((uint32_t)((n) & 0xffffffff))
> +
> +/* Hardware ring depth (slots per queue); must be power of two. */
> +#define AE4DMA_DESCRIPTORS_PER_CMDQ	32
> +#define AE4DMA_QUEUE_DESC_SIZE		sizeof(struct ae4dma_desc)
> +#define AE4DMA_QUEUE_SIZE(n)		(AE4DMA_DESCRIPTORS_PER_CMDQ * (n))
> +

Two consecutive blank lines here; please remove one.

> +
> +/* AE4DMA registers Write/Read */
> +static inline void ae4dma_pci_reg_write(void *base, int offset,
> +		uint32_t value)
> +{
> +	volatile void *reg_addr = ((uint8_t *)base + offset);
> +
> +	rte_write32((rte_cpu_to_le_32(value)), reg_addr);
> +}
> +
> +static inline uint32_t ae4dma_pci_reg_read(void *base, int offset)
> +{
> +	volatile void *reg_addr = ((uint8_t *)base + offset);
> +
> +	return rte_le_to_cpu_32(rte_read32(reg_addr));
> +}
> +
> +#define AE4DMA_READ_REG_OFFSET(hw_addr, reg_offset) \
> +	ae4dma_pci_reg_read(hw_addr, reg_offset)
> +
> +#define AE4DMA_WRITE_REG_OFFSET(hw_addr, reg_offset, value) \
> +	ae4dma_pci_reg_write(hw_addr, reg_offset, value)
> +
> +

Two consecutive blank lines here; please remove one.

> +#define AE4DMA_READ_REG(hw_addr) \
> +	ae4dma_pci_reg_read((void *)(uintptr_t)(hw_addr), 0)
> +
> +#define AE4DMA_WRITE_REG(hw_addr, value) \
> +	ae4dma_pci_reg_write((void *)(uintptr_t)(hw_addr), 0, value)
> +
> +/* A structure describing an AE4DMA command queue. */
> +struct __rte_cache_aligned ae4dma_cmd_queue {
> +	char memz_name[RTE_MEMZONE_NAMESIZE];
> +	const struct rte_memzone *mz;
> +	volatile struct ae4dma_hwq_regs *hwq_regs;
> +
> +	struct rte_dma_vchan_conf qcfg;
> +	struct rte_dma_stats stats;
> +	/* Queue address */
> +	struct ae4dma_desc *qbase_desc;
> +	void *qbase_addr;
> +	rte_iova_t qbase_phys_addr;
> +	enum ae4dma_dma_err status[AE4DMA_DESCRIPTORS_PER_CMDQ];
> +	/* Queue identifier */
> +	uint64_t id;    /* queue id */
> +	uint64_t qidx;  /* queue index */
> +	uint64_t qsize; /* queue size */
> +	uint32_t ring_buff_count;
> +	uint16_t next_read;
> +	uint16_t next_write;
> +	uint16_t last_write; /* Used to compute submitted count. */
> +};
> +
> +/*
> + * One dmadev per AE4DMA hardware channel: probe creates AE4DMA_MAX_HW_QUEUES
> + * dmadevs per PCI function, each owning a single HW command queue.
> + */
> +struct ae4dma_dmadev {
> +	void *io_regs;
> +	struct ae4dma_cmd_queue cmd_q; /* single HW queue owned by this dmadev */
> +};
> +
> +

Two consecutive blank lines here; please remove one.

> +extern int ae4dma_pmd_logtype;
> +#define RTE_LOGTYPE_AE4DMA_PMD ae4dma_pmd_logtype
> +
> +#define AE4DMA_PMD_LOG(level, ...) \
> +	RTE_LOG_LINE_PREFIX(level, AE4DMA_PMD, "%s(): ", __func__, __VA_ARGS__)
> +
> +#define AE4DMA_PMD_DEBUG(...)  AE4DMA_PMD_LOG(DEBUG, __VA_ARGS__)
> +#define AE4DMA_PMD_INFO(...)   AE4DMA_PMD_LOG(INFO, __VA_ARGS__)
> +#define AE4DMA_PMD_ERR(...)    AE4DMA_PMD_LOG(ERR, __VA_ARGS__)
> +#define AE4DMA_PMD_WARN(...)   AE4DMA_PMD_LOG(WARNING, __VA_ARGS__)
> +
> +#endif /* _AE4DMA_INTERNAL_H_ */
> diff --git a/drivers/dma/ae4dma/meson.build b/drivers/dma/ae4dma/meson.build
> new file mode 100644
> index 0000000000..e48ab0d561
> --- /dev/null
> +++ b/drivers/dma/ae4dma/meson.build
> @@ -0,0 +1,7 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright 2024 Advanced Micro Devices, Inc. All rights reserved.

2024 -> 2026

Does this driver also support running on BSD or Windows? If not, please add the following lines:
if not is_linux
    build = false
    reason = 'only supported on Linux'
    subdir_done()
endif

> +
> +build = dpdk_conf.has('RTE_ARCH_X86')
> +reason = 'only supported on x86'
> +sources = files('ae4dma_dmadev.c')
> +deps += ['bus_pci', 'dmadev']
> diff --git a/drivers/dma/meson.build b/drivers/dma/meson.build
> index e0d94db967..c230ac5a06 100644
> --- a/drivers/dma/meson.build
> +++ b/drivers/dma/meson.build
> @@ -2,6 +2,7 @@
>  # Copyright 2021 HiSilicon Limited
>  
>  drivers = [
> +        'ae4dma',
>          'cnxk',
>          'dpaa',
>          'dpaa2',
> diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
> index 93f2383dff..7d09f155de 100755
> --- a/usertools/dpdk-devbind.py
> +++ b/usertools/dpdk-devbind.py
> @@ -86,6 +86,9 @@
>  cn9k_ree = {'Class': '08', 'Vendor': '177d', 'Device': 'a0f4',
>              'SVendor': None, 'SDevice': None}
>  
> +amd_ae4dma = {'Class': '08', 'Vendor': '1022', 'Device': '149b',
> +              'SVendor': None, 'SDevice': None}
> +
>  virtio_blk = {'Class': '01', 'Vendor': "1af4", 'Device': '1001,1042',
>                'SVendor': None, 'SDevice': None}
>  
> @@ -95,7 +98,7 @@
>  network_devices = [network_class, cavium_pkx, avp_vnic, ifpga_class]
>  baseband_devices = [acceleration_class]
>  crypto_devices = [encryption_class, intel_processor_class]
> -dma_devices = [cnxk_dma, hisilicon_dma,
> +dma_devices = [amd_ae4dma, cnxk_dma, hisilicon_dma,
>                 intel_idxd_gnrd, intel_idxd_dmr, intel_idxd_spr,
>                 intel_ioat_bdw, intel_ioat_icx, intel_ioat_skx,
>                 odm_dma]


