[25.11 PATCH v3 0/5] Introduce DMA enqueue/dequeue operations
Pavan Nikhilesh Bhagavatula
pbhagavatula at marvell.com
Wed Oct 8 08:01:23 CEST 2025
@Fengchengwen, @Bruce, @Kevin
Kindly review. Since this is a library change, we have to merge it before rc1.
Thanks,
Pavan.
>>> Hi Bruce,
>>>
>>> >On Sat, May 24, 2025 at 02:43:10PM +0530, <pbhagavatula at marvell.com> wrote:
>>> >> From: Pavan Nikhilesh <pbhagavatula at marvell.com>
>>> >>
>>> >> Introduce DMA enqueue/dequeue operations to the DMA device library.
>>> >>
>>> >> Add configuration flags to rte_dma_config instead of boolean for
>>> >> individual features.
>>> >>
>>> >> The enqueue/dequeue operations allow applications to communicate with the
>>> >> DMA device using the rte_dma_op structure, providing a more flexible and
>>> >> efficient way to manage DMA operations.
>>> >>
>>> >
>>> >While I have no really strong objections to this addition to the dmadev
>>> >API, I'd appreciate if you could explain WHY or how this method of working
>>> >is more efficient in your usecase? When designing the dmadev APIs
>>> >originally, we looked at using both an enqueue-type API as well as the
>>> >implemented individual-op-based APIs. IIRC at that time testing showed that
>>> >using the single ops directly was faster than using the enqueue APIs, so
>>> >I'm wondering what exactly has changed, or is different about your usecase?
>>> >
>>>
>>> Here is an example where we find enqueue/dequeue ops useful, especially when
>>> integrating with the Graph library.
>>>
>>> With the current implementation we had to write an entire wrapper[1] to track SGEs,
>>> which makes our nodes[2] very complex.
>>>
>>
>>Can you explain a bit more here? Why do you need the wrapper rather than
>>just tracking in a circular ring all the copies offloaded? How does having
>>an enqueue API make this better?
>
>This is what we already do in our wrapper.
>We found it to be unnecessary overhead, since the driver already does this internally
>and we can leverage the existing functionality.
>This also reduces the memory footprint, as in the case below we use a lot of VCHANS.
>
>Instead of checking for completions and maintaining the circular ring, we can spend
>those cycles doing other things in the application.
>
>>Can you perhaps give a trivial example
>>showing the difference it makes here? The examples you give below are
>>rather long to understand quickly.
>>
>
>The example below is a graph-based application that currently uses the wrapper implementation,
>which we want to swap for enq/deq ops to reduce overhead.
>
>Also, the ops descriptor already exists for the eventdev subsystem; we are just importing it into the DMA
>device library and reusing it.
>
>>Thanks,
>>/Bruce
>>
>>> [1]<https://github.com/MarvellEmbeddedProcessors/dao/blob/dao-devel/lib/common/dao_dma.h>
>>> [2]<https://github.com/MarvellEmbeddedProcessors/dao/blob/3f364261de91e355699bd9af20d60ea6459f7d67/lib/virtio_net/virtio_net_deq_ext.c#L51>
>>>
>>> >/Bruce
>>>
>>> Thanks,
>>> Pavan.
>>>
>
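
For anyone following the thread, the difference being discussed can be sketched in a few lines. This is only a self-contained mock, not the dmadev API and not the wrapper from [1]: `track_ring`, `mock_dma_op`, `mock_dev`, `mock_enqueue` and `mock_dequeue` are hypothetical names made up for illustration, and the "device" is just a memcpy. Approach A is the application-side circular ring that remembers the context of every offloaded copy; approach B lets the context travel inside the op, so the queue the driver already maintains is the only ring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SZ 8u /* must be a power of two */

/* Approach A (hypothetical wrapper): the application keeps its own ring of
 * per-copy contexts, in submission order, alongside the device queue. */
struct track_ring {
	void *ctx[RING_SZ];
	uint16_t head; /* next free slot */
	uint16_t tail; /* oldest outstanding copy */
};

static int track_push(struct track_ring *r, void *ctx)
{
	if ((uint16_t)(r->head - r->tail) == RING_SZ)
		return -1; /* ring full */
	r->ctx[r->head & (RING_SZ - 1)] = ctx;
	r->head++;
	return 0;
}

/* Hand back the contexts of the n oldest copies once they are known done. */
static uint16_t track_pop(struct track_ring *r, void **out, uint16_t n)
{
	uint16_t avail = (uint16_t)(r->head - r->tail);
	uint16_t i;

	if (n > avail)
		n = avail;
	for (i = 0; i < n; i++)
		out[i] = r->ctx[(uint16_t)(r->tail + i) & (RING_SZ - 1)];
	r->tail = (uint16_t)(r->tail + n);
	return n;
}

/* Approach B (hypothetical op): the context is carried by the op itself. */
struct mock_dma_op {
	const void *src;
	void *dst;
	size_t len;
	void *user_ctx; /* application context, returned on dequeue */
	int status;     /* 0 once the mock device has completed the copy */
};

/* Mock device: its internal queue stands in for the driver's own ring. */
struct mock_dev {
	struct mock_dma_op *q[RING_SZ];
	uint16_t head;
	uint16_t tail;
};

static uint16_t mock_enqueue(struct mock_dev *d, struct mock_dma_op **ops,
			     uint16_t n)
{
	uint16_t i;

	for (i = 0; i < n; i++) {
		if ((uint16_t)(d->head - d->tail) == RING_SZ)
			break; /* device queue full */
		assert(ops[i] != NULL);
		d->q[d->head & (RING_SZ - 1)] = ops[i];
		d->head++;
	}
	return i; /* number of ops accepted */
}

static uint16_t mock_dequeue(struct mock_dev *d, struct mock_dma_op **ops,
			     uint16_t n)
{
	uint16_t i;

	for (i = 0; i < n && d->tail != d->head; i++) {
		struct mock_dma_op *op = d->q[d->tail & (RING_SZ - 1)];

		memcpy(op->dst, op->src, op->len); /* stand-in for the hardware copy */
		op->status = 0;
		ops[i] = op;
		d->tail++;
	}
	return i; /* number of completed ops returned */
}
```

With approach A the application calls track_push() for each copy it submits and track_pop() after polling for completions, once per vchan. With approach B the dequeued op already points at its context, so that extra ring and its memory are not needed.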