[dpdk-dev] DPDK 21.11 NVIDIA Mellanox Roadmap
shys at nvidia.com
Mon Sep 13 17:35:31 CEST 2021
Below is NVIDIA Mellanox's roadmap for DPDK 21.11, on which we are currently working:
ethdev new APIs:
 Introduce a memory and performance optimization for the case of a scaled-up number of interfaces.
Motivation: An application (e.g. OVS) polls the queues of all representors. Each queue contains descriptors, and each descriptor uses mbufs. As the number of interfaces grows (e.g. 1k Scalable Functions (SFs)), the memory footprint grows dramatically (#queues X queue_depth X mbuf_memory X 1k ports), and CPU usage becomes inefficient due to cache evictions between the queue contexts. The new optimization will aggregate the queues into a single one, reducing both the number of entities to poll and the memory footprint, and allowing streamlined, efficient processing with far fewer cache evictions.
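To make the intent concrete, here is a rough, hypothetical sketch of how an application might opt a representor port's Rx queue into a shared, aggregated queue. The RTE_ETH_RX_OFFLOAD_SHARED_RXQ flag and the share_group field used below are assumptions about the API shape under discussion, not a final interface.

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Hypothetical sketch: place a representor port's Rx queue into a shared
 * group so one poll drains traffic from every member port.  The flag and
 * field names (RTE_ETH_RX_OFFLOAD_SHARED_RXQ, share_group) are assumptions
 * about the final API. */
static int
setup_shared_rxq(uint16_t port_id, struct rte_mempool *mp)
{
	struct rte_eth_conf port_conf = { 0 };
	struct rte_eth_rxconf rxq_conf = { 0 };

	/* Request the shared-Rx-queue offload and join share group 1. */
	port_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_SHARED_RXQ;
	rxq_conf.share_group = 1;

	if (rte_eth_dev_configure(port_id, 1, 1, &port_conf) != 0)
		return -1;
	if (rte_eth_rx_queue_setup(port_id, 0, 512,
				   rte_eth_dev_socket_id(port_id),
				   &rxq_conf, mp) != 0)
		return -1;
	return rte_eth_dev_start(port_id);
}

With such a configuration, traffic from all ports in the same share group would surface through a single rte_eth_rx_burst() call, with the originating representor identified per packet (e.g. via mbuf->port).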
rte_flow new APIs:
 Extend the rte_flow API to support the definition of flexible parsers.
Motivation: NVIDIA Mellanox NICs supports flexible parser configuration, and we've made use of that capability within the mlx5 PMD before. Now we are exposing an API to allow applications to configure the NIC to support matching over custom/non-supported protocol. With that configuration done, matching can be applied to traffic using that protocol.
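As a rough illustration (not a final API), the sketch below first creates a "flex item" handle describing a custom header and then matches on it in a flow rule. The rte_flow_flex_item_create() call, the flex conf/handle struct names and the RTE_FLOW_ITEM_TYPE_FLEX item are assumptions based on the proposal and may differ in the released API.

#include <rte_flow.h>

/* Hypothetical sketch: teach the NIC a custom protocol via a "flex item"
 * and then match on it.  The flex-parser names below are assumptions,
 * not a final interface. */
static struct rte_flow *
match_custom_proto(uint16_t port_id)
{
	struct rte_flow_error err;
	struct rte_flow_item_flex_conf conf = { 0 };
	struct rte_flow_item_flex_handle *handle;
	uint8_t spec_bytes[] = { 0xaa };   /* value to match in the header */

	/* Real usage would fill 'conf' with the custom header layout:
	 * its length, the next-protocol field, which bytes to sample for
	 * matching, and how it links to known protocols (e.g. after UDP). */
	handle = rte_flow_flex_item_create(port_id, &conf, &err);
	if (handle == NULL)
		return NULL;

	struct rte_flow_item_flex flex_spec = {
		.handle = handle,
		.length = sizeof(spec_bytes),
		.pattern = spec_bytes,
	};
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_FLEX, .spec = &flex_spec },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_DROP },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	return rte_flow_create(port_id, &attr, pattern, actions, &err);
}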
mlx5 PMD updates:
The mlx5 PMD will support the rte_flow changes listed above, as well as the updates below.
Extend mlx5 PMD capability to support up to 512 interfaces (VFs, SFs).
Motivation: Allow applications like vDPA to utilize a larger number of interfaces. Another example is a DPU, where hundreds of applications can be supported using SFs.
 Improve memory registration and sharing between drivers.
Motivation: In a Data Processing Unit (DPU) environment, data needs to be shared between host memory and DPU/Arm memory to enable fast data transfer for different drivers, such as regex and network, that operate on the same physical device. To that end, we are refactoring the memory registration and sharing method so that memory region registration is abstracted through a common layer (rather than left for each driver to do), which enables sharing a memory region between the host and the DPU/Arm memory subset. Together with this change, we will also optimize the huge page initialization and cross-NUMA memory registration to speed up application start-up time.
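The following is a conceptual sketch only (illustrative names, neither mlx5 nor DPDK code) of the shared-registration idea: a per-device memory-region cache consulted by every driver on the same physical device, so each address range is registered with the hardware once and then reused.

#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Conceptual sketch: a memory-region cache shared by all drivers
 * (net, regex, ...) running on the same physical device.  Each driver asks
 * the common layer for an MR covering an address range; the range is
 * registered once and reused afterwards, instead of every driver
 * registering it independently. */
struct shared_mr {
	void *addr;
	size_t len;
	uint32_t lkey;            /* hardware key returned by registration */
	struct shared_mr *next;
};

struct mr_cache {
	struct shared_mr *head;   /* in practice a faster lookup structure */
};

/* Stand-in for the real driver registration call (e.g. verbs MR creation). */
static uint32_t
hw_register_memory(void *addr, size_t len)
{
	static uint32_t next_key = 1;
	(void)addr;
	(void)len;
	return next_key++;
}

static uint32_t
mr_cache_lookup_or_register(struct mr_cache *cache, void *addr, size_t len)
{
	struct shared_mr *mr;

	for (mr = cache->head; mr != NULL; mr = mr->next)
		if ((uint8_t *)addr >= (uint8_t *)mr->addr &&
		    (uint8_t *)addr + len <= (uint8_t *)mr->addr + mr->len)
			return mr->lkey;          /* already registered */

	mr = malloc(sizeof(*mr));
	mr->addr = addr;
	mr->len = len;
	mr->lkey = hw_register_memory(addr, len); /* register exactly once */
	mr->next = cache->head;
	cache->head = mr;
	return mr->lkey;
}

Centralizing the cache in a common layer is what lets the net and regex drivers reuse each other's registrations instead of duplicating them per driver.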
testpmd will be updated to support the changes listed above.