[RFC v3 0/2] An API for Stashing Packets into CPU caches
Stephen Hemminger
stephen at networkplumber.org
Tue Oct 22 03:12:51 CEST 2024
On Mon, 21 Oct 2024 01:52:44 +0000
Wathsala Vithanage <wathsala.vithanage at arm.com> wrote:
> DPDK applications benefit from Direct Cache Access (DCA) features like
> Intel DDIO and Arm's write-allocate-to-SLC. However, those features do
> not allow fine-grained control of direct cache access, such as stashing
> packets into upper-level caches (L2 caches) of a processor or the shared
> cache of a chiplet. PCIe TLP Processing Hints (TPH) addresses this need
> in a vendor-agnostic manner. TPH capability has existed since
> PCI Express Base Specification revision 3.0; today, numerous Network
> Interface Cards and interconnects from different vendors support TPH
> capability. TPH comprises a steering tag (ST) and a processing hint
> (PH). ST specifies the cache level of a CPU at which the data should be
> written to (or DCAed into), while PH is a hint provided by the PCIe
> requester to the completer on an upcoming traffic pattern. Some NIC
> vendors bundle TPH capability with fine-grained control over the type of
> objects that can be stashed into CPU caches, such as
>
> - Rx/Tx queue descriptors
> - Packet-headers
> - Packet-payloads
> - Data from a given offset from the start of a packet
>
> Note that stashable object types are outside the scope of PCIe standard;
> therefore, vendors could support any combination of the above items as
> they see fit.
>
> To enable TPH and fine-grained packet stashing, this API extends the
> ethdev library, PCI library, and the PCI driver. In this design, the
> application, via the ethdev stashing API, provides hints to the PMD to
> indicate to the underlying hardware at which processor and cache level
> it prefers a packet to end up. Once the PMD receives a CPU and a
> cache-level combination, it must extract the matching ST from the TPH
> ACPI _DSM of the PCIe root port to which the NIC is connected. To
> facilitate the extraction of STs, the PCI library and the PCI driver
> APIs are extended.
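For readers following along, the flow described above (the application names a CPU and a cache level, and the PMD resolves that pair to a steering tag) can be sketched roughly as follows. This is only an illustration under stated assumptions: every name here (stash_cache_level, st_entry, stash_lookup_st) is hypothetical and is not the RFC's API, and the static table with invented tag values stands in for the query against the root port's TPH ACPI _DSM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache levels an application might ask for. */
enum stash_cache_level { STASH_L2 = 2, STASH_SLC = 3 };

/* One row of a made-up table standing in for the platform's answer.
 * In the quoted design this information comes from the TPH ACPI _DSM
 * of the PCIe root port; the values below are invented. */
struct st_entry {
    unsigned int cpu;
    enum stash_cache_level level;
    uint16_t st; /* steering tag */
};

static const struct st_entry st_table[] = {
    { 0, STASH_L2, 0x10 }, { 0, STASH_SLC, 0x11 },
    { 1, STASH_L2, 0x20 }, { 1, STASH_SLC, 0x21 },
};

/* Resolve a (cpu, cache level) pair to a steering tag.
 * Returns 0 and stores the tag in *st on success, -1 if there is no match. */
int stash_lookup_st(unsigned int cpu, enum stash_cache_level level,
                    uint16_t *st)
{
    for (size_t i = 0; i < sizeof(st_table) / sizeof(st_table[0]); i++) {
        if (st_table[i].cpu == cpu && st_table[i].level == level) {
            *st = st_table[i].st;
            return 0;
        }
    }
    return -1;
}
```

A PMD would then program the returned tag into whatever queue or descriptor context its hardware uses for TPH; that step is vendor-specific and is not shown.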
There is a fundamental conflict with the increasing growth of "nerd knobs"
like this in DPDK. Users already have problems understanding DPDK,
and adding more complexity does not help.
So any new feature like this should be:
1. Just work right without any configuration. It can't suck by default.
2. The APIs should be used in the drivers and core, not exposed up
   to the application. Most of the hot data structures are in the
   drivers now.
3. Fit into existing API models, like rte_prefetch().
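To illustrate the kind of model meant in point 3, here is a minimal sketch of a prefetch-style hint issued inside a receive loop, with nothing for the application to configure. It is an assumption-laden illustration, not driver code: prefetch0() is a stand-in for DPDK's rte_prefetch0() built on the GCC/Clang __builtin_prefetch builtin so that it compiles without DPDK, and struct pkt and rx_process() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for DPDK's rte_prefetch0(); uses the GCC/Clang builtin so the
 * sketch builds without DPDK headers. */
static inline void prefetch0(const volatile void *p)
{
    __builtin_prefetch((const void *)p, 0, 3);
}

/* Hypothetical packet buffer; this is not DPDK's struct rte_mbuf. */
struct pkt {
    const uint8_t *data;
    size_t len;
};

/* Sum the first byte of each packet, hinting the next packet's data into
 * cache while the current one is being handled. The hint lives in the
 * library code, so the caller has no knob to set. */
uint32_t rx_process(const struct pkt *pkts, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n)
            prefetch0(pkts[i + 1].data);
        if (pkts[i].len > 0)
            sum += pkts[i].data[0];
    }
    return sum;
}
```

The prefetch is only a hint: dropping it changes performance, not results, which is the property that makes this style of API hard to misuse.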
Is the goal of DPDK to enable high-speed applications, or to enable vendor
benchmarks?