[RFC PATCH 0/5] Using shared mempools for zero-copy IO proxying
Stephen Hemminger
stephen at networkplumber.org
Fri Feb  7 02:55:42 CET 2025

On Fri, 22 Sep 2023 09:19:07 +0100
Bruce Richardson <bruce.richardson at intel.com> wrote:
> Following my talk at the recent DPDK Summit [1], here is an RFC patchset
> containing the prototypes I created which led to the talk.  This
> patchset is simply to demonstrate:
> 
> * what is currently possible with DPDK in terms of zero-copy IPC
> * where the big gaps, and general problem areas are
> * what the performance is like doing zero-copy between processes
> * how we may look to have new deployment models for DPDK apps.
> 
> This cover letter is quite long, as it covers how to run the demo app
> and use the drivers included in this set. I felt it more accessible this
> way than putting it in rst files in the patches. This patchset depends
> upon patchsets [2] and [3]
> 
> [1] https://dpdksummit2023.sched.com/event/1P9wU
> [2] http://patches.dpdk.org/project/dpdk/list/?series=29536
> [3] http://patches.dpdk.org/project/dpdk/list/?series=29538
> 
> Overview
> --------
> 
> The patchset contains at a high level the following parts: a proxy
> application which performs packet IO and steers traffic on a per-queue
> basis to other applications which connect to it via unix sockets, and a
> set of drivers to be used by those applications so that they can
> (hopefully) receive packets from the proxy app without any changes to
> their own code. This all helps to demonstrate the feasibility of zero-
> copy packet transfer between independent DPDK apps.
> 
> The drivers are:
> * a bus driver, which makes the connection to the proxy app via
>   the unix socket. Thereafter it accepts the shared memory from the
>   proxy and maps it into the running process for use for buffers and
>   rings etc. It also handles communication with the proxy app on behalf
>   of the other two drivers
> * a mempool driver, which simply manages a set of buffers on the basis
>   of offsets within the shared memory area rather than using pointers.
>   The big downside of its use is that it assumes all the objects stored
>   in the mempool are mbufs. (As described in my talk, this is a big
>   issue where I'm not sure we have a good solution available right now
>   to resolve it)
> * an ethernet driver, which creates an rx and tx ring in shared memory
>   for use in communicating with the proxy app. All buffers sent/received
>   are converted to offsets within the shared memory area.
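
The offset-based addressing described for the mempool and ethernet
drivers can be sketched as below. This is a minimal illustration only,
not code from the patchset: the struct shm_region type and the
shm_ptr_to_off()/shm_off_to_ptr() helpers are hypothetical names. It
assumes each process maps the same shared region at its own base
address, so only region-relative offsets are meaningful across
processes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for a mapped shared region (not from the patchset). */
struct shm_region {
	void *base;	/* address at which this process mapped the region */
	size_t len;	/* length of the mapping in bytes */
};

/* Convert a local pointer inside the region to a region-relative offset. */
static inline uintptr_t
shm_ptr_to_off(const struct shm_region *r, const void *p)
{
	return (uintptr_t)p - (uintptr_t)r->base;
}

/*
 * Convert a region-relative offset back to a pointer valid in this process.
 * Returns NULL when the offset lies outside the mapping.
 */
static inline void *
shm_off_to_ptr(const struct shm_region *r, uintptr_t off)
{
	if (off >= r->len)
		return NULL;
	return (char *)r->base + off;
}
```

A sender would place shm_ptr_to_off() results on the ring, and the
receiver would call shm_off_to_ptr() against its own mapping. The bounds
check matters because the offset comes from another process.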
> 
> The proxy app itself implements all the other logic - mostly inside
> datapath.c - to allow the connecting app to run. When an app connects to
> the unix socket, the proxy app uses memfd to create a hugepage block to
> be passed through to the "guest" app, and then sends/receives the
> messages from the drivers until the app connection is up and running to
> handle traffic. [Ideally, this IPC over unix socket mechanism should
> probably be generalized into a library used by the app, but for now it's
> just built-in]. As stated above, the steering of traffic is done
> per-queue, that is, each app connects to a specific socket corresponding
> to a NIC queue. For demo purposes, the traffic to the queues is just
> distributed using RSS, but obviously it would be possible to use e.g.
> rte_flow to do more interesting distribution in future.
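
The memfd handover described above can be sketched as follows. This is
an assumption-laden illustration rather than the proxy's actual code:
the shm_create(), shm_send_fd() and shm_recv_fd() helpers are
hypothetical names, and the sketch uses an ordinary memfd instead of a
hugepage-backed one so that it runs without hugepages configured. It
shows the standard Linux pattern of creating an anonymous memory file
and passing its descriptor over a unix socket as SCM_RIGHTS ancillary
data.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Create an anonymous memory file of the given size; returns fd or -1. */
static int
shm_create(const char *name, size_t len)
{
	int fd = memfd_create(name, MFD_CLOEXEC);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Send one fd over a connected unix socket; returns 0 on success, -1 on error. */
static int
shm_send_fd(int sock, int fd)
{
	char byte = 0;	/* at least one byte of real data must accompany the fd */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} u;
	struct msghdr msg;
	struct cmsghdr *c;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	c = CMSG_FIRSTHDR(&msg);
	c->cmsg_level = SOL_SOCKET;
	c->cmsg_type = SCM_RIGHTS;
	c->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(c), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by shm_send_fd(); returns the new fd or -1. */
static int
shm_recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *c;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	c = CMSG_FIRSTHDR(&msg);
	if (c == NULL || c->cmsg_level != SOL_SOCKET ||
			c->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(c), sizeof(int));
	return fd;
}
```

The receiving side would then mmap() the returned descriptor with
MAP_SHARED to see the same pages as the sender. A hugepage-backed
variant would additionally pass MFD_HUGETLB to memfd_create(), which
requires hugepages to be available on the host.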
> 
> Running the Apps
> ----------------
> 
> To get things all working just do a DPDK build as normal. Then run the
> io-proxy app. It only takes a single parameter of the core number to
> use. For example, on my system I run it on lcore 25:
> 
> 	./build/app/dpdk-io-proxy 25
> 
> The sockets to be created and how they map to ports/queues are controlled
> via commandline, but a startup script can be provided, which just needs
> to be in the current directory and named "dpdk-io-proxy.cmds". Patch 5 of
> this set contains an example setup that I use. Therefore it's
> recommended that you run the proxy app from a directory containing that
> file. If so, the proxy app will use two ports and create two queues on
> each, mapping them to 4 unix socket files in /tmp. (Each socket is
> created in its own directory to simplify use with docker containers as
> described in the next section).
> 
> No traffic is handled by the app until other end-user apps connect to
> it. Testpmd works as that second "guest" app without any changes to it.
> To run multiple testpmd instances, each taking traffic from a unique RX
> queue and forwarding it back, the following sequence of commands can be
> used [in this case, doing forwarding on cores 26 through 29, and using
> the 4 unix sockets configured using the startup file referenced above].
> 
> 	./build/app/dpdk-testpmd -l 24,26 --no-huge -m1 --no-shconf \
> 		-a sock:/tmp/socket_0_0/sock  -- --forward-mode=macswap
> 	./build/app/dpdk-testpmd -l 24,27 --no-huge -m1 --no-shconf \
> 		-a sock:/tmp/socket_0_1/sock  -- --forward-mode=macswap
> 	./build/app/dpdk-testpmd -l 24,28 --no-huge -m1 --no-shconf \
> 		-a sock:/tmp/socket_1_0/sock  -- --forward-mode=macswap
> 	./build/app/dpdk-testpmd -l 24,29 --no-huge -m1 --no-shconf \
> 		-a sock:/tmp/socket_1_1/sock  -- --forward-mode=macswap
> 
> NOTE:
> * the "--no-huge -m1" is present to guarantee that no regular DPDK
>   hugepage memory is being used by the app. It's all coming from the
>   proxy app's memfd
> * the "--no-shconf" parameter is necessary just to avoid us needing to
>   specify a unix file-prefix for each instance
> * the forwarding type to be used is optional, macswap is chosen just to
>   have some work done inside testpmd to prove it can touch the packet
>   payload, not just the mbuf header.
> 
> Using with docker containers
> ----------------------------
> 
> The testpmd instances run above can also be run within a docker
> container. Using a dockerfile like the one below, we can run testpmd in a
> container getting the packets in a zero-copy manner from the io-proxy
> running on the host.
> 
>    # syntax=docker/dockerfile:1-labs
>    FROM alpine
>    RUN apk add --update alpine-sdk \
>            py3-elftools meson ninja \
>            bsd-compat-headers \
>            linux-headers \
>            numactl-dev \
>            bash
>    ADD . dpdk
>    WORKDIR dpdk
>    RUN rm -rf build
>    RUN meson setup -Denable_drivers=*/shared_mem -Ddisable_libs=* \
>         -Denable_apps=test-pmd -Dtests=false build
>    RUN ninja -v -C build
>    ENTRYPOINT ["/dpdk/build/app/dpdk-testpmd"]
> 
> To access the proxy, all the container needs is access to the unix
> socket on the filesystem. Since in the example startup script each
> socket is placed in its own directory we can use "--volume" parameter to
> give each instance its own unique unix socket, and therefore proxied
> NIC RX/TX queue. To run four testpmd instances as above, but in
> containers, the following commands can be used, assuming the dockerfile
> above is built to an image called "testpmd".
> 
> 	docker run -it --volume=/tmp/socket_0_0:/run testpmd \
> 		-l 24,26 --no-huge -a sock:/run/sock -- \
> 		--no-mlockall --forward-mode=macswap
> 	docker run -it --volume=/tmp/socket_0_1:/run testpmd \
> 		-l 24,27 --no-huge -a sock:/run/sock -- \
> 		--no-mlockall --forward-mode=macswap
> 	docker run -it --volume=/tmp/socket_1_0:/run testpmd \
> 		-l 24,28 --no-huge -a sock:/run/sock -- \
> 		--no-mlockall --forward-mode=macswap
> 	docker run -it --volume=/tmp/socket_1_1:/run testpmd \
> 		-l 24,29 --no-huge -a sock:/run/sock -- \
> 		--no-mlockall --forward-mode=macswap
> 
> NOTE: since these docker testpmd instances don't access IO or allocate
> hugepages directly, they should be runnable without extra privileges, so
> long as they can connect to the unix socket.
> 
> Additional info
> ---------------
> 
> * Stats are available via app commandline
> * By default (#define in code), the proxy app only uses 2 queues per
>   port, so you can't configure more than that via cmdline
> * Any ports used by the proxy script must support queue reconfiguration
>   at runtime without stopping the port.
> * When a "guest" process connected to a socket terminates, all shared
>   memory used by that process is destroyed and a new memfd is created on
>   reconnect.
> * The above setups using testpmd are the only ways in which this app and
>   drivers have been tested. I would be hopeful that other apps would
>   work too, but there are quite a few limitations (see my DPDK summit
>   talk for some more details on those).
> 
> Congratulations on reading this far! :-)
> All comments/feedback on this welcome.
> 
> Bruce Richardson (5):
>   bus: new driver to accept shared memory over unix socket
>   mempool: driver for mempools of mbufs on shared memory
>   net: new ethdev driver to communicate using shared mem
>   app: add IO proxy app using shared memory interfaces
>   app/io-proxy: add startup commands
> 
>  app/io-proxy/command_fns.c                 | 160 ++++++
>  app/io-proxy/commands.list                 |   6 +
>  app/io-proxy/datapath.c                    | 595 +++++++++++++++++++++
>  app/io-proxy/datapath.h                    |  37 ++
>  app/io-proxy/datapath_mp.c                 |  78 +++
>  app/io-proxy/dpdk-io-proxy.cmds            |   6 +
>  app/io-proxy/main.c                        |  71 +++
>  app/io-proxy/meson.build                   |  12 +
>  app/meson.build                            |   1 +
>  drivers/bus/meson.build                    |   1 +
>  drivers/bus/shared_mem/meson.build         |  11 +
>  drivers/bus/shared_mem/shared_mem_bus.c    | 323 +++++++++++
>  drivers/bus/shared_mem/shared_mem_bus.h    |  75 +++
>  drivers/bus/shared_mem/version.map         |  11 +
>  drivers/mempool/meson.build                |   1 +
>  drivers/mempool/shared_mem/meson.build     |  10 +
>  drivers/mempool/shared_mem/shared_mem_mp.c |  94 ++++
>  drivers/net/meson.build                    |   1 +
>  drivers/net/shared_mem/meson.build         |  11 +
>  drivers/net/shared_mem/shared_mem_eth.c    | 295 ++++++++++
>  20 files changed, 1799 insertions(+)
>  create mode 100644 app/io-proxy/command_fns.c
>  create mode 100644 app/io-proxy/commands.list
>  create mode 100644 app/io-proxy/datapath.c
>  create mode 100644 app/io-proxy/datapath.h
>  create mode 100644 app/io-proxy/datapath_mp.c
>  create mode 100644 app/io-proxy/dpdk-io-proxy.cmds
>  create mode 100644 app/io-proxy/main.c
>  create mode 100644 app/io-proxy/meson.build
>  create mode 100644 drivers/bus/shared_mem/meson.build
>  create mode 100644 drivers/bus/shared_mem/shared_mem_bus.c
>  create mode 100644 drivers/bus/shared_mem/shared_mem_bus.h
>  create mode 100644 drivers/bus/shared_mem/version.map
>  create mode 100644 drivers/mempool/shared_mem/meson.build
>  create mode 100644 drivers/mempool/shared_mem/shared_mem_mp.c
>  create mode 100644 drivers/net/shared_mem/meson.build
>  create mode 100644 drivers/net/shared_mem/shared_mem_eth.c
> 
> --
> 2.39.2
> 
This looked interesting but appears to be a dead end.
There has been no more work, and it was never clear how it differed from memif.
It would need more documentation etc. to be a real NIC.
If there is still interest, resubmit it.
    
    