[dpdk-dev] [RFC 0/4] DPDK multiprocess rework

Anatoly Burakov anatoly.burakov at intel.com
Fri May 19 18:39:42 CEST 2017

This is a proof-of-concept proposal to rework how DPDK secondary processes
work. While the code has some limitations, it works well enough to demonstrate
the concept, and it can successfully run all existing multiprocess applications.

Current problems with DPDK secondary processes:
* ASLR interferes with mappings
  * "Fixed" by disabling ASLR, but not really a solution
* Secondary process may map things into where we want to map shared memory
  * _Almost_ works with --base-virtaddr, but unreliable and tedious
* Function pointers don't work (so e.g. hash library is broken)

Proposed solution:

Instead of running secondary process and mapping resources from primary process,
the following is done:
0) compile all applications as position-independent executables, compile DPDK as
   a shared library
1) fork() from primary process
2) dlopen() secondary process binary
3) use dlsym() to find entry point
4) run the application code while having all resources already mapped
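
Steps 1-4 can be sketched roughly as below. This is only an illustration of the
fork()/dlopen()/dlsym() sequence, not the code from the patches: the helper name
run_entry_in_child(), the entry_fn type and the exit codes 126/127 are made up
for the example, and the entry point is assumed to have a main()-like signature.
Older glibc versions need -ldl at link time.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed entry-point signature: same shape as main(). */
typedef int (*entry_fn)(int, char **);

/*
 * Hypothetical helper: fork, load the binary at 'path' in the child, look up
 * 'symbol' and call it. Returns the child's exit status, or -1 on error.
 */
int run_entry_in_child(const char *path, const char *symbol,
                       int argc, char **argv)
{
    assert(path != NULL && symbol != NULL);

    pid_t pid = fork();                       /* step 1 */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: everything the parent had mapped is already present. */
        void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);   /* step 2 */
        if (handle == NULL) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            _exit(127);
        }

        entry_fn entry;
        /* POSIX-recommended way to store a dlsym() result in a function pointer. */
        *(void **)(&entry) = dlsym(handle, symbol);            /* step 3 */
        if (entry == NULL) {
            fprintf(stderr, "dlsym: %s\n", dlerror());
            _exit(126);
        }

        _exit(entry(argc, argv) & 0xff);                       /* step 4 */
    }

    /* Parent: wait for the child and report how it exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass the path of the secondary binary and the name of its entry
symbol, e.g. run_entry_in_child(path, "main", argc, argv); a load failure shows
up as exit status 127 in this sketch.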

* No more ASLR issues
* No need for --base-virtaddr
* Function pointers from primary process will work in secondaries
  * Hash library (and any other library that uses function pointers internally)
    will work correctly in multi-process scenario
  * ethdev data can be moved to shared memory
  * Primary process interrupt callbacks can be run by secondary process
* More secure as all applications are compiled as position-independent binaries
  (default on Fedora)
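
To illustrate the function pointer point: after fork() the child has the same
code at the same addresses, so a pointer stored in shared memory by the parent
is directly callable in the child. The sketch below is a toy, not DPDK code;
struct shared_ops, toy_hash() and call_shared_callback_in_child() are names
invented for the example, and an anonymous shared mapping stands in for
hugepage memory.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Toy stand-in for a library object that keeps a callback in shared memory. */
struct shared_ops {
    int (*hash)(int);              /* function pointer written by the parent */
    int result;                    /* written by the child */
};

static int toy_hash(int key)
{
    return key * 31 + 7;
}

/*
 * Hypothetical demo: the parent stores a function pointer in shared memory,
 * the forked child calls through it. Returns the hash computed by the child,
 * or -1 on error.
 */
int call_shared_callback_in_child(int key)
{
    struct shared_ops *ops = mmap(NULL, sizeof(*ops), PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (ops == MAP_FAILED)
        return -1;

    ops->hash = toy_hash;
    ops->result = -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(ops, sizeof(*ops));
        return -1;
    }

    if (pid == 0) {
        /* Forked child: toy_hash lives at the same address as in the parent. */
        assert(ops->hash == toy_hash);
        ops->result = ops->hash(key);
        _exit(0);
    }

    waitpid(pid, NULL, 0);
    int result = ops->result;      /* visible here because the mapping is shared */
    munmap(ops, sizeof(*ops));
    return result;
}
```

With a separately exec'ed secondary the same pointer value would generally
point at unrelated code (or nothing), which is the failure mode listed under
current problems.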

Potential drawbacks (that we could think of):
* Kind of a hack
* Puts some code restrictions on secondary processes
  * Anything happening before EAL init will be run twice
* Some use cases are no longer possible (attaching to a dead primary)
* May impact binaries compiled to use a lot (kilobytes) of thread-local storage[1]
* Likely wouldn't work for static linking

There are also a number of issues that need to be resolved, but those are
implementation details and are out of scope for this RFC.

What is explicitly out of scope:
* Fixing interrupts in secondary processes
* Fixing hotplug in secondary processes

These currently do not work in secondary processes, and this proposal does
nothing to change that. They are better addressed by a dedicated EAL-internal
IPC proposal.

Technical nitty-gritty

Things quickly get confusing, so terminology:
- Original Primary is normal DPDK primary process
- Forked Primary is a "clean slate" primary process, from which all secondary
  processes will be forked (threads and fork don't mix well, so fork is done
  after all the hugepage and PCI data is mapped, but before all the threads are
  spun up)
- Original Secondary is a process that connects to Forked Primary, sends some
  data and triggers a fork
- Forked Secondary is _actual_ secondary process (forked from Forked Primary)

Sequence of events:
- Original Primary starts
- Forked Primary is forked from Original Primary
- Original Secondary starts and connects to Forked Primary
- Forked Primary forks into Forked Secondary
- Original Secondary waits until Forked Secondary dies

During EAL init, Original Primary does a fork() to form a Forked Primary - a
"clean slate" starting point for secondary processes. Forked Primary opens a
local socket (a-la VFIO) and starts listening for incoming connections.

Original Secondary process connects to Forked Primary and sends stdout/log fd's,
command line parameters, etc. over the local socket. It then waits for Forked
Secondary to die and exits (Original Secondary does _not_ map anything or do
any EAL init; it rte_exit()'s from inside rte_eal_init()). Forked Secondary
process then executes main(), passing all command-line arguments, and execution
of the secondary process resumes.
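
For readers unfamiliar with passing fd's over a local socket, the usual
mechanism is an SCM_RIGHTS control message on an AF_UNIX socket. The sketch
below shows that generic technique only; the helper names send_fd(), recv_fd()
and fd_passing_roundtrip() are invented here, and the wire format used by the
actual patches may differ.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one descriptor over a connected AF_UNIX socket. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char payload = 'F';            /* one byte of ordinary data accompanies the fd */
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;  /* payload is an array of descriptors */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor. Returns the new fd in this process, or -1. */
int recv_fd(int sock)
{
    char payload;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Usage example: pass the write end of a pipe across a socketpair and use it. */
int fd_passing_roundtrip(void)
{
    int sv[2], p[2], got, ok;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (pipe(p) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    got = (send_fd(sv[0], p[1]) == 0) ? recv_fd(sv[1]) : -1;
    /* Writing through the received fd must show up on the pipe's read end. */
    ok = got >= 0 && write(got, "x", 1) == 1 &&
         read(p[0], &c, 1) == 1 && c == 'x';

    if (got >= 0)
        close(got);
    close(p[0]);
    close(p[1]);
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

In the flow described above, Original Secondary would play the send_fd() side
for its stdout/log fd's, and the receiving side would dup2() them into place
before running the application code.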

Why pre-fork and not pthread like VFIO?

Pthreads and fork() don't mix well, because only the thread that calls fork()
exists in the child (all other threads disappear, leaving behind thread stacks,
locks and possibly inconsistent state of both app data and system libraries).
On the other hand, forking from a single-threaded context is safe. Current
implementation doesn't _exactly_ fork from a single-threaded context, but this
can be fixed later by rearranging EAL init.
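
The pre-fork ordering can be sketched as follows: fork a single-threaded
"template" process first, start threads in the parent only afterwards, and
have the template fork workers on request. This is a toy model using pipes
instead of a local socket; template_loop() and prefork_demo() are hypothetical
names, and the worker simply exits with request + 1 in place of running a
real secondary.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs in the template process. For every request byte, fork a worker, wait
 * for it, and reply with its exit code. The template never creates threads,
 * so each fork() here starts from a single-threaded context.
 */
static void template_loop(int req_fd, int rsp_fd)
{
    unsigned char req;

    while (read(req_fd, &req, 1) == 1) {
        pid_t worker = fork();
        if (worker == 0)
            _exit(req + 1);        /* stand-in for the secondary's real work */

        int status = 0;
        unsigned char code = 255;  /* reported if the worker could not be run */
        if (worker > 0 && waitpid(worker, &status, 0) == worker &&
            WIFEXITED(status))
            code = (unsigned char)WEXITSTATUS(status);

        if (write(rsp_fd, &code, 1) != 1)
            break;
    }
    _exit(0);
}

/* Returns the worker's exit code (request + 1), or -1 on error. */
int prefork_demo(unsigned char request)
{
    int req[2], rsp[2];

    assert(request < 255);         /* keep request + 1 within an exit status */
    if (pipe(req) < 0)
        return -1;
    if (pipe(rsp) < 0) {
        close(req[0]);
        close(req[1]);
        return -1;
    }

    /* 1. Fork the template while this process is still single-threaded. */
    pid_t tmpl = fork();
    if (tmpl < 0)
        return -1;
    if (tmpl == 0) {
        close(req[1]);
        close(rsp[0]);
        template_loop(req[0], rsp[1]);   /* does not return */
    }
    close(req[0]);
    close(rsp[1]);

    /* 2. Only from this point on would the parent spin up its threads. */

    /* 3. Ask the template to fork a worker and collect its exit code. */
    unsigned char code = 0;
    int ok = write(req[1], &request, 1) == 1 && read(rsp[0], &code, 1) == 1;

    close(req[1]);                 /* EOF ends template_loop() */
    close(rsp[0]);
    waitpid(tmpl, NULL, 0);
    return ok ? code : -1;
}
```

For example, prefork_demo(41) asks the template for one worker and returns that
worker's exit code, 42.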

[1]: https://www.redhat.com/archives/phil-list/2003-February/msg00077.html

Anatoly Burakov (4):
  vfio: refactor sockets into separate files
  eal: enable experimental dlopen()-based secondary process support
  apps: enable new secondary process support in multiprocess apps
  mk: default to compiling shared libraries

 config/common_base                                 |   2 +-
 .../client_server_mp/mp_client/Makefile            |   2 +-
 examples/multi_process/simple_mp/Makefile          |   2 +-
 examples/multi_process/symmetric_mp/Makefile       |   2 +-
 lib/librte_eal/linuxapp/eal/Makefile               |   3 +
 lib/librte_eal/linuxapp/eal/eal.c                  | 105 ++++-
 lib/librte_eal/linuxapp/eal/eal_mp.h               |  54 +++
 lib/librte_eal/linuxapp/eal/eal_mp_primary.c       | 477 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_mp_secondary.c     | 301 +++++++++++++
 lib/librte_eal/linuxapp/eal/eal_mp_socket.c        | 301 +++++++++++++
 lib/librte_eal/linuxapp/eal/eal_mp_socket.h        |  54 +++
 lib/librte_eal/linuxapp/eal/eal_vfio.c             |  20 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.h             |  24 +-
 lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c     | 243 ++---------
 14 files changed, 1347 insertions(+), 243 deletions(-)
 create mode 100755 lib/librte_eal/linuxapp/eal/eal_mp.h
 create mode 100755 lib/librte_eal/linuxapp/eal/eal_mp_primary.c
 create mode 100755 lib/librte_eal/linuxapp/eal/eal_mp_secondary.c
 create mode 100755 lib/librte_eal/linuxapp/eal/eal_mp_socket.c
 create mode 100755 lib/librte_eal/linuxapp/eal/eal_mp_socket.h

