[dpdk-dev] [EXT] Re: [RFC] DPDK Trace support

Jerin Jacob Kollanukkaran jerinj at marvell.com
Mon Jan 13 13:04:34 CET 2020


> -----Original Message-----
> From: Ray Kinsella <mdr at ashroe.eu>
> Sent: Monday, January 13, 2020 4:30 PM
> To: Jerin Jacob Kollanukkaran <jerinj at marvell.com>; dpdk-dev
> <dev at dpdk.org>; dave at barachs.net
> Subject: [EXT] Re: [RFC] [dpdk-dev] DPDK Trace support
> 
> Hi Jerin,

Hi Ray,

> 
> Any idea why lttng performance is so poor?

100ns is the expected number based on LTTng presentations, and even that
100ns figure is for high-end x86 machines. Here is the perf output; the
overhead appears to come from the ring buffer implementation and its
feature set. Moreover, for a normal Linux application, 100ns may not be
bad; it is just that DPDK needs more.

  45.07%  liblttng-ust.so.0.0.0             [.] lttng_event_reserve
  25.48%  liblttng-ust.so.0.0.0             [.] lttng_event_commit
   6.30%  calibrate                         [.] __event_probe__dpdk___zero_arg
   5.05%  calibrate                         [.] __worker_ZERO_ARG
   4.87%  liblttng-ust-tracepoint.so.0.0.0  [.] tp_rcu_read_lock_bp
   4.79%  liblttng-ust-tracepoint.so.0.0.0  [.] tp_rcu_read_unlock_bp
   4.43%  ld-2.29.so                        [.] _dl_tlsdesc_return
   1.94%  calibrate                         [.] plugin_getcpu
   1.42%  calibrate                         [.] plugin_read64
   0.65%  liblttng-ust-tracepoint.so.0.0.0  [.] tp_rcu_dereference_sym_bp

Note:
- Performance is even worse if we don't use snapshot mode and the DPDK
plugin for get_clock and get_cpu. The numbers above are based on the
optimization hooks that LTTng provides in the framework.
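
For reference, a minimal sketch of what such a plugin can look like,
assuming lttng-ust's clock/getcpu override hooks (lttng/ust-clock.h and
lttng/ust-getcpu.h); the callback names mirror lttng-ust's
clock-override and getcpu-override examples, so treat the exact set of
callbacks as illustrative:

  /* DPDK-backed LTTng clock/getcpu plugin sketch; built as a shared
   * object and loaded via the LTTNG_UST_CLOCK_PLUGIN and
   * LTTNG_UST_GETCPU_PLUGIN environment variables. */
  #include <lttng/ust-clock.h>
  #include <lttng/ust-getcpu.h>
  #include <rte_cycles.h>
  #include <rte_lcore.h>

  static uint64_t plugin_read64(void)
  {
      return rte_rdtsc(); /* raw TSC instead of clock_gettime() */
  }

  static uint64_t plugin_freq(void)
  {
      return rte_get_tsc_hz();
  }

  static int plugin_getcpu(void)
  {
      return (int)rte_lcore_id(); /* TLS lookup, not sched_getcpu() */
  }

  void lttng_ust_clock_plugin_init(void)
  {
      lttng_ust_trace_clock_set_read64_cb(plugin_read64);
      lttng_ust_trace_clock_set_freq_cb(plugin_freq);
  }

  void lttng_ust_getcpu_plugin_init(void)
  {
      lttng_ust_getcpu_override(plugin_getcpu);
  }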

> I would have naturally gone there to benefit from the existing toolchain.

Yes, that is the reason why I started with LTTng. After the
integration, testpmd performance dipped, so I added the following test
case to verify the overhead:
https://github.com/jerinjacobk/lttng-overhead
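
The measurement itself is simple: read the TSC around a large batch of
tracepoint calls and average. A rough sketch of the loop (the
dpdk:zero_arg provider/event pair matches the
__event_probe__dpdk___zero_arg symbol in the perf output above; the
header name and iteration count are placeholders):

  #include <stdint.h>
  #include <rte_cycles.h>
  #include "calibrate_tp.h" /* hypothetical tracepoint provider header */

  #define ITERATIONS (16 * 1024 * 1024)

  /* Average TSC cycles consumed by one zero-argument tracepoint. */
  static double measure_zero_arg(void)
  {
      uint64_t start, end;
      int i;

      start = rte_rdtsc_precise();
      for (i = 0; i < ITERATIONS; i++)
          tracepoint(dpdk, zero_arg);
      end = rte_rdtsc_precise();
      return (double)(end - start) / ITERATIONS;
  }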
 
> Have you looked at the FD.io logging/tracing infrastructure for inspiration?

Based on my understanding, VPP has a VPP-specific trace format, trace
emitter, and trace viewer. Since LTTng uses CTF, which is an open
format, we could leverage the open-source viewers and post-processing
tracing tools that understand CTF. A high-performance trace emitter
looks like the only piece missing from LTTng for us.

Of course, we can use the FD.io logging documentation for reference.



> 
> Ray K
> 
> On 13/01/2020 10:40, Jerin Jacob Kollanukkaran wrote:
> > Hi All,
> >
> > I would like to add tracing support for DPDK.
> > I am planning to add this support in v20.05 release.
> >
> > This RFC attempts to get feedback from the community on
> >
> > a) Tracing use cases.
> > b) Tracing requirements.
> > c) Implementation choices.
> > d) Trace format.
> >
> > Use-cases
> > ---------
> > - In most cases, the DPDK provider will not have access to the DPDK
> > customer's applications.
> > To debug/analyze slow path and fast path DPDK API usage in the
> > field, we need integrated trace support in DPDK.
> >
> > - We need a low-overhead fast path multi-core PMD
> > debugging/analysis infrastructure in DPDK to fix functional and
> > performance issue(s) in PMDs.
> >
> > - Post-trace analysis tools can derive various states across the
> > system, such as cpu_idle(), using the timestamps added in the trace.
> >
> >
> > Requirements:
> > -------------
> > - Support for Linux, FreeBSD and Windows OS
> > - Open trace format
> > - Multi-platform Open source trace viewer
> > - A very low-overhead trace API for DPDK fast path tracing/debugging.
> > - Dynamic enable/disable of trace events
> >
> >
> > To enable trace support in DPDK, the following items need to be
> > worked out:
> >
> > a) Add the DPDK trace points in the DPDK source code.
> >
> > - This includes updating DPDK functions such as
> > rte_eth_dev_configure(), rte_eth_dev_start(), and
> > rte_eth_dev_rx_burst() to emit the trace (a sketch of a call site
> > follows below).
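
For illustration only, a per-event helper could look like the sketch
below; the trace_ethdev_configure() name and semantics are
hypothetical, nothing like it exists in DPDK yet:

  #include <stdint.h>

  /* Hypothetical per-event helper: a real implementation would write a
   * compact binary record into a per-lcore trace buffer when the event
   * is enabled, and cost almost nothing when it is disabled. */
  static inline void
  trace_ethdev_configure(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq)
  {
      (void)port_id;
      (void)nb_rxq;
      (void)nb_txq;
  }

  /* The corresponding one-line addition inside rte_eth_dev_configure():
   *
   *     trace_ethdev_configure(port_id, nb_rx_q, nb_tx_q);
   */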
> >
> > b) Choosing suitable serialization-format
> >
> > - Common Trace Format (CTF) is an open format and language to
> > describe trace formats. This enables tool reuse, of which
> > line-textual (babeltrace) and graphical (TraceCompass) variants
> > already exist.
> >
> > CTF should look familiar to C programmers but adds stronger typing.
> > See CTF - A Flexible, High-performance Binary Trace Format.
> >
> > https://diamon.org/ctf/
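
To give a flavor of the binary side: a CTF stream is a sequence of
packed event records whose types are declared once in a separate
metadata (TSDL) file, so per-event overhead can be a few bytes. The
struct below is purely conceptual, not real CTF metadata:

  #include <stdint.h>

  /* Conceptual layout of one compact event record; the authoritative
   * layout is whatever the stream's TSDL metadata declares. */
  struct __attribute__((__packed__)) event_record {
      uint16_t event_id;  /* which event, as declared in the metadata */
      uint64_t timestamp; /* clock value, e.g. raw TSC */
      /* event payload fields, if any, follow */
  };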
> >
> > c) Writing the on-target serialization code,
> >
> > See the section below (LTTng CTF trace emitter vs DPDK-specific CTF
> > trace emitter).
> >
> > d) Deciding on and writing the I/O transport mechanics,
> >
> > For performance reasons, it should be backed by hugepages and
> > written out via file I/O (a sketch of the buffer side follows below).
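
A minimal sketch of the buffer side, assuming a memzone-backed
per-lcore buffer; the flush-to-file policy is left open:

  #include <stdio.h>
  #include <rte_memzone.h>

  /* Reserve a hugepage-backed trace buffer per lcore. A non-fast-path
   * thread (or a secondary process) would flush it to a file for the
   * host-side tools. */
  static void *
  trace_buf_reserve(unsigned int lcore_id, size_t len)
  {
      char name[RTE_MEMZONE_NAMESIZE];
      const struct rte_memzone *mz;

      snprintf(name, sizeof(name), "trace_lc%u", lcore_id);
      mz = rte_memzone_reserve(name, len, SOCKET_ID_ANY, 0);
      return mz != NULL ? mz->addr : NULL;
  }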
> >
> > e) Writing the PC-side deserializer/parser,
> >
> > Both babeltrace (CLI tool) and Trace Compass (GUI tool) support CTF.
> > See:
> > https://lttng.org/viewers/
> >
> > f) Writing tools for filtering and presentation.
> >
> > See item (e)
> >
> >
> > LTTng CTF trace emitter vs DPDK-specific CTF trace emitter
> > ----------------------------------------------------------
> >
> > I have written a performance evaluation application to measure the
> > overhead of the LTTng CTF emitter (the fast path infrastructure the
> > https://lttng.org/ library uses to emit the trace):
> >
> > https://github.com/jerinjacobk/lttng-overhead
> > https://github.com/jerinjacobk/lttng-overhead/blob/master/README
> >
> > I could improve the performance by 30% by adding a "DPDK"-based
> > plugin for get_clock() and get_cpu(). Here are the performance
> > numbers after adding the plugin, on x86 and on the various arm64
> > boards that I have access to.
> >
> > On high-end x86, it comes to around 236 cycles/~91ns @ 2.6GHz (see
> > the last line of the log, ZERO_ARG). On arm64, it varies from 312
> > cycles to 1100 cycles (based on the class of CPU). In short, based
> > on the IPC capabilities, the cost would be around 100ns to 400ns for
> > a single void trace (a trace without any argument).
> >
> >
> > [lttng-overhead-x86] $ sudo ./calibrate/build/app/calibrate -c 0xc0
> > make[1]: Entering directory '/export/lttng-overhead-x86/calibrate'
> > make[1]: Leaving directory '/export/lttng-overhead-x86/calibrate'
> > EAL: Detected 56 lcore(s)
> > EAL: Detected 2 NUMA nodes
> > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> > EAL: Selected IOVA mode 'PA'
> > EAL: Probing VFIO support...
> > EAL: PCI device 0000:01:00.0 on NUMA socket 0
> > EAL:   probe driver: 8086:1521 net_e1000_igb
> > EAL: PCI device 0000:01:00.1 on NUMA socket 0
> > EAL:   probe driver: 8086:1521 net_e1000_igb
> > CPU Timer freq is 2600.000000MHz
> > NOP: cycles=0.194834 ns=0.074936
> > GET_CLOCK: cycles=47.854658 ns=18.405638
> > GET_CPU: cycles=30.995892 ns=11.921497
> > ZERO_ARG: cycles=236.945113 ns=91.132736
> >
> >
> > We will have only 16.75ns to process each packet at 59.2 Mpps
> > (40Gbps with 64B packets), so IMO, the LTTng CTF emitter may not fit
> > the DPDK fast path purpose due to the cost associated with the
> > generic LTTng features.
> >
> > One option could be to have a native CTF emitter in EAL/DPDK that
> > emits the trace into a hugepage. I think it would take only a
> > handful of cycles if we limit the features to the requirements above
> > (a sketch follows below).
> >
> > The upside of using the LTTng CTF emitter:
> > a) No need to write a new CTF trace emitter (item (c) above).
> >
> > The downsides of the LTTng CTF emitter:
> > a) Performance issues (see above).
> > b) Lack of Windows OS support; it looks like it has only basic
> > FreeBSD support.
> > c) A DPDK library dependency on LTTng for tracing.
> >
> > So it is probably good to have a native CTF emitter in DPDK and to
> > reuse all the open-source trace viewer (babeltrace and TraceCompass)
> > and format (CTF) infrastructure.
> > I think it would be the best of both worlds.
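
To make the "handful of cycles" claim concrete, here is a sketch of
what the emit path could reduce to; every name here is hypothetical:

  #include <stdint.h>
  #include <string.h>
  #include <rte_cycles.h>
  #include <rte_per_lcore.h>

  /* Hypothetical per-lcore trace state living in a hugepage. */
  struct trace_mem {
      uint8_t *buf;  /* hugepage-backed buffer */
      uint32_t off;  /* current write offset */
      uint32_t mask; /* buffer size - 1, for cheap wrap-around */
  };

  static RTE_DEFINE_PER_LCORE(struct trace_mem, trace_mem);

  /* Emit path for a zero-argument event: one TSC read plus two small
   * stores; no locks, syscalls or RCU. (A real emitter must also keep
   * records from straddling the wrap point.) */
  static inline void
  trace_emit_zero_arg(uint16_t event_id)
  {
      struct trace_mem *t = &RTE_PER_LCORE(trace_mem);
      uint64_t tsc = rte_rdtsc();

      memcpy(t->buf + t->off, &event_id, sizeof(event_id));
      memcpy(t->buf + t->off + sizeof(event_id), &tsc, sizeof(tsc));
      t->off = (t->off + sizeof(event_id) + sizeof(tsc)) & t->mask;
  }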
> >
> > Any thoughts on this subject? Based on the community feedback, I
> > can work on the patch for v20.05.
> >

