[dpdk-dev] [PATCH v4 2/2] Add l2reflect measurement application

Henning Schild henning.schild at siemens.com
Wed Sep 2 09:56:05 CEST 2020


Am Tue, 01 Sep 2020 17:07:38 +0200
schrieb Thomas Monjalon <thomas at monjalon.net>:

> 28/08/2020 16:22, Henning Schild:
> > Thomas,
> > 
> > what is the state on this one? Any more steps to take or people to
> > involve?  
> 
> I can try adding some key people in Cc list,
> while doing a quick review.
>
>
> > I first showed that in action back in 2016 on FOSDEM.
> > https://archive.fosdem.org/2016/schedule/event/virt_iaas_real_time_cloud/
> > After my talk two devs from dpdk approached me because they liked
> > the idea of a "network cyclictest". It took us some time to finally
> > go upstream with it, unfortunately i do not recall names.  
> 
> I think I was one of them.
> 
> There will be some talks about latency in the next virtual DPDK event:
> https://events.linuxfoundation.org/dpdk-userspace-summit/program/schedule/
> (speakers are Cc'ed)

Thanks, that sure looks like we are not the only ones who are
interested in latency and probably a way to measure it. From other
doing it as well it would be nice to hear how they currently do that.

> > This application can potentially be integrated into the test-suite
> > for QA, making sure no changes heavily increase the package
> > transmission worst case timing.  
> 
> Good
> 
> 
> > Felix Moessbauer <felix.moessbauer at siemens.com> wrote:
> >   
> > > The l2reflect application implements a ping-pong benchmark to
> > > measure the latency between two instances. For communication,
> > > we use raw ethernet and send one packet at a time. The timing data
> > > is collected locally and min/max/avg values are displayed in a
> > > TUI. Finally, a histogram of the latencies is printed which can be
> > > further processed with the jitterdebugger visualization scripts.  
> 
> Please can you show an example of script usage?
> 
> > > To debug latency spikes, a max threshold can be defined.
> > > If it is hit, a trace point is created on both instances.
> > > 
> > > Signed-off-by: Felix Moessbauer <felix.moessbauer at siemens.com>
> > > Signed-off-by: Henning Schild <henning.schild at siemens.com>  
> [...]
> > > --- a/app/Makefile
> > > +++ b/app/Makefile  
> 
> No need to update Makefile, it will be removed.
> 
> > > --- /dev/null
> > > +++ b/app/l2reflect/l2reflect.h
> > > @@ -0,0 +1,62 @@
> > > +/*
> > > + * SPDX-License-Identifier: BSD-3-Clause  
> 
> SPDX should be the first line.
> 
> > > + * Copyright(c) 2020 Siemens AG
> > > + *
> > > + * authors:
> > > + *   Felix Moessbauer <felix.moessbauer at siemens.com>
> > > + *   Henning Schild <henning.schild at siemens.com>  
> 
> It is uncommon to mention authors in the file.
> In general, git metadata covers this info.
> Any special requirement?

That application has some history. Written by me and eventually
improved for upstream by Felix. The idea here was to give credit to two
people with just one author field in git. If that is really in the way,
we can drop it and make Felix the sole author.

> [...]
> > > +
> > > +/*
> > > + * Compares a packet destination MAC address to a device MAC
> > > address.
> > > + */
> > > +static __rte_always_inline int
> > > +ether_addr_cmp(struct rte_ether_addr *ea, struct rte_ether_addr
> > > *eb) +{
> > > +	return ((*(uint64_t *)ea ^ *(uint64_t *)eb) &
> > > MAC_ADDR_CMP) == 0; +}  
> 
> Please use rte_is_same_ether_addr()
> 
> > > --- /dev/null
> > > +++ b/app/l2reflect/main.c
> > > @@ -0,0 +1,872 @@
> > > +/*
> > > + * SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright(c) 2020 Siemens AG
> > > + *
> > > + * authors:
> > > + *   Felix Moessbauer <felix.moessbauer at siemens.com>
> > > + *   Henning Schild <henning.schild at siemens.com>
> > > + *
> > > + * launch (non-rt kernel): l2reflect --lcores 0 at 0,1 at 6 -n 1
> > > + * launch (rt kernel): l2reflect --lcores 0 at 0,1 at 6 -n 1 -- -P 50
> > > -r -l  
> 
> Would be better in a --help section or in the doc.
> 
> > > + *
> > > + * The l2reflect application implements a ping-pong benchmark to
> > > + * measure the latency between two instances. For communication,
> > > + * we use raw ethernet and send one packet at a time. The timing
> > > data
> > > + * is collected locally and min/max/avg values are displayed in a
> > > TUI.
> > > + * Finally, a histogram of the latencies is printed which can be
> > > + * further processed with the jitterdebugger visualization
> > > scripts.
> > > + * To debug latency spikes, a max threshold can be defined.
> > > + * If it is hit, a trace point is created on both instances.
> > > + */  
> [...]
> > > +__rte_format_printf(1, 0)
> > > +static void
> > > +trace_write(const char *fmt, ...)
> > > +{
> > > +	va_list ap;
> > > +	char buf[256];
> > > +	int n, err;
> > > +
> > > +	if (trace_fd == 0)
> > > +		trace_fd =
> > > open("/sys/kernel/debug/tracing/trace_marker",
> > > +				O_WRONLY);  
> 
> Why not using rte_trace framework?

I am not sure where rte_trace ends up eventually. But the idea here is
to have kernel traces when looking at latency peaks. You might just be
looking at an incorrect host tuning when you see a spike i.e. higher
prio threads on your possibly isolated core.
It is not tracing dpdk, but more how it fits in the overall picture.
 
> > > +	if (trace_fd < 0)
> > > +		return;
> > > +
> > > +	va_start(ap, fmt);
> > > +	n = vsnprintf(buf, 256, fmt, ap);
> > > +	va_end(ap);
> > > +
> > > +	err = write(trace_fd, buf, n);
> > > +	assert(err >= 1);
> > > +}  
> 
> [...]
> > > +static void
> > > +l2reflect_simple_forward(struct rte_mbuf *m)
> > > +{
> > > +	struct rte_ether_hdr *eth;
> > > +	struct my_magic_packet *pkt;
> > > +
> > > +	eth = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
> > > +	pkt = (struct my_magic_packet *)eth;
> > > +
> > > +	/* dst addr */
> > > +	rte_ether_addr_copy(&eth->s_addr, &eth->d_addr);
> > > +	/* src addr */
> > > +	rte_ether_addr_copy(&l2reflect_port_eth_addr,
> > > &eth->s_addr); +
> > > +	if ((eth->ether_type ==
> > > rte_cpu_to_be_16(ETHER_TYPE_L2REFLECT)) &&
> > > +	    (pkt->magic == MAGIC_TRACE_PAYLOAD)) {
> > > +		/* and the native one */
> > > +		trace_write("sending traced packet\n");
> > > +	}
> > > +
> > > +	l2reflect_send_packet(&m, l2reflect_port_number);
> > > +}  
> 
> If I understand well, this requires a special cabling?
> Would it be possible to measure latency of hardware NIC internal
> loopback or software PMD loopback?

Not sure. We always want to see the full picture ... i.e would also
want to see if a NIC is slow to put a packet on the "wire".
For testing we do use multi-port NICs with cable-loops. The other setup
is Qemus with the application inside and a dpdk-based vhost switch. But
that virtual setup is also not trivial. In fact we usually do make that
switch send the packets over the cable-loop to again cover the whole
way the packets go "in reality".

> [...]
> > > +	printf("from MAC address: %02X:%02X:%02X:%02X:%02X:%02X
> > > to"
> > > +			" %02X:%02X:%02X:%02X:%02X:%02X\n\n",
> > > +			eth->s_addr.addr_bytes[0],
> > > eth->s_addr.addr_bytes[1],
> > > +			eth->s_addr.addr_bytes[2],
> > > eth->s_addr.addr_bytes[3],
> > > +			eth->s_addr.addr_bytes[4],
> > > eth->s_addr.addr_bytes[5],
> > > +			eth->d_addr.addr_bytes[0],
> > > eth->d_addr.addr_bytes[1],
> > > +			eth->d_addr.addr_bytes[2],
> > > eth->d_addr.addr_bytes[3],
> > > +			eth->d_addr.addr_bytes[4],
> > > eth->d_addr.addr_bytes[5]);  
> 
> You could also use rte_ether_format_addr()
> 
> [...]
> > > +static inline unsigned int
> > > +l2reflect_rx_filter(
> > > +	struct rte_mbuf **bufs,
> > > +	unsigned int nb_rx,
> > > +	unsigned int data_only)
> > > +{
> > > +	struct rte_ether_hdr *eth;
> > > +	struct my_magic_packet *pkt;
> > > +	unsigned int i, ret;
> > > +
> > > +	ret = 0;
> > > +	for (i = 0; i < nb_rx; i++) {
> > > +		eth = rte_pktmbuf_mtod(bufs[i], struct
> > > rte_ether_hdr *);
> > > +		if (l2reflect_state != S_ELECT_LEADER &&
> > > +		    !ether_addr_cmp(&eth->s_addr,
> > > &l2reflect_remote_eth_addr))
> > > +			goto drop;
> > > +
> > > +		if (eth->ether_type !=
> > > rte_cpu_to_be_16(ETHER_TYPE_L2REFLECT))
> > > +			goto drop;
> > > +
> > > +		pkt = (struct my_magic_packet *)eth;
> > > +		if (data_only && (pkt->type != TRACE_TYPE_DATA &&
> > > +				  pkt->type != TRACE_TYPE_RSET &&
> > > +				  pkt->type != TRACE_TYPE_QUIT))
> > > +			goto drop;
> > > +
> > > +		bufs[ret++] = bufs[i];
> > > +		continue;
> > > +drop:
> > > +		rte_pktmbuf_free(bufs[i]);
> > > +	}
> > > +
> > > +	return ret;
> > > +}  
> 
> This function deserves some comments to explain the logic.
> 
> [...]
> > > --- /dev/null
> > > +++ b/app/l2reflect/meson.build
> > > @@ -0,0 +1,25 @@
> > > +# SPDX-License-Identifier: BSD-3-Clause
> > > +# Copyright(c) 2020 Siemens AG
> > > +
> > > +# meson file, for building this example as part of a main DPDK
> > > build. +cc = meson.get_compiler('c')  
> 
> Not sure you need that line.
> 
> > > +
> > > +cjson = dependency('libcjson', required: false)  
> 
> Some other parts require jansson:
> 	jansson = dependency('jansson', required: false)
> How libcjson is different? Would it be possible to unify dependency?
> 
> > > +if not is_linux
> > > +        build = false  
> 
> Why limiting to Linux?

Probably because of the RT stuff (sched param setting, mlock) and the
tracing. I would propose it like that and hope that others might ifdef
when enabling other OSs.

Maintaining low latency requires quite a bit of host tuning. This can
get especially tricky when consuming the whole CPU with polling, you
can starve your OS. The application alone does not do that for you, not
even on Linux. Like cyclictest or jitterdebugger i would see it as a
tool that requires knowledge to measure correctly. And it should
probably be added as such, with an integration into automated tests and
support for other OSs possibly coming in the future.

I hope that others doing dpdk for low latency on preempt-rt will find
it useful as is. They might have been looking for something like it,
possibly have something like it in their test-labs and did not publish
it yet.

Henning

> [...]
> > > +#define ANSI_BOLD_RED "\x1b[01;31m"
> > > +#define ANSI_BOLD_GREEN "\x1b[01;32m"
> > > +#define ANSI_BOLD_YELLOW "\x1b[01;33m"
> > > +#define ANSI_BOLD_BLUE "\x1b[01;34m"
> > > +#define ANSI_BOLD_MAGENTA "\x1b[01;35m"
> > > +#define ANSI_BOLD_CYAN "\x1b[01;36m"
> > > +#define ANSI_RESET "\x1b[0m"  
> 
> Not sure about using colors.
> If it really adds a value, I think it should be an option,
> automatically disabled if stdout is redirected.
> 
> [...]
> > > --- a/app/meson.build
> > > +++ b/app/meson.build
> > > @@ -8,6 +8,7 @@ endif
> > >  apps = [
> > >  	'pdump',
> > >  	'proc-info',
> > > +	'l2reflect',
> > >  	'test-acl',
> > >  	'test-bbdev',
> > >  	'test-cmdline',  
> 
> I think you should keep alphabetical ordering.
> 
> I'm not sure about adding this application in DPDK.
> I think we need some latency tools,
> but if it requires a specific setup, it could better fit in DTS.
> We need more opinions here.
> 
> 



More information about the dev mailing list