[dpdk-dev] [PATCH 2/2] event/sw: use dynamically-sized IQs

Pavan Nikhilesh pbhagavatula at caviumnetworks.com
Mon Jan 8 17:05:30 CET 2018


On Mon, Jan 08, 2018 at 03:50:24PM +0000, Van Haaren, Harry wrote:
> > From: Pavan Nikhilesh [mailto:pbhagavatula at caviumnetworks.com]
> > Sent: Monday, January 8, 2018 3:32 PM
> > To: Eads, Gage <gage.eads at intel.com>; Van Haaren, Harry
> > <harry.van.haaren at intel.com>; jerin.jacob at caviumnetworks.com;
> > santosh.shukla at caviumnetworks.com
> > Cc: dev at dpdk.org
> > Subject: Re: [PATCH 2/2] event/sw: use dynamically-sized IQs
> >
> > On Wed, Nov 29, 2017 at 09:08:34PM -0600, Gage Eads wrote:
> > > This commit introduces dynamically-sized IQs, by switching the underlying
> > > data structure from a fixed-size ring to a linked list of queue 'chunks.'
>
> <snip>
>
> > Sw eventdev crashes when used alongside Rx adapter. The crash happens when
> > pumping traffic at > 1.4mpps. This commit seems responsible for this.
> >
> >
> > Apply the following Rx adapter patch
> > http://dpdk.org/dev/patchwork/patch/31977/
> > Command used:
> > ./build/eventdev_pipeline_sw_pmd -c 0xfffff8 --vdev="event_sw" -- -r0x800
> > -t0x100 -w F000 -e 0x10
>
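For reference, the core-mask options in this command are hexadecimal bitmasks in which bit N selects lcore N. Assuming the example app's usual flag meanings (-r Rx, -t Tx, -w workers, -e scheduler), -e 0x10 places the scheduler on lcore 4, which matches the "lcore-slave-4" thread in the backtraces. A minimal C sketch of the decoding follows; mask_to_cores is a hypothetical helper written for illustration, not part of DPDK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: decode a hex core mask such as "0x800" or "F000"
 * into the list of lcore ids whose bits are set. Returns the number of
 * ids written to cores[] (at most max).
 */
static int mask_to_cores(const char *hex, int *cores, int max)
{
	uint64_t mask;
	int n = 0;

	assert(hex != NULL && cores != NULL);
	/* Base 16 accepts the digits with or without a leading "0x". */
	mask = strtoull(hex, NULL, 16);

	for (int bit = 0; bit < 64 && n < max; bit++) {
		if (mask & (UINT64_C(1) << bit))
			cores[n++] = bit;
	}
	return n;
}
```

With the masks from the command line, 0x800 decodes to lcore 11, 0x100 to lcore 8, F000 to lcores 12-15, and 0x10 to lcore 4.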
> Applied the patch to current master, recompiled; cannot reproduce here..
>
By master, do you mean dpdk-next-eventdev?
> Is it 100% reproducible and "instant" or can it take some time to occur there?
>
It is instant
>
> > Backtrace:
> >
> > Thread 4 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0xffffb6c8f040 (LWP 25291)]
> > 0x0000aaaaaadcc0d4 in iq_dequeue_burst (count=48, ev=0xffffb6c8dd38,
> > iq=0xffff9f764720, sw=0xffff9f332600) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:142
> > 142 ev[total++] = current->events[index++];
>
> Could we get the output of (gdb) info locals?
>

Thread 4 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffffb6c8f040 (LWP 19751)]
0x0000aaaaaadcc0d4 in iq_dequeue_burst (count=48, ev=0xffffb6c8dd38,
iq=0xffff9f764620, sw=0xffff9f332500) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:142
142 ev[total++] = current->events[index++];

(gdb) info locals
next = 0x7000041400be73b
current = 0x7000041400be73b
total = 36
index = 1
(gdb)


I also noticed another crash:

Thread 4 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffffb6c8f040 (LWP 19690)]
0x0000aaaaaadcfb78 in iq_alloc_chunk (sw=0xffff9f332500) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:63
63		sw->chunk_list_head = chunk->next;

(gdb) info locals
chunk = 0x14340000119

(gdb) bt
#0  0x0000aaaaaadcfb78 in iq_alloc_chunk (sw=0xffff9f332500) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:63
#1  iq_enqueue (ev=0xffff9f3967c0, iq=0xffff9f764620, sw=0xffff9f332500) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:95
#2  __pull_port_lb (allow_reorder=0, port_id=5, sw=0xffff9f332500) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:463
#3  sw_schedule_pull_port_no_reorder (sw=0xffff9f332500, port_id=5) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:486
#4  0x0000aaaaaadd0608 in sw_event_schedule (dev=0xaaaaaafbd200
<rte_event_devices>) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:554
#5  0x0000aaaaaadca008 in sw_sched_service_func (args=0xaaaaaafbd200
<rte_event_devices>) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev.c:767
#6  0x0000aaaaaab54740 in rte_service_runner_do_callback (s=0xffff9fffdf80,
cs=0xffff9ffef900, service_idx=0) at
/root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:349
#7  0x0000aaaaaab54868 in service_run (i=0, cs=0xffff9ffef900,
service_mask=18446744073709551615) at
/root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:376
#8  0x0000aaaaaab54954 in rte_service_run_iter_on_app_lcore (id=0,
serialize_mt_unsafe=1) at
/root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:405
#9  0x0000aaaaaaaef04c in schedule_devices (lcore_id=4) at
/root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:223
#10 0x0000aaaaaaaef234 in worker (arg=0xffff9f331c80) at
/root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:274
#11 0x0000aaaaaab4382c in eal_thread_loop (arg=0x0) at
/root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/linuxapp/eal/eal_thread.c:182
#12 0x0000ffffb7e46d64 in start_thread () from /usr/lib/libpthread.so.0
#13 0x0000ffffb7da8bbc in thread_start () from /usr/lib/libc.so.6
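
Both faulting lines dereference a chunk pointer taken from a linked list: iq_chunk.h:63 reads chunk->next while popping the free list, and iq_chunk.h:142 reads current->events[] while draining an IQ. The pointer values in the two "info locals" outputs (0x7000041400be73b and 0x14340000119) look nothing like the other addresses in the traces (0xffff9f...), which suggests, but does not prove, that a next pointer was overwritten somewhere. The following is a minimal sketch of a chunked IQ with a free list, written only to show where those dereferences sit. It is not the driver's code: every type, name and size here is hypothetical, and chunk_in_pool() is an illustrative debugging check rather than something the driver is known to have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EVENTS_PER_CHUNK 4	/* hypothetical, small for easy tracing */
#define POOL_CHUNKS 8		/* hypothetical pool size */

struct event { uint64_t data; };

struct chunk {
	struct event events[EVENTS_PER_CHUNK];
	struct chunk *next;
};

struct sched {
	struct chunk pool[POOL_CHUNKS];
	struct chunk *free_head;	/* singly linked free list */
};

struct iq {
	struct chunk *head, *tail;
	uint16_t head_idx, tail_idx;
	uint32_t count;
};

static void sched_init(struct sched *s)
{
	for (int i = 0; i < POOL_CHUNKS; i++)
		s->pool[i].next = (i + 1 < POOL_CHUNKS) ? &s->pool[i + 1] : NULL;
	s->free_head = &s->pool[0];
}

/* Debug aid: is c the address of one of the pool's chunks? */
static int chunk_in_pool(const struct sched *s, const struct chunk *c)
{
	uintptr_t p = (uintptr_t)c, base = (uintptr_t)s->pool;

	return p >= base && p < base + sizeof(s->pool) &&
		(p - base) % sizeof(struct chunk) == 0;
}

static struct chunk *chunk_alloc(struct sched *s)
{
	struct chunk *c = s->free_head;

	if (c == NULL)
		return NULL;
	assert(chunk_in_pool(s, c));	/* would trip on a clobbered head */
	s->free_head = c->next;		/* same shape as iq_chunk.h:63 */
	c->next = NULL;
	return c;
}

static void chunk_free(struct sched *s, struct chunk *c)
{
	c->next = s->free_head;
	s->free_head = c;
}

static int iq_init(struct sched *s, struct iq *q)
{
	q->head = q->tail = chunk_alloc(s);
	q->head_idx = q->tail_idx = 0;
	q->count = 0;
	return q->head ? 0 : -1;
}

static int iq_enqueue(struct sched *s, struct iq *q, struct event ev)
{
	if (q->tail_idx == EVENTS_PER_CHUNK) {	/* tail full: link a new chunk */
		struct chunk *c = chunk_alloc(s);

		if (c == NULL)
			return -1;
		q->tail->next = c;
		q->tail = c;
		q->tail_idx = 0;
	}
	q->tail->events[q->tail_idx++] = ev;
	q->count++;
	return 0;
}

static uint32_t iq_dequeue_burst(struct sched *s, struct iq *q,
		struct event *out, uint32_t n)
{
	uint32_t total = 0;

	if (n > q->count)
		n = q->count;
	while (total < n) {
		if (q->head_idx == EVENTS_PER_CHUNK) {	/* head drained */
			struct chunk *next = q->head->next;

			chunk_free(s, q->head);
			q->head = next;
			q->head_idx = 0;
		}
		/* same shape as iq_chunk.h:142 */
		out[total++] = q->head->events[q->head_idx++];
	}
	q->count -= total;
	return total;
}
```

In a structure like this, a single bad next pointer is enough to produce both kinds of crash: it either surfaces when the free list is popped or when a dequeue walks onto the next chunk, depending on which list the damaged chunk is on at the time.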


>
>
> > (gdb) bt
> > #0  0x0000aaaaaadcc0d4 in iq_dequeue_burst (count=48, ev=0xffffb6c8dd38,
> > iq=0xffff9f764720, sw=0xffff9f332600) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:142
> > #1  sw_schedule_atomic_to_cq (sw=0xffff9f332600, qid=0xffff9f764700,
> > iq_num=0, count=48) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:74
> > #2  0x0000aaaaaadcdc44 in sw_schedule_qid_to_cq (sw=0xffff9f332600) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:262
> > #3  0x0000aaaaaadd069c in sw_event_schedule (dev=0xaaaaaafbd200
> > <rte_event_devices>) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:564
> > #4  0x0000aaaaaadca008 in sw_sched_service_func (args=0xaaaaaafbd200
> > <rte_event_devices>) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev.c:767
> > #5  0x0000aaaaaab54740 in rte_service_runner_do_callback (s=0xffff9fffdf80,
> > cs=0xffff9ffef900, service_idx=0) at
> > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:349
> > #6  0x0000aaaaaab54868 in service_run (i=0, cs=0xffff9ffef900,
> > service_mask=18446744073709551615) at
> > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:376
> > #7  0x0000aaaaaab54954 in rte_service_run_iter_on_app_lcore (id=0,
> > serialize_mt_unsafe=1) at
> > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:405
> > #8  0x0000aaaaaaaef04c in schedule_devices (lcore_id=4) at
> > /root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:223
> > #9  0x0000aaaaaaaef234 in worker (arg=0xffff9f331d80) at
> > /root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:274
> > #10 0x0000aaaaaab4382c in eal_thread_loop (arg=0x0) at
> > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/linuxapp/eal/eal_thread.c:182
> > #11 0x0000ffffb7e46d64 in start_thread () from /usr/lib/libpthread.so.0
> > #12 0x0000ffffb7da8bbc in thread_start () from /usr/lib/libc.so.6
> >
> > Segfault seems to happen in sw_event_schedule and only happens under high
> > traffic load.
>
> I've added -n 0 to the command line allowing it to run forever,
> and after ~2 mins its still happily forwarding pkts at ~10G line rate here.
>

On arm64 the crash is instant even without -n0.

>
> > Thanks,
> > Pavan
>
> Thanks for reporting - I'm afraid I'll have to ask a few questions to identify why I can't reproduce here before I can dig in and identify a fix.
>
> Anything special about the system that it is on?

Running on arm64 octeontx with 8x10G connected.

> What traffic pattern is being sent to the app?

Using a traffic generator similar to trafficgen, sending IPv4/UDP packets.

   0:00:51     958245 |0xB00   2816|0xB10   2832|0xB20   2848|0xB30   2864|0xC00 * 3072|0xC10 * 3088|0xC20 * 3104|0xC30 * 3120|    Totals
Port Status           |XFI30     Up|XFI31     Up|XFI32     Up|XFI33     Up|XFI40     Up|XFI41     Up|XFI42     Up|XFI43     Up|
 1:Total TX packets   |  7197041566|  5194976604|  5120240981|  4424870160|  5860892739|  5191225514|  5126500427|  4429259828|42545007819
 3:Total RX packets   |   358886055|   323055411|   321000948|   277179800|   387486466|   350278086|   348080242|   295460613|2661427621
 6:TX packet rate     |           0|           0|           0|           0|           0|           0|           0|           0|         0
 7:TX octet rate      |           0|           0|           0|           0|           0|           0|           0|           0|         0
 8:TX bit rate, Mbps  |           0|           0|           0|           0|           0|           0|           0|           0|         0
10:RX packet rate     |           0|           0|           0|           0|           0|           0|           0|           0|         0
11:RX octet rate      |           0|           0|           0|           0|           0|           0|           0|           0|         0
12:RX bit rate, Mbps  |           0|           0|           0|           0|           0|           0|           0|           0|         0
36:tx.size            |          60|          60|          60|          60|          60|          60|          60|          60|
37:tx.type            |    IPv4+UDP|    IPv4+UDP|    IPv4+UDP|    IPv4+UDP|    IPv4+UDP|    IPv4+UDP|    IPv4+UDP|    IPv4+UDP|
38:tx.payload         |         abc|         abc|         abc|         abc|         abc|         abc|         abc|         abc|
47:dest.mac           |   fb71189c0|   fb71189d0|   fb71189e0|   fb71189bf|   fb7118ac0|   fb7118ad0|   fb7118ae0|   fb7118abf|
51:src.mac            |   fb71189bf|   fb71189cf|   fb71189df|   fb71189ef|   fb7118abf|   fb7118acf|   fb7118adf|   fb7118aef|
55:dest.ip            |   11.1.0.99|  11.17.0.99|  11.33.0.99|   11.0.0.99|   14.1.0.99|  14.17.0.99|  14.33.0.99|   14.0.0.99|
59:src.ip             |   11.0.0.99|  11.16.0.99|  11.32.0.99|  11.48.0.99|   14.0.0.99|  14.16.0.99|  14.32.0.99|  14.48.0.99|
73:bridge             |         off|         off|         off|         off|         off|         off|         off|         off|
77:validate packets   |         off|         off|         off|         off|         off|         off|         off|         off|

Thanks,
Pavan.

>
> Thanks
>
>
> <snip>
>

