[dpdk-dev] [PATCH] build: avoid --as-needed as it causes overlinking

Christian Ehrhardt christian.ehrhardt at canonical.com
Thu Aug 29 17:25:12 CEST 2019


On Thu, Aug 29, 2019 at 12:18 PM Christian Ehrhardt
<christian.ehrhardt at canonical.com> wrote:
>
> On Wed, Aug 28, 2019 at 5:23 PM Aaron Conole <aconole at redhat.com> wrote:
> >
> > Aaron Conole <aconole at redhat.com> writes:
> >
> > > Christian Ehrhardt <christian.ehrhardt at canonical.com> writes:
> > >
> > >> On Wed, Aug 28, 2019 at 3:53 PM Aaron Conole <aconole at redhat.com> wrote:
> > >>>
> > >>> Christian Ehrhardt <christian.ehrhardt at canonical.com> writes:
> > >>>
> > > >>> > A while ago telemetry was added in 57ae0ec6, which also added
> > > >>> > --no-as-needed to config/meson.build. This no longer seems needed,
> > > >>> > because due to other build changes the ordering in build logs is now:
> > > >>> >   [...] -lrte_telemetry [...] -Wl,--no-as-needed [...]
> > > >>> > This means telemetry no longer benefits from --no-as-needed anyway.
> > >>> >
> > > >>> > Overlinking problems get triggered by the meson-generated pkg-config
> > > >>> > file, which will have:
> > > >>> >    [...] -Wl,--no-as-needed <somelibsusedbydpdk>
> > > >>> > This will overlink <somelibs> and, in addition, anything that follows,
> > > >>> > because it never switches back to --as-needed. So if a project links
> > > >>> > dpdk libs + <other>, <other> is also linked with --no-as-needed.
> > >>> >
> > >>> > Fixes: https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1841759
> > >>> >
> > >>> > Signed-off-by: Christian Ehrhardt <christian.ehrhardt at canonical.com>
> > >>> > ---
> > >>>
> > >>> Hi Christian,
> > >>>
> > >>> I agree this is something to be fixed.  It will need additional work,
> > >>> though:
> > >>>
> > >>>   https://travis-ci.com/ovsrobot/dpdk/builds/124909245
> > >>>
> > >>
> > > >> Thanks for the link, Aaron, but I'm puzzled about what to do there atm.
> > > >>
> > > >> The errors I found in the failing logs were misleading at first:
> > >> - linker can't find -lvirt / -lpqos / ...
> > >>   well the test env needs to install them, maybe it was added as
> > >> dependency by accident before?
> > >
> > > Not sure about this.  It's strange to require that we *install* the
> > > libraries before we can unit test them.  After all, if I'm going to
> > > potentially replace my previously installed libraries, I definitely want
> > > to know that the unit tests are passing.
> > >
> > > >>   I'd understand (due to the change) if it complained about missing symbols
> > > >>   (libs no longer added due to as-needed, but then needed for some reason).
> > > >>   But this is the other way around: it just doesn't find the libs in the build env
> > >> - error: unrecognized command line option '-Wformat-truncation'
> > >>   I don't see how I'd cause this ...
> > >> => Maybe this is just an artifact that is even part of the normal/good tests?
> > >
> > > I don't think so - but there's a simple change.  I've pushed to my own
> > > branch and you can see the builds:
> > >
> > >   https://travis-ci.org/orgcandman/dpdk/branches using the same
> > >   series_6154 branch name.
> > >
> > >> Comparing former logs - last good test was
> > >> https://travis-ci.com/ovsrobot/dpdk/builds/124875383
> > >> This first seemed more helpful.
> > >>
> > >> DPDK:fast-tests / eal_flags_w_opt_autotest  FAIL
> > >> DPDK:fast-tests / func_reentrancy_autotest  FAIL
> > >> DPDK:fast-tests / mbuf_autotest         FAIL
> > >> DPDK:fast-tests / mempool_autotest      FAIL
> > >> DPDK:fast-tests / ring_pmd_autotest     FAIL
> > >> DPDK:fast-tests / sched_autotest        FAIL
> > >> DPDK:fast-tests / table_autotest        FAIL
> > >> [...]
> > > >> Overall about 14/60 of the tests failed, with no recognizable pattern
> > > >> as to why just those and not the others.
> > >
> > > Good question :)
> > >
> > >> I only see "Full log written ... on_error", so I can't directly
> > >> compare how a good run would look in the configure/build stage.
> > >> Looking just at the bad case there are plenty of messages like
> > >> - "no available hugepages"
> > >> - "cannot reserve memory", ..
> > > >> But all of those point to a flaky test(-env) rather than an error in
> > > >> the commit; there must be more to it.
> > >
> > > Okay.  Fair enough.
> > >
> > >> @Aaron is there a good way to get the rest of the log for a good case
> > >> to compare?
> > >
> > > Let's wait for https://travis-ci.org/orgcandman/dpdk/builds/577910388 to
> > > spit out some details.
> >
> > Oops - forgot to push the revert:
> >
> > https://travis-ci.org/orgcandman/dpdk/builds/577918381 is the correct
> > build.
> >
> > Sorry.
>
> Thanks Aaron, breaking down the actual different test results ...
>
> Tests: eal_flags_w_opt_autotest eal_flags_b_opt_autotest
>   EAL: failed to parse device "00FF:09:0B.3"
>   EAL: Unable to parse device '00FF:09:0B.3'
>
> Test: func_reentrancy_autotest
>   mempool create/lookup: common object allocated 0 times (should be 1)
>
> Tests: flow_classify_autotest ring_pmd_autotest sched_autotest
> table_autotest bitratestats_autotest distributor_autotest
> latencystats_autotest reorder_autotest
>   MBUF: error setting mempool handler
>   Cannot init mbuf pool on socket 0
>
> Test: mbuf_autotest
>   MBUF: error setting mempool handler
>   cannot allocate mbuf pool
>
> Test: mempool_autotest
>   cannot allocate mp_nocache mempool
>
> Test: eventdev_common_autotest
>   Tests not executed
>
>
> I built DPDK from my commit locally and tried to run the tests, but
> they work fine.
>
>
> # app/test/dpdk-test --no-huge -l 0-1 --pci-whitelist 00FF:09:0B.3
> EAL: Detected 4 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Static memory layout is selected, amount of reserved memory can
> be adjusted with -m or --socket-mem
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: Probing VFIO support...
> EAL:   cannot open VFIO container, error 2 (No such file or directory)
> EAL: VFIO support could not be initialized
> APP: HPET is not enabled, using TSC as default timer
>
> Works just fine?!
> And if I really break it, it fails as expected:
>
> # app/test/dpdk-test --no-huge -l 0-1 --pci-whitelist 00FF:09:0B.3xxxx
> EAL: Detected 4 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Static memory layout is selected, amount of reserved memory can
> be adjusted with -m or --socket-mem
> EAL: failed to parse device "00FF:09:0B.3xxxx"
> EAL: Unable to parse device '00FF:09:0B.3xxxx'
>
> Trying one of the most common failure cases, "error setting mempool
> handler", works fine as well:
>
> # app/test/dpdk-test --no-huge -l 0-1
> EAL: Detected 4 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Static memory layout is selected, amount of reserved memory can
> be adjusted with -m or --socket-mem
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: Probing VFIO support...
> EAL:   cannot open VFIO container, error 2 (No such file or directory)
> EAL: VFIO support could not be initialized
> EAL: PCI device 0000:00:1f.6 on NUMA socket -1
> EAL:   Invalid NUMA socket, default to 0
> EAL:   probe driver: 8086:15d8 net_e1000_em
> APP: HPET is not enabled, using TSC as default timer
> RTE>>flow_classify_autotest
> Created table_acl for for IPv4 five tuple packets
> Allocated mbuf pool on socket 0
> Set up IPv4 UDP traffic
> ETH  pktlen 14
> ETH + IPv4 pktlen 34
> ETH + IPv4 + UDP pktlen 42
>
> Set up IPv4 TCP traffic
> ETH  pktlen 14
> ETH + IPv4 pktlen 34
> ETH + IPv4 + TCP pktlen 54
>
> Set up IPv4 SCTP traffic
> ETH  pktlen 14
> ETH + IPv4 pktlen 34
> ETH + IPv4 + SCTP pktlen 42
>
> Test OK
>
>
> /me is still not seeing what would be wrong with it in this test environment :-/
> Especially since the change affects linking, which should be an
> all-or-nothing breakage and not a (seemingly random) subset of test failures.
>
> @Aaron, thanks for your help so far. Is there an easy way to run
> exactly my code once again to see if it works this time?
> Or was https://travis-ci.org/orgcandman/dpdk/builds/577910388 exactly
> that already (and it failed again at the same tests)?
>

Thanks to Luca's support I set up travis for my github and
experimented with some ideas we had.
While I still don't understand how exactly we break those tests in
particular, we have found a way to fix the generated pkg-config
without breaking the tests:

Bad: https://travis-ci.org/cpaelzer/dpdk/builds/578391327
Good: https://travis-ci.org/cpaelzer/dpdk/builds/578370028
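
For readers following along, the overlinking effect described in the quoted
commit message can be sketched in a few lines of Python. This is only a model
of how GNU ld treats the positional --as-needed / --no-as-needed options (each
one applies to the libraries that follow it, and --no-as-needed is the
default). It is not the v2 patch. The function name link_mode_per_library and
the libraries -lnuma and -lother are hypothetical, chosen for illustration;
-lnuma merely stands in for <somelibsusedbydpdk>.

```python
def link_mode_per_library(link_args):
    """Return {library: mode} for each -l<name>, where mode is the
    as-needed state in effect at that position on the link line."""
    as_needed = False  # GNU ld starts out in --no-as-needed mode
    modes = {}
    for arg in link_args:
        if arg.startswith("-Wl,"):
            # -Wl,a,b passes the options a and b through to the linker
            for opt in arg[len("-Wl,"):].split(","):
                if opt == "--as-needed":
                    as_needed = True
                elif opt == "--no-as-needed":
                    as_needed = False
        elif arg.startswith("-l"):
            modes[arg[2:]] = "as-needed" if as_needed else "no-as-needed"
    return modes


# Hypothetical consumer project that asks for --as-needed up front.
consumer_flags = ["-Wl,--as-needed"]
# Shape of the generated pkg-config output as described in the commit message.
dpdk_flags = ["-lrte_telemetry", "-Wl,--no-as-needed", "-lnuma"]
# Whatever else the consumer links after the dpdk libs.
other_flags = ["-lother"]

modes = link_mode_per_library(consumer_flags + dpdk_flags + other_flags)
for lib, mode in modes.items():
    print(f"-l{lib}: {mode}")
```

In this model -lrte_telemetry precedes the switch and so does not benefit from
it, while both -lnuma and the unrelated -lother end up in no-as-needed mode,
because nothing on the line switches back to --as-needed.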

I'll submit a v2 in reply to this mail in a few minutes.

