[dpdk-dev] 18.08 build error on ppc64el - bool as vector type

Christian Ehrhardt christian.ehrhardt at canonical.com
Tue Aug 28 13:30:12 CEST 2018


On Mon, Aug 27, 2018 at 2:22 PM Adrien Mazarguil <adrien.mazarguil at 6wind.com>
wrote:

> Hi Christian,
>
> On Wed, Aug 22, 2018 at 05:11:41PM +0200, Christian Ehrhardt wrote:
> > Just FYI the simple change hits similar issues later on.
> >
> > The (not really) proposed patch would have to be extended to be as
> > following.
> > We really need a better solution (or somebody has to convince me that my
> > change is better than a band aid).
>
> Thanks for reporting. I've made a quick investigation on my own and believe
> it's a toolchain issue which may affect more than this PMD; potentially all
> users of stdbool.h (C11) on this platform.
>

Yeah I assumed as much, which is why I was hoping that some of the arch
experts would jump in and say "yeah this is a common thing and correctly
handled like <FOO>"
I'll continue trying to reach out to people that should know better still
...


> C11's stdbool.h defines a bool macro as _Bool (big B) along with
> true/false. On PPC targets, another file (altivec.h) defines bool as _bool
> (small b) but not true/false:
>
>  #if !defined(__APPLE_ALTIVEC__)
>  /* You are allowed to undef these for C++ compatibility.  */
>  #define vector __vector
>  #define pixel __pixel
>  #define bool __bool
>  #endif
>
> mlx5_nl.c explicitly includes stdbool.h to get the above definitions then
> includes mlx5.h -> rte_ether.h -> ppc_64/rte_memcpy.h -> altivec.h.
>
> For some reason the conflicting bool redefinition doesn't seem to raise any
> warnings, but results in mismatching bool and true/false definitions; an
> integer value cannot be assigned to a bool variable anymore, hence the
> build
> failure.
>
> The inability to assign integer values to bool is, in my opinion, a
> fundamental issue caused by altivec.h. If there is no way to fix this on
> the
> system, there are a couple of workarounds for DPDK, by order of preference:
>
> 1. Always #undef bool after including altivec.h in
>    lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h. I do not think
>    anyone expects this type to be unusable with true/false or integer
> values
>    anyway. The version of altivec.h I have doesn't rely on this macro at
>    all so it's probably not a big loss.
>

The undef of a definition in header A by hedaer B can lead to most
interesting, still broken effects.
If e.g. one does
#include <stdbool.h>
#include "mlx5.h"

or similar then it would undefine that of stdbool as well right?
In any case, the undefine not only would be suspicious it also fails right
away:

In file included from
/home/ubuntu/deb_dpdk/lib/librte_eal/common/malloc_heap.c:27:
/home/ubuntu/deb_dpdk/lib/librte_eal/common/eal_memalloc.h:30:15:
error: unknown
type name ‘bool’; did you mean ‘_Bool’?
  int socket, bool exact);
              ^~~~
              _Bool
[...]



>    Ditto for "pixel" and "vector" keywords. Alternatively you could #define
>    __APPLE_ALTIVEC__ before including altivec.h to prevent them from
> getting
>    defined in the first place.
>

Interesting I got plenty of these:
In file included from
/home/ubuntu/deb_dpdk/lib/librte_eal/common/eal_common_options.c:25:
/home/ubuntu/deb_dpdk/debian/build/static-root/include/rte_memcpy.h:39:
warning:
"__APPLE_ALTIVEC__" redefined
#define __APPLE_ALTIVEC__

With a few of it being even errors, but the position of the original define
is interesting.
/home/ubuntu/deb_dpdk/debian/build/static-root/include/rte_memcpy.h:39: error:
"__APPLE_ALTIVEC__" redefined [-Werror]
#define __APPLE_ALTIVEC__
<built-in>: note: this is the location of the previous definition

So if being a built-in, shouldn't it ALWAYS be defined and never
over-declare the bool type?

Checking GCC on the platform:
$ gcc -dM -E - < /dev/null | grep ALTI
#define __ALTIVEC__ 1
#define __APPLE_ALTIVEC__ 1


I added an #error in the header and dropped all dpdk changes.
if !defined(__APPLE_ALTIVEC__)
/* You are allowed to undef these for C++ compatibility.  */
#error WOULD REDECLARE BOOL
#define vector __vector

And I get:
gcc -Wp,-MD,./.mlx4.o.d.tmp -Wdate-time -D_FORTIFY_SOURCE=2 -m64 -pthread
  -DRTE_MACHINE_CPUFLAG_PPC64 -DRTE_MACHINE_CPUFLAG_ALTIVEC
-DRTE_MACHINE_CPUFLAG_VSX  -I/home/ubuntu/deb_dpdk/debia
n/build/static-root/include -include
/home/ubuntu/deb_dpdk/debian/build/static-root/include/rte_config.h -O3
-std=c11 -Wall -Wextra -g -I. -D_BSD_SOURCE -D_DEFAULT_SOURCE
-D_XOPEN_SOURCE=600
-W -Wall -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations
-Wold-style-definition -Wpointer-arith -Wcast-align -Wnested-externs
-Wcast-qual -Wformat-nonliteral -Wformat-securi
ty -Wundef -Wwrite-strings -Wdeprecated -Wimplicit-fallthrough=2
-Wno-format-truncation -DALLOW_EXPERIMENTAL_API -Wno-error=cast-qual
-DNDEBUG -UPEDANTIC   -g -g -o mlx4.o -c /home/ubuntu/de
b_dpdk/drivers/net/mlx4/mlx4.c
In file included from
/home/ubuntu/deb_dpdk/debian/build/static-root/include/rte_memcpy.h:39,
                from
/home/ubuntu/deb_dpdk/debian/build/static-root/include/rte_ether.h:21,
                from
/home/ubuntu/deb_dpdk/debian/build/static-root/include/rte_ethdev.h:158,
                from
/home/ubuntu/deb_dpdk/debian/build/static-root/include/rte_ethdev_driver.h:18,

                from /home/ubuntu/deb_dpdk/drivers/net/mlx4/mlx4.c:35:
/usr/lib/gcc/powerpc64le-linux-gnu/8/include/altivec.h:44:2: error: #error
WOULD REDECLARE BOOL
#error WOULD REDECLARE BOOL

But a quick sanity test with a hello world including this special altivec
header did build not reaching the #error.
So something in DPDK undefines __ALTIVEC__ ?!

After being that close I found which of our usual build does the switch to
trigger this.
It is "-std=c11"

And in fact
$ gcc -std=c11 -dM -E - < /dev/null | grep ALTI
#define __ALTIVEC__ 1

But no __APPLE_ALTIVEC__

The header says
/* You are allowed to undef these for C++ compatibility.  */

But I thought "wait a minute, didn't we just undefine it above and it
failed?"
Yes we did, and it turns out not all gcc calls have --std=c11 and due to
that it failed for those.



2. Add Altivec detection to impacted users of stdbool.h, which #undef and
>    redefine bool as _Bool on their own with a short comment about broken
>    toolchains.
>

I tested a few versions of this after my findings on #1 above.
A few extra loops to jump like to make drivers/net/cxgbe/cxgbe_compat.h
usage of bool happy.
I eventually came up with the following as a fix that seems to work:

--- a/lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h
+++ b/lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h
@@ -37,6 +37,19 @@
#include <string.h>
/*To include altivec.h, GCC version must  >= 4.8 */
#include <altivec.h>
+/*
+ * if built with std=c11 stdbool and vector bool will conflict.
+ * But if std is not c11 (which is true for some of our gcc calls) it will
+ * have __APPLE_ALTIVEC__ defined which will make it not define the types
+ * at all.
+ * Furthermore we need to be careful to only redefine as stdbool would have
+ * done if it was included - otherwise we might conflict with other
intended
+ * meanings of "bool".
+ */
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L &&
defined(_STDBOOL_H)
+#undef bool
+#define bool _Bool
+#endif

#ifdef __cplusplus
extern "C" {


In turn I have only checked this modification on ppc64 so far, but anyway I
still have the feeling we are only trying to poke at things with a stick
and someone with subject matter experience would just tell us.
Opinions on the change above as a "v2 RFC"?



> 3. Replace bool with _Bool to impacted users of stdbool.h. Basically what
>    you did below with "int" but slightly more correct since true/false can
>    still be used with _Bool. A comment explaining why is necessary after
> the
>    inclusion of stdbool.h.
>

Could be an option, but I think this just means others converting code will
stumble over it again and again.
Especially if one tests x86 only and then ppc64 breaks later (which IMHO is
what happened here).

Can you validate these suggestions? I don't have the right setup for that.
>

I only can grab a ppc box if one is free, but currently it seems I'm lucky
:-)


Thanks.
>
> > Description: Fix ppc64le build error between altivec and bool
> >
> > We really hope there will eventually be a better fix for this, but
> > currently
> > we have to unbreak building this code so until something better is
> > available
> > let's use this modification.
> >
> > Forwarded: yes
> > Forward-info: http://mails.dpdk.org/archives/dev/2018-August/109926.html
> > Author: Christian Ehrhardt <christian.ehrhardt at canonical.com>
> > Last-Update: 2018-08-22
> > --- a/drivers/net/mlx5/mlx5_nl.c
> > +++ b/drivers/net/mlx5/mlx5_nl.c
> > @@ -834,8 +834,8 @@ mlx5_nl_switch_info_cb(struct nlmsghdr *
> >                .switch_id = 0,
> >        };
> >        size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
> > -       bool port_name_set = false;
> > -       bool switch_id_set = false;
> > +       int port_name_set = 0;
> > +       int switch_id_set = 0;
> >
> >        if (nh->nlmsg_type != RTM_NEWLINK)
> >                goto error;
> > @@ -854,7 +854,7 @@ mlx5_nl_switch_info_cb(struct nlmsghdr *
> >                        if (errno ||
> >                            (size_t)(end - (char *)payload) !=
> > strlen(payload))
> >                                goto error;
> > -                       port_name_set = true;
> > +                       port_name_set = 1;
> >                        break;
> >                case IFLA_PHYS_SWITCH_ID:
> >                        info.switch_id = 0;
> > @@ -862,7 +862,7 @@ mlx5_nl_switch_info_cb(struct nlmsghdr *
> >                                info.switch_id <<= 8;
> >                                info.switch_id |= ((uint8_t *)payload)[i];
> >                        }
> > -                       switch_id_set = true;
> > +                       switch_id_set = 1;
> >                        break;
> >                }
> >                off += RTA_ALIGN(ra->rta_len);
> > --- a/drivers/net/mlx5/mlx5_ethdev.c
> > +++ b/drivers/net/mlx5/mlx5_ethdev.c
> > @@ -1335,8 +1335,8 @@ mlx5_sysfs_switch_info(unsigned int ifin
> >        char ifname[IF_NAMESIZE];
> >        FILE *file;
> >        struct mlx5_switch_info data = { .master = 0, };
> > -       bool port_name_set = false;
> > -       bool port_switch_id_set = false;
> > +       int port_name_set = 0;
> > +       int port_switch_id_set = 0;
> >        char c;
> >
> >        if (!if_indextoname(ifindex, ifname)) {
> > --- a/drivers/net/mlx5/mlx5_nl_flow.c
> > +++ b/drivers/net/mlx5/mlx5_nl_flow.c
> > @@ -385,11 +385,11 @@ mlx5_nl_flow_transpose(void *buf,
> >        const struct rte_flow_action *action;
> >        unsigned int n;
> >        uint32_t act_index_cur;
> > -       bool in_port_id_set;
> > -       bool eth_type_set;
> > -       bool vlan_present;
> > -       bool vlan_eth_type_set;
> > -       bool ip_proto_set;
> > +       int in_port_id_set;
> > +       int eth_type_set;
> > +       int vlan_present;
> > +       int vlan_eth_type_set;
> > +       int ip_proto_set;
> >        struct nlattr *na_flower;
> >        struct nlattr *na_flower_act;
> >        struct nlattr *na_vlan_id;
> > @@ -404,11 +404,11 @@ init:
> >        action = actions;
> >        n = 0;
> >        act_index_cur = 0;
> > -       in_port_id_set = false;
> > -       eth_type_set = false;
> > -       vlan_present = false;
> > -       vlan_eth_type_set = false;
> > -       ip_proto_set = false;
> > +       in_port_id_set = 0;
> > +       eth_type_set = 0;
> > +       vlan_present = 0;
> > +       vlan_eth_type_set = 0;
> > +       ip_proto_set = 0;
> >        na_flower = NULL;
> >        na_flower_act = NULL;
> >        na_vlan_id = NULL;
> >
> >
> > On Tue, Aug 21, 2018 at 4:19 PM Christian Ehrhardt <
> > christian.ehrhardt at canonical.com> wrote:
> >
> > > Hi,
> > > Debian and Ubuntu face a build error with 18.08 on ppc64el.
> > > It looks like that:
> > >
> > > Full log:
> > >
> > >
> https://buildd.debian.org/status/fetch.php?pkg=dpdk&arch=ppc64el&ver=18.08-1&stamp=1534520196&raw=0
> > >
> > > /<<PKGBUILDDIR>>/drivers/net/mlx5/mlx5_nl.c: In function
> > > 'mlx5_nl_switch_info_cb':
> > > /<<PKGBUILDDIR>>/drivers/net/mlx5/mlx5_nl.c:837:23: error: incompatible
> > > types when initializing type '__vector __bool int' {aka '__vector(4)
> __bool
> > > int'} using type 'int'
> > >   bool port_name_set = false;
> > >                        ^~~~~
> > > /<<PKGBUILDDIR>>/drivers/net/mlx5/mlx5_nl.c:838:23: error: incompatible
> > > types when initializing type '__vector __bool int' {aka '__vector(4)
> __bool
> > > int'} using type 'int'
> > >   bool switch_id_set = false;
> > >                        ^~~~~
> > > /<<PKGBUILDDIR>>/drivers/net/mlx5/mlx5_nl.c:857:18: error: incompatible
> > > types when assigning to type '__vector __bool int' {aka '__vector(4)
> __bool
> > > int'} from type 'int'
> > >     port_name_set = true;
> > >                   ^
> > > /<<PKGBUILDDIR>>/drivers/net/mlx5/mlx5_nl.c:865:18: error: incompatible
> > > types when assigning to type '__vector __bool int' {aka '__vector(4)
> __bool
> > > int'} from type 'int'
> > >     switch_id_set = true;
> > >                   ^
> > > /<<PKGBUILDDIR>>/drivers/net/mlx5/mlx5_nl.c:870:16: error: used vector
> > > type where scalar is required
> > >   info.master = switch_id_set && !port_name_set;
> > >                 ^~~~~~~~~~~~~
> > > /<<PKGBUILDDIR>>/drivers/net/mlx5/mlx5_nl.c:870:33: error: wrong type
> > > argument to unary exclamation mark
> > >   info.master = switch_id_set && !port_name_set;
> > >                                  ^
> > > /<<PKGBUILDDIR>>/drivers/net/mlx5/mlx5_nl.c:871:21: error: used vector
> > > type where scalar is required
> > >   info.representor = switch_id_set && port_name_set;
> > >
> > >
> > > Now I checked and the reason seems to be some combination of altivec
> and
> > > MLX headers and the use of bool - probably stdbool vs altivec bool.
> > >
> > > If built with gcc -E I see it the bool variables become:
> > >    __attribute__((altivec(bool__))) unsigned port_name_set =
> > >
> > > I have found a strawmans approach to it, but I'm sure people with
> > > experience on the matter will come up with something better.
> > >
> > > My current change looks like that and would work:
> > > $ git diff
> > > diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
> > > index d61826aea..2cc8f49c5 100644
> > > --- a/drivers/net/mlx5/mlx5_nl.c
> > > +++ b/drivers/net/mlx5/mlx5_nl.c
> > > @@ -834,8 +834,8 @@ mlx5_nl_switch_info_cb(struct nlmsghdr *nh, void
> > > *arg)
> > >                .switch_id = 0,
> > >        };
> > >        size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
> > > -       bool port_name_set = false;
> > > -       bool switch_id_set = false;
> > > +       int port_name_set = 0;
> > > +       int switch_id_set = 0;
> > >
> > >        if (nh->nlmsg_type != RTM_NEWLINK)
> > >                goto error;
> > > @@ -854,7 +854,7 @@ mlx5_nl_switch_info_cb(struct nlmsghdr *nh, void
> > > *arg)
> > >                        if (errno ||
> > >                            (size_t)(end - (char *)payload) !=
> > > strlen(payload))
> > >                                goto error;
> > > -                       port_name_set = true;
> > > +                       port_name_set = 1;
> > >                        break;
> > >                case IFLA_PHYS_SWITCH_ID:
> > >                        info.switch_id = 0;
> > > @@ -862,7 +862,7 @@ mlx5_nl_switch_info_cb(struct nlmsghdr *nh, void
> > > *arg)
> > >                                info.switch_id <<= 8;
> > >                                info.switch_id |= ((uint8_t
> *)payload)[i];
> > >                        }
> > > -                       switch_id_set = true;
> > > +                       switch_id_set = 1;
> > >                        break;
> > >                }
> > >                off += RTA_ALIGN(ra->rta_len);
>
> --
> Adrien Mazarguil
> 6WIND
>


-- 
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd


More information about the dev mailing list