[dpdk-dev] rte_lpm with larger nexthops or another method?

Matthew Hall mhall at mhcomputing.net
Mon Jun 22 19:53:02 CEST 2015


That's a lot better indeed! 16 million CIDR blocks would be a huge improvement.

Do you happen to know what is also possible for rte_lpm6?

Matthew.

On Mon, Jun 22, 2015 at 01:11:14PM +0300, Vladimir Medvedkin wrote:
> Hi Matthew,
> 
> I just recently thought about next_hop extension. For ipv4 we can do
> something like:
> struct rte_lpm_tbl24_entry {
>         /* Stores Next hop or group index (i.e. gindex)into tbl8. */
>         union {
>                 uint32_t  next_hop :24;
>                 uint32_t  tbl8_gindex :24;
>         }__attribute__((__packed__));
>         /* Using single uint8_t to store 3 values. */
>         uint32_t valid     :1; /**< Validation flag. */
>         uint32_t ext_entry :1; /**< External entry. */
>         uint32_t depth     :6; /**< Rule depth. */
> };
> so we have 24 bit for next_hop.
> 
> 2015-06-22 5:29 GMT+03:00 Matthew Hall <mhall at mhcomputing.net>:
> 
> > Hello,
> >
> > I have gone out on the internet for days looking at a bunch of different
> > radix tree implementations to see if I could figure a way to implement my
> > own tree, just to work around the really low 255 CIDR block limitation in
> > librte_lpm. Unfortunately every single one I could find falls into one of
> > these two annoying categories:
> >
> > 1) bloated with a lot of irrelevant kernel code I don't care about
> > (especially the Linux version but also the BSD one, which also makes a
> > weird assumption every address object stores its length in byte 0 of the
> > address struct). These are hard to convert into something that plays nice
> > with raw packet data.
> >
> > 2) very seemingly simple code, which breaks horribly if you try to add
> > IPv6 support (such as the radix tree from University of Michigan / LLVM
> > compiler benchmark suite, and the one from the old unmaintained mrt daemon,
> > which includes a bizarre custom reference-counted memory manager that is
> > very convoluted). These are easy to set up, but cause a lot of weird
> > segfaults which I am having a difficult time to try to debug.
> >
> > So it seems like I am going nowhere with this approach. Instead, I'd like
> > to know, what would I need to do to add this support to my local copy of
> > librte_lpm? Let's assume for the sake of this discussion, that I don't care
> > one iota about any performance cost, and I am happy if I need to prefetch
> > two cachelines instead of just one (which I recall from a past thread is
> > why librte_lpm has such a low nexthop limit to start with).
> >
> > Failing that, does anybody have a known good userspace version of any of
> > these sort of items:
> >
> > 1) Hash Based FIB (forwarding information base),
> > 2) Tree Based FIB,
> > 3) Patricia trie (which does not break horribly on IPv6 or make bad
> > assumptions about data format besides uint8_t* and length),
> > 4) Crit-Bit tree
> > 5) any other good way of taking IPv4 and IPv6 and finding the longest
> > prefix match against a table of pre-loaded CIDR blocks?
> >
> > I am really pulling out my hair trying to find a way to do something which
> > doesn't seem like it should have to be be this difficult. I must be missing
> > a more obvious way to handle this.
> >
> > Thanks,
> > Matthew


More information about the dev mailing list