[dpdk-dev] BUG: eBPF missing BPF_ABS

Stephen Hemminger stephen at networkplumber.org
Thu Feb 6 10:11:27 CET 2020


I agree fixing offsets in cbpf to ebpf converter and passing mbuf is
easier. There is still the pathological case of multi segment mbuf. But
Linux XDP doesn't handle it either.

Let me put early version of filter2rteebf on GitHub

On Thu, Feb 6, 2020, 8:54 AM Morten Brørup <mb at smartsharesystems.com> wrote:

> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev,
> > Konstantin
> > Sent: Wednesday, February 5, 2020 10:16 PM
> >
> > >
> > > As I mentioned in my FOSDEM talk the current DPDK eBPF handling is
> > > not usable for packet filters. I have ported the classic BPF to eBPF
> > code
> > > and the generated code is not usable by DPDK.
> > >
> > > The problem is that DPDK eBPF does not implement all the opcodes.
> > > BPF_ABS is not implemented and must be. It is in the Linux kernel.
> >
> > Yep, it doesn't.
> > This is not a generic eBPF instruction, but sort of implicit function
> > call
> > to access skb data. At initial stage of librte_bpf development,
> > I didn't think much about cBPF conversion, and to simplify things
> > decided to put all special skb features aside.
> > But sure, if that will enable DPDK cBPF support, let's add it in.
> > Please fill a Bugzilla ticket to track that issue, and I'll try to
> > find some time within 20.05 to look at it.
> > Unless of course, you or someone else would like to volunteer for it.
> >
> > Though at first step, we probably need to decide what should be
> > our requirements for it in terms of DPDK.
> > From https://www.kernel.org/doc/Documentation/networking/filter.txt:
> > "eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
> > (BPF_IND | <size> | BPF_LD) which are used to access packet data.
> > They had to be carried over from classic to have strong performance of
> > socket filters running in eBPF interpreter. These instructions can only
> > be used when interpreter context is a pointer to 'struct sk_buff' and
> > have seven implicit operands. Register R6 is an implicit input that
> > must
> > contain pointer to sk_buff. Register R0 is an implicit output which
> > contains
> > the data fetched from the packet. Registers R1-R5 are scratch registers
> > and must not be used to store the data across BPF_ABS | BPF_LD or
> > BPF_IND | BPF_LD instructions.
> > These instructions have implicit program exit condition as well. When
> > eBPF program is trying to access the data beyond the packet boundary,
> > the interpreter will abort the execution of the program. JIT compilers
> > therefore must preserve this property. src_reg and imm32 fields are
> > explicit inputs to these instructions.
> > For example:
> > BPF_IND | BPF_W | BPF_LD means:
> > R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
> > and R1 - R5 were scratched."
> >
> > For RTE_BPF_ARG_PTR_MBUF context we probably want behavior similar
> > to linux, i.e. BPF_IND | BPF_W | BPF_LD would mean something like:
> >
> > 1) uint32_t tmp;
> >     R0 = &tmp;
> >     R0 = rte_pktmbuf_read((const struct rte_mbuf *)R6,  src_reg +
> > imm32,
> >       sizeof(uint32_t), R0);
> >      if (R0 == NULL) return FAILED;
> >      R0 = ntohl(*(uint32_t *)R0);
> >
> > But what to do with when ctx is raw data buffer (RTE_BPF_ARG_PTR)?
> > Should it be just:
> > 2) R0 = ntohl(*(uint32_t *)(R6 + src_reg + imm32));
> > 3) not allow LD_ABS/LD_IND in this mode at all.
> >
>
> I think that 1) is the correct choice for the "cBPF filter" use case.
>
> In that context I consider both 2) and 3) irrelevant because the
> RTE_BPF_ARG_PTR_MBUF type should be used for cBPF filtering. I have argued
> for this before.
>
> Others have argued for using the RTE_BPF_ARG_PTR type instead. Let's
> consider using the RTE_BPF_ARG_PTR for a moment... Is there an implicit
> understanding that the data points to packet data? Then a range check in 2)
> might be relevant.
>
> However, if the RTE_BPF_ARG_PTR_MBUF type is supported and used for
> cBPF->eBPF conversion, we would not need to support LD_ABS/LD_IND for the
> RTE_BPF_ARG_PTR type. So between 2) and 3), I support 3).
>
> > Second question is implementation.
> > I can see two main options here:
> > a) if we plan to have our own cBPF->eBPF converter and support only it,
> > we can add these extra instructions generation into converter itself.
> > I.E. cBPF->eBPF conversion for LD_ABS/LD_IND will generate series
> > of generic eBPF instructions.
> > b) support eBPF LD_ABS/LD_IND in eBPF interpreter/jit
> >
> > (a) probably a simpler way (eBPF interpreter/jit/verifier would remain
> > unchanged),
> > but seems way too limited. So I think (b) is a better choice, even more
> > work implied
> > (interpreter seems more or less straightforward, jit would probably
> > need some effort).
> >
>
> This is going to be used in the fast path, probably on all packets on an
> interface. So clearly b).
>
> > Any thoughts/opinions?
> > Konstantin
>
> One more piece of information: Linux cBPF supports Auxiliary data (VLAN
> ID, Interface Index, etc.), i.e. metadata that are not part of the packet
> data, but can be found in the sk_buff/mbuf. Going for 1) and b) might make
> it easier adding support for these later.
>
>
> Med venlig hilsen / kind regards
> - Morten Brørup
>
>


More information about the dev mailing list