[dpdk-dev] BUG: eBPF missing BPF_ABS
mb at smartsharesystems.com
Thu Feb 6 09:54:22 CET 2020
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev,
> Sent: Wednesday, February 5, 2020 10:16 PM
> > As I mentioned in my FOSDEM talk the current DPDK eBPF handling is
> > not usable for packet filters. I have ported the classic BPF to eBPF
> > and the generated code is not usable by DPDK.
> > The problem is that DPDK eBPF does not implement all the opcodes.
> > BPF_ABS is not implemented and must be. It is in the Linux kernel.
> Yep, it doesn't.
> This is not a generic eBPF instruction, but sort of implicit function
> to access skb data. At initial stage of librte_bpf development,
> I didn't think much about cBPF conversion, and to simplify things
> decided to put all special skb features aside.
> But sure, if that will enable DPDK cBPF support, let's add it in.
> Please fill a Bugzilla ticket to track that issue, and I'll try to
> find some time within 20.05 to look at it.
> Unless of course, you or someone else would like to volunteer for it.
> Though at first step, we probably need to decide what should be
> our requirements for it in terms of DPDK.
> From https://www.kernel.org/doc/Documentation/networking/filter.txt:
> "eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
> (BPF_IND | <size> | BPF_LD) which are used to access packet data.
> They had to be carried over from classic to have strong performance of
> socket filters running in eBPF interpreter. These instructions can only
> be used when interpreter context is a pointer to 'struct sk_buff' and
> have seven implicit operands. Register R6 is an implicit input that
> contain pointer to sk_buff. Register R0 is an implicit output which
> the data fetched from the packet. Registers R1-R5 are scratch registers
> and must not be used to store the data across BPF_ABS | BPF_LD or
> BPF_IND | BPF_LD instructions.
> These instructions have implicit program exit condition as well. When
> eBPF program is trying to access the data beyond the packet boundary,
> the interpreter will abort the execution of the program. JIT compilers
> therefore must preserve this property. src_reg and imm32 fields are
> explicit inputs to these instructions.
> For example:
> BPF_IND | BPF_W | BPF_LD means:
> R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
> and R1 - R5 were scratched."
> For RTE_BPF_ARG_PTR_MBUF context we probably want behavior similar
> to linux, i.e. BPF_IND | BPF_W | BPF_LD would mean something like:
> 1) uint32_t tmp;
> R0 = &tmp;
> R0 = rte_pktmbuf_read((const struct rte_mbuf *)R6, src_reg +
> sizeof(uint32_t), R0);
> if (R0 == NULL) return FAILED;
> R0 = ntohl(*(uint32_t *)R0);
> But what to do with when ctx is raw data buffer (RTE_BPF_ARG_PTR)?
> Should it be just:
> 2) R0 = ntohl(*(uint32_t *)(R6 + src_reg + imm32));
> 3) not allow LD_ABS/LD_IND in this mode at all.
I think that 1) is the correct choice for the "cBPF filter" use case.
In that context I consider both 2) and 3) irrelevant because the RTE_BPF_ARG_PTR_MBUF type should be used for cBPF filtering. I have argued for this before.
Others have argued for using the RTE_BPF_ARG_PTR type instead. Let's consider using the RTE_BPF_ARG_PTR for a moment... Is there an implicit understanding that the data points to packet data? Then a range check in 2) might be relevant.
However, if the RTE_BPF_ARG_PTR_MBUF type is supported and used for cBPF->eBPF conversion, we would not need to support LD_ABS/LD_IND for the RTE_BPF_ARG_PTR type. So between 2) and 3), I support 3).
> Second question is implementation.
> I can see two main options here:
> a) if we plan to have our own cBPF->eBPF converter and support only it,
> we can add these extra instructions generation into converter itself.
> I.E. cBPF->eBPF conversion for LD_ABS/LD_IND will generate series
> of generic eBPF instructions.
> b) support eBPF LD_ABS/LD_IND in eBPF interpreter/jit
> (a) probably a simpler way (eBPF interpreter/jit/verifier would remain
> but seems way too limited. So I think (b) is a better choice, even more
> work implied
> (interpreter seems more or less straightforward, jit would probably
> need some effort).
This is going to be used in the fast path, probably on all packets on an interface. So clearly b).
> Any thoughts/opinions?
One more piece of information: Linux cBPF supports Auxiliary data (VLAN ID, Interface Index, etc.), i.e. metadata that are not part of the packet data, but can be found in the sk_buff/mbuf. Going for 1) and b) might make it easier adding support for these later.
Med venlig hilsen / kind regards
- Morten Brørup
More information about the dev