[dpdk-dev] mbuf changes

Alejandro Lucero alejandro.lucero at netronome.com
Wed Nov 9 12:42:30 CET 2016


On Wed, Oct 26, 2016 at 10:28 AM, Alejandro Lucero <
alejandro.lucero at netronome.com> wrote:

>
>
> On Tue, Oct 25, 2016 at 2:05 PM, Bruce Richardson <
> bruce.richardson at intel.com> wrote:
>
>> On Tue, Oct 25, 2016 at 05:24:28PM +0530, Shreyansh Jain wrote:
>> > On Monday 24 October 2016 09:55 PM, Bruce Richardson wrote:
>> > > On Mon, Oct 24, 2016 at 04:11:33PM +0000, Wiles, Keith wrote:
>> > > >
>> > > > > On Oct 24, 2016, at 10:49 AM, Morten Brørup <
>> mb at smartsharesystems.com> wrote:
>> > > > >
>> > > > > First of all: Thanks for a great DPDK Userspace 2016!
>> > > > >
>> > > > >
>> > > > >
>> > > > > Continuing the Userspace discussion about Olivier Matz’s proposed
>> > > > > mbuf changes...
>> > >
>> > > Thanks for keeping the discussion going!
>> > > > >
>> > > > >
>> > > > >
>> > > > > 1.
>> > > > >
>> > > > > Stephen Hemminger had a noteworthy general comment about keeping
>> > > > > metadata for the NIC in the appropriate section of the mbuf:
>> > > > > Metadata generated by the NIC’s RX handler belongs in the first
>> > > > > cache line, and metadata required by the NIC’s TX handler belongs
>> > > > > in the second cache line. This also means that touching the second
>> > > > > cache line on ingress should be avoided if possible; and Bruce
>> > > > > Richardson mentioned that for this reason m->next was zeroed on
>> > > > > free().
>> > > > >
>> > > Thinking about it, I suspect there are more fields we can reset on free
>> > > to save time on alloc. Refcnt, as discussed below, is one of them, but
>> > > so too could be the nb_segs field and possibly others.
>> > >
>> > > > >
>> > > > >
>> > > > > 2.
>> > > > >
>> > > > > There seemed to be consensus that the size of m->refcnt should
>> > > > > match the size of m->port because a packet could be duplicated on
>> > > > > all physical ports for L3 multicast and L2 flooding.
>> > > > >
>> > > > > Furthermore, although a single physical machine (i.e. a single
>> > > > > server) with 255 physical ports probably doesn’t exist, it might
>> > > > > contain more than 255 virtual machines with a virtual port each, so
>> > > > > it makes sense to extend these mbuf fields from 8 to 16 bits.
>> > > >
>> > > > I thought we also talked about removing the m->port from the mbuf
>> > > > as it is not really needed.
>> > > >
>> > > Yes, this was mentioned, and also the option of moving the port value
>> > > to the second cacheline, but it appears that NXP are using the port
>> > > value in their NIC drivers for passing in metadata, so we'd need their
>> > > agreement on any move (or removal).
>> >
>> > I am not sure where NXP's NIC came into the picture on this, but now that
>> > it is highlighted, this field is required for the libevent
>> > implementation [1].
>> >
>> > A scheduler sending an event, which can be a packet, would only have
>> > the flow_id. Matching that back to a port without mbuf->port would be
>> > very difficult (costly). There may be a way around this, but at least in
>> > the current proposal I think port would be important to have, even if in
>> > the second cache line.
>> >
>> > But, off the top of my head, as of now it is not being used for any
>> > specific purpose in NXP's PMD implementation.
>> >
>> > Even the SoC patches don't necessarily rely on it; they use it only
>> > because it is available.
>> >
>> > @Bruce: where did you get the NXP context here from?
>> >
>> Oh, I'm just mis-remembering. :-( It was someone else who was looking for
>> this - Netronome, perhaps?
>>
>> CC'ing Alejandro in the hope I'm remembering correctly second time
>> round!
>>
>>
> Yes. Thanks Bruce!
>
> So Netronome uses the port field and, as I commented at the user meeting,
> we are happy with the field going from 8 to 16 bits.
>
> In our case, this is something some clients have demanded, and if I'm not
> wrong (I'll double check this asap), the port value is for knowing where
> the packet is coming from. Think about a switch in the NIC, with ports
> linked to VFs/VMs, and one or more physical ports. That port value is not
> related to DPDK ports but to the switch ports. Code in the host (DPDK or
> not) can receive packets from the wire or from VFs through the NIC. This is
> also true for packets received by VMs, but I guess the port value is only
> of interest to host code.
>
>
>

I checked on this functionality internally, and it seems we no longer need
it. In fact, I will soon remove the metadata port handling from our PMD.
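
For reference on the field-width question (point 2 in the quoted thread), here
is a minimal sketch of why an 8-bit refcnt is too narrow once a packet can be
referenced by more than 255 ports. It is not DPDK code: the toy_mbuf8 and
toy_mbuf16 structs and the flood_refs helpers are hypothetical stand-ins, and
real mbuf reference counting is more involved than a plain increment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the mbuf fields under discussion.
 * These are NOT the real struct rte_mbuf definitions. */
struct toy_mbuf8  { uint8_t  refcnt; uint8_t  port; };
struct toy_mbuf16 { uint16_t refcnt; uint16_t port; };

/* Take one reference per output port, as L2 flooding or L3 multicast
 * would when duplicating a packet to every port. */
static unsigned flood_refs8(unsigned nb_ports)
{
    struct toy_mbuf8 m = { .refcnt = 0, .port = 0 };
    for (unsigned i = 0; i < nb_ports; i++)
        m.refcnt++;             /* 8-bit counter wraps modulo 256 */
    return m.refcnt;
}

static unsigned flood_refs16(unsigned nb_ports)
{
    struct toy_mbuf16 m = { .refcnt = 0, .port = 0 };
    for (unsigned i = 0; i < nb_ports; i++)
        m.refcnt++;             /* 16-bit counter holds up to 65535 */
    return m.refcnt;
}

/* Small self-check: 300 virtual ports overflow the 8-bit counter. */
static void self_check(void)
{
    assert(flood_refs8(300) == 300 % 256);  /* wrapped to 44 */
    assert(flood_refs16(300) == 300);       /* count preserved */
}
```

With 300 ports the 8-bit counter ends at 44 instead of 300, so the buffer
would be released while references are still outstanding; the 16-bit version
keeps the correct count.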



> /Bruce
>>
>
>

