[dpdk-dev] [RFC 0/8] mbuf: structure reorganization

Ananyev, Konstantin konstantin.ananyev at intel.com
Tue Feb 28 23:53:55 CET 2017



> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Tuesday, February 28, 2017 12:28 PM
> To: Ananyev, Konstantin <konstantin.ananyev at intel.com>
> Cc: Jan Blunck <jblunck at infradead.org>; Richardson, Bruce <bruce.richardson at intel.com>; dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
> 
> On Tue, 28 Feb 2017 11:48:20 +0000, "Ananyev, Konstantin"
> <konstantin.ananyev at intel.com> wrote:
> > >
> > > On Tue, 28 Feb 2017 10:29:41 +0000, "Ananyev, Konstantin"
> > > <konstantin.ananyev at intel.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Tue, 28 Feb 2017 09:05:07 +0000, "Ananyev, Konstantin"
> > > > > <konstantin.ananyev at intel.com> wrote:
> > > > > > Hi everyone,
> > > > > >
> > > > > > > >
> > > > > > > > In my opinion, if we have the room in the first cache
> > > > > > > > line, we should put it there. The only argument I see
> > > > > > > > against is "we may find something more important in the
> > > > > > > > future, and we won't have room for it in the first cache
> > > > > > > > line". I don't feel we should penalize today's use cases
> > > > > > > > for hypothetic future use cases.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >> 2. timestamp normalization point
> > > > > > > >>      inside PMD RX vs somewhere later as user needs it
> > > > > > > >> (extra function in dev_ops?).
> > > > > > > >
> > > > > > > > This point could be changed. My initial proposition tries
> > > > > > > > to provide a generic API for timestamps. Let me restate it
> > > > > > > > here:
> > > > > > > >
> > > > > > > > a- the timestamp is in nanoseconds
> > > > > > > > b- the reference is always the same for a given path: if
> > > > > > > > the timestamp is set in a PMD, all the packets for this
> > > > > > > > PMD will have the same reference, but for 2 different
> > > > > > > > PMDs (or a sw lib), the reference would not be the same.
> > > > > > > >
> > > > > > > > We may remove a-, and just have:
> > > > > > > >  - the reference and the unit are always the same for a
> > > > > > > > given path: if the timestamp is set in a PMD, all the
> > > > > > > > packets for this PMD will have the same reference and
> > > > > > > > unit, but for 2 different PMDs (or a sw lib), they would
> > > > > > > > not be the same.
> > > > > > > >
> > > > > > > > In both cases, we would need a conversion code (maybe in a
> > > > > > > > library) if the application wants to work with timestamps
> > > > > > > > from several sources. The second solution removes the
> > > > > > > > normalization code in the PMD when not needed, it is
> > > > > > > > probably better.
> > > > > > >
> > > > > > > I agree.
> > > > > >
> > > > > > One question - does that mean that the application would need to
> > > > > > keep track of which PMD each particular packet came from, in order
> > > > > > to do the normalization?
> > > > > > Konstantin
> > > > >
> > > > > I'd say yes. It does not look very difficult to do, since the
> > > > > mbuf contains the input port id.
> > > > >
> > > >
> > > > I understand that we can use mbuf->port here, but it means that
> > > > we'll introduce new implicit dependency between timestamp and
> > > > port values. From my point of view, that introduces new implications:
> > > > 1. all PMDs that do set a timestamp would also have to set port
> > > > value too. Probably not a big deal as most PMDs do set port
> > > > value anyway right now, but it means it would be hard to get
> > > > rid of or change mbuf->port in future.
> > >
> > > Currently, all PMDs must set m->port.
> > > If in the future we remove m->port, the applications that use it
> > > will need to store the value in mbuf metadata, or pass it as
> > > arguments through function calls.
> > >
> > >
> > > > 2. Applications would not be allowed to change mbuf->port value
> > > > before normalization is done (from what I heard some apps do
> > > > update mbuf->port to store routing decisions). BTW, how would the app
> > > > keep track of which mbufs were already normalized, and which
> > > > were not?
> > >
> > > I don't think it should be allowed to change m->port value.
> >
> > As far as I know it is allowed right now.
> > PMD RX routine sets mbuf->port, after that application is free to use
> > it in a way it likes.
> 
> The description of m->port is "Input port". If the application stores
> something other than the input port, it is its responsibility if it
> breaks something else. Like changing any other field to put something
> that does not match the description.
> 
> 
> > What we are introducing here is basically a new dependency between 2
> > mbuf fields and new restriction.
> 
> On the other hand, there is no strong dependency: the API to do the
> normalization can take the port as a parameter.

Ok, that would be much better - the dependency is still there,
but at least we don't force it.
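
To make the "take the port as a parameter" idea concrete, here is a minimal C sketch, assuming the application (or a helper library) keeps a per-port table of clock frequency and reference offset. The names struct port_clock, port_clocks, MAX_PORTS and timestamp_normalize() are hypothetical and not part of DPDK; the point is only that the caller decides which port id to pass, so it does not have to be read from mbuf->port at normalization time.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PORTS 32          /* hypothetical limit for this sketch */
#define NS_PER_SEC 1000000000ULL

/* Hypothetical per-port clock description (not a DPDK structure). */
struct port_clock {
	uint64_t hz;        /* raw timestamp ticks per second */
	uint64_t offset_ns; /* reference time, in ns, at raw tick 0 */
};

/* Hypothetical table, filled by the application or a helper library. */
static struct port_clock port_clocks[MAX_PORTS];

/*
 * Convert a raw PMD timestamp to nanoseconds on the common reference.
 * The port id is an explicit argument, so the caller may take it from
 * mbuf->port or from wherever else it stored it.
 */
static uint64_t
timestamp_normalize(uint16_t port_id, uint64_t raw)
{
	assert(port_id < MAX_PORTS);
	const struct port_clock *c = &port_clocks[port_id];
	assert(c->hz != 0);

	/* Whole seconds plus remainder, to avoid overflowing raw * 1e9.
	 * rem * NS_PER_SEC stays below 2^64 while hz is under ~18 GHz. */
	uint64_t sec = raw / c->hz;
	uint64_t rem = raw % c->hz;
	return c->offset_ns + sec * NS_PER_SEC + rem * NS_PER_SEC / c->hz;
}
```

For example, with a 500 MHz clock and an offset of 1000 ns, a raw value of 750000000 ticks maps to 1500001000 ns.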

> 
> 
> >
> > Another thing that doesn't look very convenient to me here -
> > We can have 2 different values of timestamp (both normalized and not)
> > and there is no clear way for the application to know which one is in
> > use right now. So each app writer would have to come up with his own
> > solution.
> 
> It depends:
> - the solution you describe is to have the application storing the
>   normalized value in its private metadata.
> - another solution would be to store the normalized value in
>   m->timestamp. In this case, we would need a flag to tell if the
>   timestamp value is normalized.

My first thought was also about a second flag to specify whether the timestamp
was already normalized or not.
Though I am still in doubt whether it is all really worth it: an extra ol_flag and a new function in the eth_dev API.
My feeling is that we are trying to overcomplicate things.
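
For reference, the extra-flag variant could look roughly like the following C sketch. struct mbuf_stub stands in for the relevant rte_mbuf fields, and the RX_TIMESTAMP_* bits, struct ts_conv and mbuf_timestamp_normalize() are hypothetical (they are not real ol_flags values or DPDK functions). The flag makes in-place normalization idempotent and tells later stages which representation m->timestamp holds, at the cost of one more flag bit and one more API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; not real ol_flags values. */
#define RX_TIMESTAMP_VALID      (1ULL << 0)
#define RX_TIMESTAMP_NORMALIZED (1ULL << 1)

/* Stand-in for the rte_mbuf fields involved. */
struct mbuf_stub {
	uint64_t ol_flags;
	uint64_t timestamp; /* raw PMD units, or ns once normalized */
};

/* Hypothetical per-port linear conversion: ns = raw * num / den + offset. */
struct ts_conv {
	uint64_t num, den, offset;
};

/*
 * Normalize m->timestamp in place, at most once. The flag tells any later
 * stage which representation the field currently holds.
 * Note: raw * num may overflow for large values; a real version would
 * need to guard against that.
 */
static void
mbuf_timestamp_normalize(struct mbuf_stub *m, uint16_t port_id,
			 const struct ts_conv *conv)
{
	if (!(m->ol_flags & RX_TIMESTAMP_VALID) ||
	    (m->ol_flags & RX_TIMESTAMP_NORMALIZED))
		return; /* no timestamp, or already converted */

	const struct ts_conv *c = &conv[port_id];
	assert(c->den != 0);
	m->timestamp = m->timestamp * c->num / c->den + c->offset;
	m->ol_flags |= RX_TIMESTAMP_NORMALIZED;
}
```

Calling it twice on the same mbuf leaves the value unchanged the second time, which is exactly the bookkeeping each application would otherwise have to invent on its own.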

> 
> The problem pointed out by Jan is that doing the timestamp
> normalization may take some CPU cycles, even if a small part of packets
> requires it.

I understand that point, but from what I've seen with a real example:
http://dpdk.org/ml/archives/dev/2016-October/048810.html
the amount of computation at RX is pretty small.
I don't think it would affect performance in a noticeable way
(though I don't have any numbers here to prove it).
On the other hand, if the user doesn't want a timestamp, he can always disable
that feature and save cycles, right?

BTW, you and Jan both mention that not every packet would need a timestamp.
Instead, we need some sort of timestamp for a group of packets?
Is that really the only foreseen usage model?
If so, then why not have a special function that would extract the 'latest' timestamp
from the dev?
Or even have tx_burst_extra() that would return the latest timestamp (as an extra parameter or so).
Then there is no need to put the timestamp into the mbuf at all.
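
A hypothetical C sketch of the "return the latest timestamp from the burst call" alternative. struct rxq_stub and rx_burst_latest_ts() are illustrative stand-ins, not DPDK APIs: instead of writing a timestamp into every mbuf, the burst call reports only the timestamp of the last packet it returned, through an extra output parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RX queue stand-in: packet ids plus the HW timestamp the
 * NIC recorded for each one. Not a DPDK structure. */
struct rxq_stub {
	const uint32_t *pkts;
	const uint64_t *hw_ts;
	size_t head;  /* next packet to hand out */
	size_t count; /* packets available */
};

/*
 * Hypothetical burst receive: copies up to nb packets into out[] and, if
 * at least one was returned, stores the timestamp of the last one in
 * *latest_ts. Returns the number of packets received.
 */
static uint16_t
rx_burst_latest_ts(struct rxq_stub *q, uint32_t *out, uint16_t nb,
		   uint64_t *latest_ts)
{
	assert(q != NULL && out != NULL && latest_ts != NULL);
	uint16_t n = 0;

	while (n < nb && q->head < q->count) {
		out[n++] = q->pkts[q->head];
		*latest_ts = q->hw_ts[q->head];
		q->head++;
	}
	return n;
}
```

If the burst returns no packets, *latest_ts is left unchanged, so the caller simply keeps the previous value; no mbuf field is consumed.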

> 
> 
> >
> > > Applications that
> > > are doing this are responsible for what they change.
> > >
> > >
> > > > 3. In theory, with eth_dev_detach(), the mbuf->port value might
> > > > not be valid at the point when the application decides to do
> > > > normalization.
> > > >
> > > > So to me, this whole approach with delayed normalization seems
> > > > unnecessarily complicated. The original one suggested by Olivier,
> > > > where normalization is done in the PMD at RX, looks much cleaner and
> > > > more manageable.
> > >
> > > Detaching a device requires a synchronization between control and
> > > data plane, and not only for this use case.
> >
> > Of course it does.
> > But right now it is possible to do:
> >
> > eth_rx_burst(port=0, ..., &mbuf, 1);
> > eth_dev_detach(port=0, ...);
> > ...
> > /*process previously received mbuf */
> >
> > With what you are proposing it would be not always possible any more.
> 
> With your example, it does not work even without the timestamp feature,
> since the mbuf input port would reference an invalid port.
> This port is usually used in the application to do a lookup for a port structure,
> so it is expected that the entry is valid. It would be even worse if you
> do a detach + attach.

I am not talking about the mbuf->port value usage.
Right now the user can access/interpret all metadata fields set by PMD RX routines
(vlan, rss hash, ol_flags, ptype, etc.) without needing to access the device data or
call device functions.
With that change, it wouldn't be the case anymore.

> 
> So, I think it is already the responsibility of the application to do
> the sync (flush retrieved packets before detaching a port).

The packets are not in the RX or TX queue of the detaching device anymore.
Once I have received a packet, I expect to have all its data and metadata inside the mbuf.
So I can store mbufs somewhere and process them much later.
Or I might want to pass them to a secondary process for logging/analysis, etc.

> 
> >
> > >In the first solution, the normalization is
> > > partial: unit is nanosecond, but the time reference is different.
> >
> > Not sure I get you here...
> 
> In the first solution I described, each PMD had to convert its unit
> into nanoseconds. This is easy because we assume the PMD knows the
> value of its clock. But to get a fully normalized value, it also has to
> use the same time reference, so we would also need to manage an offset
> (we need a new API to give this value to the PMD).

Yes, I suppose we do need a start timestamp and some sort of factor() to convert
the HW value, something like:

mbuf->timestamp = rxq->start_timestamp + factor(hw_timestamp);

Right?
Why would passing start_timestamp at the configure() phase be a problem?
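
One way the factor() step could stay cheap per packet is a fixed-point multiply and shift, with the multiplier precomputed at configure() time. This is a sketch under assumptions: struct rxq_ts_state, rxq_ts_configure() and rxq_ts_convert() are hypothetical names, the HW clock frequency hz is assumed to be known to the PMD, and unsigned __int128 is a GCC/Clang extension used to keep the product from overflowing.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Hypothetical per-queue state, filled once at configure() time. */
struct rxq_ts_state {
	uint64_t start_timestamp; /* ns value that HW tick 0 maps to */
	uint64_t mult;            /* ns per tick, scaled by 2^shift */
	uint32_t shift;
};

/* Precompute the fixed-point multiplier from the HW clock frequency. */
static void
rxq_ts_configure(struct rxq_ts_state *s, uint64_t start_ns, uint64_t hz)
{
	assert(hz != 0);
	s->start_timestamp = start_ns;
	s->shift = 32;
	/* Fits in 64 bits for any hz >= 1, since 1e9 * 2^32 < 2^64. */
	s->mult = (uint64_t)(((unsigned __int128)NS_PER_SEC << s->shift) / hz);
}

/* Per-packet work: one widening multiply, one shift, one add. */
static inline uint64_t
rxq_ts_convert(const struct rxq_ts_state *s, uint64_t hw_timestamp)
{
	unsigned __int128 prod = (unsigned __int128)hw_timestamp * s->mult;
	return s->start_timestamp + (uint64_t)(prod >> s->shift);
}
```

With hz = 10^9 the multiplier is exactly 2^32 and the conversion is exact. For other frequencies the truncated multiplier can make the result slightly low, by at most about hz / 2^32 ns per second of elapsed ticks.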

> 
> I have another fear related to hardware clocks: if clocks are not
> synchronized between PMDs, the simple operation "t * ratio - offset"
> won't work. That's why I think we could delegate this job to a specific
> library that would manage this.

But then that library would need to account for all PMDs inside the system,
and be aware of each HW clock skew, etc.
Again, that doesn't sound like a simple task to me.
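
To illustrate the per-clock bookkeeping such a library would need, here is a minimal C sketch that fits a linear model for one port from two sync samples of (HW ticks, reference ns). It is equivalent to the "t * ratio - offset" form, with the offset folded into an anchor point. struct clock_model, clock_model_fit() and clock_model_to_ref() are hypothetical, and how the samples are obtained is left open. A real implementation would likely need to refit periodically and filter noisy samples, since skew drifts over time.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-port clock model:
 *   ref_ns ~= ref0 + (hw - hw0) * ratio
 * which is the same as "hw * ratio - offset" with
 * offset = hw0 * ratio - ref0. Anchoring at (hw0, ref0) keeps the
 * floating-point math on small deltas instead of full 64-bit values.
 */
struct clock_model {
	uint64_t hw0;  /* HW ticks at the first sync sample */
	uint64_t ref0; /* reference ns at the first sync sample */
	double ratio;  /* measured ns per HW tick, skew included */
};

/* Fit the model through two (hw, ref) samples, b taken after a. */
static void
clock_model_fit(struct clock_model *m, uint64_t hw_a, uint64_t ref_a,
		uint64_t hw_b, uint64_t ref_b)
{
	assert(hw_b > hw_a && ref_b > ref_a);
	m->hw0 = hw_a;
	m->ref0 = ref_a;
	m->ratio = (double)(ref_b - ref_a) / (double)(hw_b - hw_a);
}

/* Map a raw HW timestamp (before or after hw0) onto the reference clock.
 * The scaled delta is truncated toward zero. */
static uint64_t
clock_model_to_ref(const struct clock_model *m, uint64_t hw)
{
	int64_t dhw = (int64_t)(hw - m->hw0);
	int64_t dref = (int64_t)((double)dhw * m->ratio);
	return (uint64_t)((int64_t)m->ref0 + dref);
}
```

For example, fitting port A with samples (1000, 5000) and (3000, 6000) gives ratio 0.5, and fitting port B with (0, 4000) and (800, 5000) gives ratio 1.25; raw value 0 on A and raw value 400 on B then both map to reference time 4500 ns.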

> 
> Having a non-normalized timestamp as of today would allow applications
> to take advantage of it for many use cases, even without the
> normalization library that could come later (and that may probably
> be more complex than expected).
> 
> 
> Olivier

