[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet

Oleg Kuporosov olegk at mellanox.com
Wed Oct 19 19:08:43 CEST 2016


Hello Oliver

Great thanks for your review and con

> > - include uint64_t timestamp field into rte_mbuf with minimal impact to
> >   throughput/latency. Keep it just simple uint64_t in ns (more than 580
> >   years) would be enough for immediate needs while using full
> >   struct timespec with twice bigger size would have much stronger
> >   performance impact as missed cacheline0.
> >
> > - it is possible as there is 6-bytes gap in 1st cacheline (fast path)
> >   and moving uint16_t vlan_tci_outer field to 2nd cacheline.
> >
> > - such move will only impact for pretty rare usable VLAN RX stripping
> >   mode for outer TCI (it used only for one NIC i40e from the whole set and
> >   allows to keep minimal performance impact for RX/TX timestamps.
> 
> This argument is difficult to accept. One can say you are adding a field for a
> pretty rare case used by only one NIC :)

It  may be looks so, but in fact not only for one NIC. Absence of timestamp there 
Required from developers implement its support in out of DPDK data path with
Local DPDK patching which also may lead some penalty in accuracy.
Good example here is open source implementation of tracing library - 
https://github.com/wanduow/libtrace/tree/libtrace4

These folks patched DPDK to have timestamp for every packet (Intel DPDK Patches folder)
And used it in out of band to store and later processing (lib/format_dpdk.c).
That was actually the starting point of my investigations.
Such approach is working for 1GBb case but not for 10-100 cases.

> 
> Honestly, I'm not able to judge whether timestamp is more important than
> vlan_tci_outer. As room is tight in the first cache line, your patch submission
> is the occasion to raise the question: how to decide what should be in the
> first part of the mbuf? There are also some other candidates for moving: m-
> >seqn is only used in librte_reorder and it is not set in the RX part of a driver.

Agree, it is difficult to decide, my thoughts were:
- there is hole (6 bytes) which wasn't marked as reserved for any planned extensions;
-vlan_tci_outer is being used by only one NIC (i40e) so far, 2nd NIC (fm10k) is using it
With comment:
		 * mbuf->vlan_tci_outer is an idle field in fm10k driver,
		 * so it can be selected to store sglort value.
To store some another value under some specific "if".

Also for i40e it is under #ifndef RTE_LIBRTE_I40E_16BYTE_RX_DESC which per doc should be
Enabled for high throughput of small packets. So in default case (disabled) it anyway has
some performance penalty with using 32 bytes descriptor and moving it to 2nd CL would
not hit big additional penalty. Unfortunately I have no such NIC to measure.
Is there any data how often double tagging in being used  in  DPDK applications?

Another my thought was to have at the end of 1st CL enum which may hold
Reserved fields per specific use cases and data widths (uint8, 2xuint4, 4xuint2, 8xbytes).
 
> 
> About the timestamp, it would be valuable to have other opinions, not only
> about the placement of the field in the structure, but also to check that this
> API is also usable for other NICs.

Sure, but I didn't change timesync/timestamping API itself.

> Have you measured the impact of having the timestamp in the second part
> of the mbuf?

Yes, the worst case with prefetching of 2nd CL it is 5.7-5.9 % penalty for combined RX+TX
And expectedly much worse without prefetching.
In the best case it is 0.3..0.5 % for RX only. It can be explained by much harder cache
trashing when TX is "on".
 
> Changing the mbuf structure should happen as rarely as possible, and we
> have to make sure we take the correct decisions. I think we will discuss this at
> dpdk userland 2016.

Oh, yes, please discuss, I would not be able to join. :(

> Apart from that, I wonder if an ol_flag should be added to tell that the
> timestamp field is valid in the mbuf.

Oliver, there is PKT_RX_IEEE1588_TMST flag already.

Best regards,
Oleg.


More information about the dev mailing list