[dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds

Tom Barbette barbette at kth.se
Tue May 26 09:44:55 CEST 2020


On 22/05/2020 at 20:43, PATRICK KEROULAS wrote:
>>>>> mlx5 part of libibverbs includes a ts-to-ns converter which takes the
>>>>> instantaneous clock info. It's unused in dpdk so far. I've tested it in the
>>>>> device/port init routine and the result looks reliable. Since this approach
>>>>> looks very simple, compared to the time sync mechanism, I'm trying to
>>>>> integrate it.
>>>>>
>>>>> The conversion should occur in the primary process (testpmd) I suppose.
>>>>> 1) The needed clock info derives from ethernet device. Is it possible to
>>>>>     access that struct from a rx callback?
>>>>> 2) how to attach the nanosecond to mbuf so that `pdump` catches it?
>>>>>     (workaround: copy `mbuf->udata64` in forwarded packets.)
>>>>> 3) any other idea?
>>>> The timestamp is carried in mbuf.
>>>> Then the conversion must be done by the ethdev caller (application or
>>>> any other upper layer).
>>> What if the converter function needs a clock_info?
>>> https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201
>>> I'm afraid this info may change by the time the converter is called
>>> by upper layer.
>> Indeed, the clock in the device is not an atomic one :)
>> We need to adjust the time conversion continuously.
>> I am not an expert of time synchronization, so I add more people Cc
>> who could help for having a precise timestamp.
> Thanks Thomas.
> Not sure this is a synchronization issue. We have dedicated processes
> (linuxptp) to keep both NIC and sys clocks in sync with an external clock.
> It is "just" a matter of unit conversion.
>
> If it has to be performed in dpdk-pdump, I would need some help to
> retrieve mlx5_clock_info from inside a secondary process. Looking at
> mlx5_read_clock(), this info is extracted from ibv_context which looks
> reachable in a primary process only (segfault, if I try in pdump).


I don't know about the integrated ts-to-ns, but we implemented a
translation mechanism that mimics what NTP does in Linux to translate a
given clock (the TSC at first) into wall-clock time. You'll find more info at
https://orbi.uliege.be/bitstream/2268/226257/1/thesis.pdf, chapter
3.4.1. This is an often-forgotten matter: in real switches we saw
that the time spent in time-related vDSO calls is enormous.

We wanted to do a very precise capture too, so we made that clock able
to synchronize itself with the ConnectX-5 internal clock as a base
instead of the TSC. FYI the clock in the CX5 is running at 800 MHz (one
tick is 1.25 ns), so pure nanosecond resolution is impossible, but close
enough. It is for that purpose that I proposed the rte_eth_read_clock()
patch in DPDK. We need to be able to read the current clock (like the
rdtsc instruction for the TSC) to compute the frequency.

The "converter" code is here:
https://github.com/tbarbette/fastclick/blob/master/elements/userlevel/tscclock.cc
The source is configurable (TSC, rte_eth_read_clock, Meinberg GPS clock,
...); the DPDK one is here:
https://github.com/tbarbette/fastclick/blob/2ab021283b82d0b980551480c505ec8dced98e0a/elements/userlevel/dpdkdevclock.cc#L27


One important thing is that the conversion factor must be changed from
time to time to correct the drift. That is why we can't just push a
bunch of code to DPDK (and it's probably not as simple as using the
ts-to-ns in mlx5): you must have a timer, and use RCU to update a
struct larger than 64 bits "atomically". Most of that is available in
DPDK now, but there will always be some setup (RCU barrier, timer
init, ...).

In the end it's not hard science... It worked like a charm for a
campus trace capture on 100G hardware.
