[dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds

N. Benes nbenes at eso.org
Fri May 29 22:46:00 CEST 2020


Hi everyone,

Tom Barbette:
> 
> Le 22/05/2020 à 20:43, PATRICK KEROULAS a écrit :
>>>>>> The mlx5 part of libibverbs includes a ts-to-ns converter which
>>>>>> takes the instantaneous clock info. It's unused in DPDK so far.
>>>>>> I've tested it in the device/port init routine and the result
>>>>>> looks reliable. Since this approach looks very simple compared to
>>>>>> the time sync mechanism, I'm trying to integrate it.
>>>>>>
>>>>>> The conversion should occur in the primary process (testpmd), I
>>>>>> suppose.
>>>>>> 1) The needed clock info derives from the ethernet device. Is it
>>>>>>    possible to access that struct from an Rx callback?
>>>>>> 2) How to attach the nanosecond value to the mbuf so that `pdump`
>>>>>>    catches it? (workaround: copy `mbuf->udata64` in forwarded
>>>>>>    packets.)
>>>>>> 3) Any other idea?
>>>>> The timestamp is carried in the mbuf.
>>>>> Then the conversion must be done by the ethdev caller (application or
>>>>> any other upper layer).
>>>> What if the converter function needs a clock_info?
>>>> https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201
>>>>
>>>> I'm afraid this info may change by the time the converter is called
>>>> by the upper layer.
>>> Indeed, the clock in the device is not an atomic one :)
>>> We need to adjust the time conversion continuously.
>>> I am not an expert in time synchronization, so I am adding more
>>> people in Cc who could help with getting a precise timestamp.
>> Thanks Thomas.
>> I'm not sure this is a synchronization issue. We have dedicated
>> processes (linuxptp) to keep both the NIC and system clocks in sync
>> with an external clock.
>> It is "just" a matter of unit conversion.
>>
>> If it has to be performed in dpdk-pdump, I would need some help to
>> retrieve mlx5_clock_info from inside a secondary process. Looking at
>> mlx5_read_clock(), this info is extracted from the ibv_context, which
>> looks reachable from the primary process only (I get a segfault if I
>> try in pdump).

The normal phc2sys can not only synchronise NIC -> system but also
system -> NIC and (I believe, though I have not tried it) NIC1 -> NIC2.
If I understand your proposal correctly, you want to use a free-running
NIC counter and calibrate out the drift afterwards.
It may be easier to adapt phc2sys to use a NIC through DPDK and sync the
NIC's timewheel/VCO in a proven/reliable manner (e.g. low-pass filtering
excursions). Then you could directly use the NIC counter value.
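
For reference, the converter in question appears to be mlx5dv_ts_to_ns()
together with mlx5dv_get_clock_info() from rdma-core. A minimal sketch
of how it could be called, assuming the ibv_context of the port is at
hand (i.e. in the primary process, as Patrick noted):

/* Rough sketch only: convert a raw mlx5 HW timestamp to nanoseconds
 * with rdma-core's helper. Assumes we already have the ibv_context of
 * the port (primary process). */
#include <stdint.h>
#include <infiniband/mlx5dv.h>

static uint64_t hw_ts_to_ns(struct ibv_context *ctx, uint64_t hw_ts)
{
    struct mlx5dv_clock_info ci;

    /* Snapshot of the device clock parameters; it has to be refreshed
     * periodically (before the internal cycle counter wraps) for the
     * conversion to stay accurate. */
    if (mlx5dv_get_clock_info(ctx, &ci))
        return 0; /* error handling elided in this sketch */

    return mlx5dv_ts_to_ns(&ci, hw_ts);
}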

> I don't know about the integrated ts-to-ns, but we implemented a
> translation mechanism that mimics what NTP does in Linux to translate
> a given clock (the TSC at first) to wall time. You'll find more info
> at https://orbi.uliege.be/bitstream/2268/226257/1/thesis.pdf, chapter
> 3.4.1. This is an often forgotten matter, as we saw in real switches
> that the time spent in time-related vDSO calls is enormous.

Do you have measurements of vDSO clock_gettime, and how much is
"enormous" to you?
To my knowledge, clock_gettime via the vDSO on Linux only takes a few
nanoseconds in the average case. However, it can go up to ~10 or even
~50 microseconds every few (~10) seconds, depending on the number of
CPUs (for example single vs. dual socket, though my hardware for this
test is quite old: Dell R210-II, R610). Presumably this happens when the
kernel locks the struct in the VVAR page to update the TSC drift
compensation parameters.

The Linux clock_gettime implementation is here (different versions):
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/vdso/vclock_gettime.c?h=linux-3.10.y#n193
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/entry/vdso/vclock_gettime.c?h=linux-4.19.y#n241
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/lib/vdso/gettimeofday.c#n98
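
For anyone who wants to reproduce this, a rough sketch of the kind of
measurement I mean (not my actual test harness): time back-to-back
clock_gettime() calls and record the worst-case gap.

/* Rough sketch: measure back-to-back clock_gettime() latency and report
 * the average and the worst case, to expose the occasional spikes. */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

int main(void)
{
    struct timespec prev, now;
    uint64_t worst = 0, total = 0;
    const long iters = 100 * 1000 * 1000;

    clock_gettime(CLOCK_MONOTONIC, &prev);
    for (long i = 0; i < iters; i++) {
        clock_gettime(CLOCK_MONOTONIC, &now);
        uint64_t delta = (now.tv_sec - prev.tv_sec) * 1000000000ULL
                         + (now.tv_nsec - prev.tv_nsec);
        total += delta;
        if (delta > worst)
            worst = delta;
        prev = now;
    }
    printf("avg %.1f ns, worst %llu ns\n",
           (double)total / iters, (unsigned long long)worst);
    return 0;
}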

I use busy waiting on clock_gettime in a packet generator application
(so far 10 GbE only) to pace jumbo frames according to a spec
(simulating the traffic pattern of a to-be-developed FPGA-based piece
of hardware), and I use COTS sniffer hardware with absolute timestamping
to verify my generator's performance. I can observe the 10-50 us
artefacts described above and a sufficiently good/low (for my needs)
average execution time of clock_gettime. The only sad thing is that the
TAI clock does not go through the vDSO and therefore I cannot use it.
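
The core of the pacing loop is conceptually just this (a simplified
sketch; send_frame() stands in for the actual TX path):

/* Simplified sketch of the pacing idea: busy-wait on clock_gettime()
 * until the scheduled departure time of the next frame, then send it. */
#include <stdint.h>
#include <time.h>

static inline uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

static void pace_frames(uint64_t period_ns, void (*send_frame)(void))
{
    uint64_t next = now_ns();

    for (;;) {
        while (now_ns() < next)
            ;               /* busy wait, no syscall thanks to the vDSO */
        send_frame();
        next += period_ns;  /* schedule against the spec, not against
                             * the actual send time, to avoid drift */
    }
}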

> We wanted to do a very precise capture too, so we made that clock able
> to synchronize itself with the ConnectX-5 internal clock as a base
> instead of the TSC. FYI, the clock in the CX5 is running at 800 MHz,
> so pure nanosecond resolution is impossible, but close enough. It is
> for that purpose that I proposed the rte_eth_read_clock() patch in
> DPDK. We need to be able to read the current clock (like the rdtsc
> instruction for the TSC) to compute the frequency.
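
For context, with rte_eth_read_clock() the frequency estimation boils
down to something along these lines (a rough sketch, error handling
omitted):

/* Rough sketch: estimate the NIC clock frequency by reading the device
 * counter twice around a known wall-clock interval. */
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <rte_ethdev.h>

static double estimate_nic_hz(uint16_t port_id)
{
    uint64_t c0, c1;
    struct timespec t0, t1;

    rte_eth_read_clock(port_id, &c0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    sleep(1);
    rte_eth_read_clock(port_id, &c1);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double dt = (t1.tv_sec - t0.tv_sec)
                + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return (double)(c1 - c0) / dt;   /* ~8e8 on a CX5, per your numbers */
}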

Doesn't this mean that you need to wait for a PCIe read from the NIC to
complete? Is this really faster than an rdtsc, a memory/cache read, an
integer multiplication and a shift?
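
For comparison, the TSC path I have in mind is one rdtsc plus a cached
multiply-and-shift, roughly like this sketch (base/mult/shift are
hypothetical calibration values that a background task would refresh
against the reference clock):

/* Minimal sketch: convert a TSC delta to nanoseconds with a fixed-point
 * multiply-and-shift, the same trick the kernel uses for clocksources.
 * Calibration must be refreshed often enough that delta * mult does not
 * overflow 64 bits. */
#include <stdint.h>
#include <x86intrin.h>

struct tsc_conv {
    uint64_t base_tsc;   /* TSC value at the last calibration point  */
    uint64_t base_ns;    /* reference clock ns at the same point     */
    uint32_t mult;       /* fixed-point ns-per-cycle multiplier      */
    uint32_t shift;
};

static inline uint64_t tsc_to_ns(const struct tsc_conv *c)
{
    uint64_t delta = __rdtsc() - c->base_tsc;

    /* (delta * mult) >> shift approximates delta * 1e9 / tsc_hz */
    return c->base_ns + ((delta * c->mult) >> c->shift);
}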

Cheers,
nicolas


