[dpdk-users] Performance of rte_eth_stats_get
Alireza Sanaee
sarsanaee at gmail.com
Wed Jul 14 12:25:41 CEST 2021
On 19/05/2021 17:06, Stephen Hemminger wrote:
> On Wed, 19 May 2021 15:14:38 +0000
> "Van Haaren, Harry" <harry.van.haaren at intel.com> wrote:
>
>>> -----Original Message-----
>>> From: users <users-bounces at dpdk.org> On Behalf Of Filip Janiszewski
>>> Sent: Wednesday, May 19, 2021 2:10 PM
>>> To: users at dpdk.org
>>> Subject: [dpdk-users] Performance of rte_eth_stats_get
>>>
>>> Hi,
>>>
>>> Is it safe to call rte_eth_stats_get while capturing from the port?
>>>
>>> I'm mostly concerned about performance: will rte_eth_stats_get impact the
>>> port performance in any way? In the application I plan to call the
>>> function from a thread that is not directly involved in the capture
>>> (there's another worker responsible for rx bursting), but I wonder if the
>>> NIC might get upset if I call it too frequently (say 10 times per second)
>>> and cause some performance issues.
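>>>
>>> Roughly, what I have in mind for that stats thread is something like the
>>> sketch below (the port id, the stop flag and the 100 ms interval are just
>>> placeholders):
>>>
>>>     #include <stdbool.h>
>>>     #include <stdio.h>
>>>     #include <inttypes.h>
>>>     #include <rte_ethdev.h>
>>>     #include <rte_cycles.h>
>>>
>>>     static volatile bool keep_running = true; /* cleared on shutdown */
>>>
>>>     /* Launched with rte_eal_remote_launch() on a core that does no rx. */
>>>     static int
>>>     stats_loop(void *arg)
>>>     {
>>>         uint16_t port_id = *(uint16_t *)arg;
>>>         struct rte_eth_stats stats;
>>>
>>>         while (keep_running) {
>>>             if (rte_eth_stats_get(port_id, &stats) == 0)
>>>                 printf("port %u: ipackets=%" PRIu64 " imissed=%" PRIu64 "\n",
>>>                        port_id, stats.ipackets, stats.imissed);
>>>             rte_delay_ms(100); /* ~10 reads per second */
>>>         }
>>>         return 0;
>>>     }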
>>>
>>> The question is really NIC agnostic, but if the NIC vendor is actually
>>> relevant, then I'm running Intel 700 series NICs and Mellanox ConnectX-4/5.
>>
>> To understand what really goes on when getting stats, it might help to list the
>> steps involved in getting statistics from the NIC HW.
>>
>> 1) The CPU sends an MMIO read (Memory Mapped I/O, sometimes referred to as
>> a "PCI read") to the NIC.
>> 2) The PCI bus has to handle extra TLPs (PCI transactions) to satisfy the read.
>> 3) The NIC has to send a reply based on accessing its internal counters.
>> 4) The CPU gets the result of the PCI read.
>>
>> Notice how elegantly this whole process is abstracted from SW? In code, reading
>> a stat value is just dereferencing a pointer that is mapped to the NIC HW address.
>> In practice from a CPU performance point of view, doing an MMIO-read is one of
>> the slowest things you can do. You say the stats-reads are occurring from a thread
>> that is not handling rx/datapath, so perhaps the CPU cycle cost itself isn't a concern.
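>>
>> Purely as an illustration (not taken from any real PMD), that pointer
>> dereference looks something like this, with 'bar0' being the NIC register
>> space mapped into the process and the offset being made up:
>>
>>     #include <stdint.h>
>>
>>     #define RX_PKTS_REG 0x1000 /* made-up counter offset in BAR0 */
>>
>>     /* The volatile load is the MMIO read: the core stalls until the PCIe
>>      * read completion comes back from the device (steps 1-4 above). */
>>     static inline uint32_t
>>     read_rx_pkts(volatile uint8_t *bar0)
>>     {
>>         return *(volatile uint32_t *)(bar0 + RX_PKTS_REG);
>>     }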
>>
>> Do note however, that when reading a full set of extended stats from the NIC, there
>> could be tens to hundreds of MMIO reads (depending on the statistics requested,
>> and how the PMD itself is implemented to handle stats updates).
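>>
>> For example, pulling the full extended stats set looks roughly like this
>> (error handling, headers and frees omitted); depending on the PMD, each of
>> the 'n' counters may be backed by one or more register reads:
>>
>>     int n = rte_eth_xstats_get_names(port_id, NULL, 0);
>>     struct rte_eth_xstat_name *names = calloc(n, sizeof(*names));
>>     struct rte_eth_xstat *xstats = calloc(n, sizeof(*xstats));
>>
>>     rte_eth_xstats_get_names(port_id, names, n);
>>     rte_eth_xstats_get(port_id, xstats, n);
>>     for (int i = 0; i < n; i++)
>>         printf("%s: %" PRIu64 "\n", names[xstats[i].id].name, xstats[i].value);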
>>
>> The PCI bus does become more busy with reads to the NIC HW when doing lots of
>> statistic updates, so there is some more contention/activity to be expected there.
>> The PCM tool can be very useful to see MMIO traffic; you could measure how many
>> extra PCI transactions occur due to reading stats every X ms:
>> https://github.com/opcm/pcm
>>
>> I can recommend measuring packet latency/jitter as a histogram, as outliers in performance
>> can then be identified. If you specifically want to identify whether these are due to stats reads,
>> compare with a "no stats reads" latency/jitter histogram, and graphically see the impact.
>> In the end, if it doesn't affect packet latency/jitter, then it has no impact, right?
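>>
>> A very rough sketch of gathering such a histogram on the rx core (the log2
>> bucketing and the names port_id/queue_id/pkts/BURST_SIZE are arbitrary):
>>
>>     static uint64_t hist[64]; /* hist[i] counts bursts costing ~2^i cycles */
>>
>>     uint64_t t0 = rte_rdtsc();
>>     uint16_t nb = rte_eth_rx_burst(port_id, queue_id, pkts, BURST_SIZE);
>>     uint64_t cycles = rte_rdtsc() - t0;
>>     if (nb > 0)
>>         hist[rte_log2_u64(cycles)]++;
>>
>> Compare the distribution with and without the periodic stats reads enabled.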
>>
>> Ultimately, I can't give a generic answer - best steps are to measure carefully and find out!
>>
>>> Thanks
>>
>> Hope the above helps and doesn't add confusion :) Regards, -Harry
>
> Many drivers require transactions with the firmware via a mailbox,
> and that transaction needs a spin wait on the shared area.
>
Thank you for explaining the steps so nicely. I noticed this problem
too: calling `rte_eth_stats_get` on the PMD port once per batch almost
halves the throughput in a 10G setup, IIRC; the cost is prohibitively
high. This, however, doesn't show up when DPDK is connected to a vhost
PMD port, since all of the port statistics are probably kept somewhere
in shared memory.
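
For reference, the pattern that showed the slowdown was essentially the
following (simplified; the names are placeholders):

    struct rte_eth_stats stats;

    while (keep_running) {
        uint16_t nb = rte_eth_rx_burst(port_id, queue_id, pkts, BURST_SIZE);
        process_burst(pkts, nb);
        /* One stats read per rx burst: on a hardware PMD this means
         * MMIO/firmware traffic on every iteration, which is what roughly
         * halved the 10G throughput; on a vhost PMD the counters live in
         * shared memory, so the same call is cheap. */
        rte_eth_stats_get(port_id, &stats);
    }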