[PATCH 1/1] doc: add mlx5 xstats send scheduling counters description

Stephen Hemminger stephen at networkplumber.org
Mon Oct 28 16:57:08 CET 2024


On Mon, 28 Oct 2024 16:27:41 +0200
Viacheslav Ovsiienko <viacheslavo at nvidia.com> wrote:

> The mlx5 provides the scheduling send on time capability.
> The check the operating status of this feature the xstats
> counters are provided. This patch adds the counter descriptions
> and provides some meaningful information how to interpret
> the counter values in runtime.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo at nvidia.com>
> ---
>  doc/guides/nics/mlx5.rst | 48 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
> 
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
> index f82e2d75de..8d1a1311d4 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -2655,3 +2655,51 @@ Destroy GENEVE TLV parser for specific port::
>  
>  This command doesn't destroy the global list,
>  For releasing options, ``flush`` command should be used.
> +
> +
> +Extended statistics counters
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Send scheduling related xstats counters
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +The mlx5 PMD provides the set of tx_pp feature related counters to provide debug and diagnostics
> +on send packet scheduling. These counters are applicable only if port was probed with ``tx_pp``
> +devarg and reflect the status of PMD scheduling infrastructure based on Clock and Rearm Queues.
> +This infrastructure provedies the Send Scheduling capability on CX6DX NICs as temporary workaround
> +and should not be engaged on the newer hardware.
> +
> +- ``tx_pp_missed_interrupt_errors`` - the Rearm Queue interrupt was not serviced in time. EAL handles
> +  interrupts in dedicated thread and, possible, there were another time-consuming actions were taken.
> +
> +- ``tx_pp_rearm_queue_errors`` - hardware errors occurred on Rearm Queue, usually it is caused by not
> +  servicing interrupts in time
> +
> +- ``tx_pp_clock_queue_errors`` - hardware errors occurred on Clock Queue, usually it indicates some
> +  configuration or internal NIC hardware or firmware issues
> +
> +- ``tx_pp_timestamp_past_errors`` - application tried to send packet(s) with specifying timestamp in the past.
> +  This counter is useful to check and debug the application code, it does not indicate PMD malfunction.
> +
> +- ``tx_pp_timestamp_future_errors`` - application tried to send packet(s) with specifying timestamp
> +  in the too distant future, beyond the hardware capabilities to schedule the sending
> +  This counter is useful to check and debug the application code, it does not indicate PMD malfunction.
> +
> +- ``tx_pp_jitter`` - this counter exposes the internal NIC realtime clock jitter estimation between two
> +  neighbour Clock Queue completions in nanoseconds. Significant jitter might alert about clock
> +  synchronization issues (say, some system PTP agent might adjust NIC clock in inappropriate way)
> +
> +- ``tx_pp_wander`` - the counter exposes the longterm internal NUC realtime clock stability - tx_pp_wander
> +  for 2^24 completions, in nanoseconds. Significant wander might indicate clock synchronization issues.
> +
> +- ``tx_pp_sync_lost`` - the general operating indicator, the non-zero value says the driver lost
> +  the Clock Queue synchronization and scheduling does not operate correctly. The port must be restarted
> +  to restore the correct scheduling functioning.
> +
> +The following counters are extremely useful for application code check and debug, these ones do not
> +indicate driver or hardware mulfunctions, and are also applicable for the newer hardware (with direct
> +on time scheduling capabilities - ConnectX-7 and above):
> +
> +- ``tx_pp_timestamp_order_errors`` - application tried to send packet(s) with timestamps in not
> +  strictly ascending order. Because of PMD does not reorder packets in the hardware queues, scheduling
> +  timestamps order violation causes sending packets in wrong moments of time.

Lots of grammar and spelling errors and overly wordy.
Please spend some time cleaning up the wording, find a writer or AI tool to help.


More information about the dev mailing list