[dpdk-dev] MLX5 should define the timestamp field in the doc

Tom Barbette barbette at kth.se
Wed Sep 5 11:00:03 CEST 2018


Actually, I managed to write this patch, which implements support for rte_eth_timesync_read_time.


Please let me know of any modifications needed, and whether I should resubmit it as a "normal" patch to dev.


---
 drivers/net/mlx5/mlx5.c        |  1 +
 drivers/net/mlx5/mlx5.h        |  1 +
 drivers/net/mlx5/mlx5_ethdev.c | 30 ++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_glue.c   |  8 ++++++++
 drivers/net/mlx5/mlx5_glue.h   |  2 ++
 5 files changed, 42 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index c933e27..8c34794 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -324,6 +324,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
  .xstats_reset = mlx5_xstats_reset,
  .xstats_get_names = mlx5_xstats_get_names,
  .dev_infos_get = mlx5_dev_infos_get,
+ .timesync_read_time = mlx5_timesync_read_time,
  .dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
  .vlan_filter_set = mlx5_vlan_filter_set,
  .rx_queue_setup = mlx5_rx_queue_setup,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 997b04a..5747304 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -217,6 +217,7 @@ int mlx5_set_flags(struct rte_eth_dev *dev, unsigned int keep,
     unsigned int flags);
 int mlx5_dev_configure(struct rte_eth_dev *dev);
 void mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info);
+int mlx5_timesync_read_time(struct rte_eth_dev *dev, struct timespec *time);
 const uint32_t *mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev);
 int mlx5_link_update(struct rte_eth_dev *dev, int wait_to_complete);
 int mlx5_force_link_status_change(struct rte_eth_dev *dev, int status);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 90488af..b7f0d91 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -480,6 +480,36 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 }

 /**
+ * Get device current time
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @param[out] time
+ *   Time output value.
+ *
+ * @return
+ *   0 if the time was read successfully, a nonzero error value otherwise.
+ */
+int
+mlx5_timesync_read_time(struct rte_eth_dev *dev, struct timespec *time)
+{
+    struct priv *priv = dev->data->dev_private;
+    struct ibv_values_ex values;
+    int err = 0;
+
+    values.comp_mask = IBV_VALUES_MASK_RAW_CLOCK;
+    if ((err = mlx5_glue->query_rt_values_ex(priv->ctx, &values)) != 0) {
+        DRV_LOG(WARNING, "Could not query time");
+        return err;
+    }
+
+    *time = values.raw_clock;
+    return 0;
+}
+
+
+/**
  * Get supported packet types.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
index c7965e5..3c72f5b 100644
--- a/drivers/net/mlx5/mlx5_glue.c
+++ b/drivers/net/mlx5/mlx5_glue.c
@@ -84,6 +84,13 @@ mlx5_glue_query_device_ex(struct ibv_context *context,
 }

 static int
+mlx5_glue_query_rt_values_ex(struct ibv_context *context,
+   struct ibv_values_ex *values)
+{
+ return ibv_query_rt_values_ex(context, values);
+}
+
+static int
 mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
       struct ibv_port_attr *port_attr)
 {
@@ -354,6 +361,7 @@ const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
  .close_device = mlx5_glue_close_device,
  .query_device = mlx5_glue_query_device,
  .query_device_ex = mlx5_glue_query_device_ex,
+ .query_rt_values_ex = mlx5_glue_query_rt_values_ex,
  .query_port = mlx5_glue_query_port,
  .create_comp_channel = mlx5_glue_create_comp_channel,
  .destroy_comp_channel = mlx5_glue_destroy_comp_channel,
diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
index e584d36..0582e95 100644
--- a/drivers/net/mlx5/mlx5_glue.h
+++ b/drivers/net/mlx5/mlx5_glue.h
@@ -54,6 +54,8 @@ struct mlx5_glue {
  int (*query_device_ex)(struct ibv_context *context,
         const struct ibv_query_device_ex_input *input,
         struct ibv_device_attr_ex *attr);
+ int (*query_rt_values_ex)(struct ibv_context *context,
+        struct ibv_values_ex *values);
  int (*query_port)(struct ibv_context *context, uint8_t port_num,
    struct ibv_port_attr *port_attr);
  struct ibv_comp_channel *(*create_comp_channel)
--
2.7.4





________________________________
From: Shahaf Shuler <shahafs at mellanox.com>
Sent: Wednesday, September 5, 2018 10:18
To: Tom Barbette; dev at dpdk.org; Alex Rosenbaum
Cc: Yongseok Koh; john.mcnamara at intel.com; marko.kovacevic at intel.com
Subject: RE: MLX5 should define the timestamp field in the doc

Thanks for the details.

The use case is clear. We will take it internally to see when we can support it.
AFAIK we cannot read the internal time from userspace.

Adding also AlexR to comment

From: Tom Barbette <barbette at kth.se>
Sent: Wednesday, September 5, 2018 10:11 AM
To: Shahaf Shuler <shahafs at mellanox.com>; dev at dpdk.org
Cc: Yongseok Koh <yskoh at mellanox.com>; john.mcnamara at intel.com; marko.kovacevic at intel.com
Subject: RE: MLX5 should define the timestamp field in the doc


Thanks for your answer, Shahaf!



We're trying to measure the latency of packets going through various service chains inside an individual server. E.g., we can see that on Server 1, the latency for the service chain handling HTTP packets is ~800 ns (plus max and min, tail latency, etc.). What we do now is timestamp packets right after they are received, and compute the difference with a timestamp taken just before they are sent. Over a cluster, this shows us where the latency is happening.
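
A minimal sketch of the receive-to-send measurement described above, assuming both timestamps are already expressed in nanoseconds on the same clock. The names lat_stats, lat_stats_init, lat_stats_add and lat_stats_mean are hypothetical and only serve the illustration; tail-latency percentiles would need a histogram on top of this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-service-chain latency accumulator (all values in ns). */
struct lat_stats {
    uint64_t count;   /* number of packets recorded */
    uint64_t min_ns;  /* smallest latency seen */
    uint64_t max_ns;  /* largest latency seen */
    uint64_t sum_ns;  /* sum of all latencies, used for the mean */
};

static void
lat_stats_init(struct lat_stats *s)
{
    s->count = 0;
    s->min_ns = UINT64_MAX;
    s->max_ns = 0;
    s->sum_ns = 0;
}

/* Record one packet: rx_ns taken right after reception, tx_ns just before sending. */
static void
lat_stats_add(struct lat_stats *s, uint64_t rx_ns, uint64_t tx_ns)
{
    uint64_t lat;

    assert(tx_ns >= rx_ns); /* assumption: both stamps come from one monotonic clock */
    lat = tx_ns - rx_ns;
    if (lat < s->min_ns)
        s->min_ns = lat;
    if (lat > s->max_ns)
        s->max_ns = lat;
    s->sum_ns += lat;
    s->count++;
}

/* Mean latency in ns, or 0 when nothing has been recorded yet. */
static uint64_t
lat_stats_mean(const struct lat_stats *s)
{
    return s->count ? s->sum_ns / s->count : 0;
}
```

When a whole batch shares one software receive timestamp, every packet in that batch gets the same rx_ns, which is the precision loss that per-packet hardware timestamps would avoid.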



We would like this "box" latency to include the time spent in queues, and for that the hardware timestamp seems fit-for-purpose as it would timestamp the packets before the software queues. Moreover, as we use batching, we lose a lot of precision as we timestamp a whole batch at once.



I'm pretty sure this use case is of interest to many others. Tail latency is of the essence nowadays, and finding precisely where packets get delayed is important.


Instead of converting the timestamp to real time, in this very use case the Mellanox card could actually be our only source of time; we just need to be able to convert ticks to seconds.



Any chance we can run an equivalent of mlx5_read_internal_timer (https://elixir.bootlin.com/linux/v4.18.5/source/drivers/net/ethernet/mellanox/mlx5/core/main.c#L623) from userspace? Are these registers also mapped, or could they be with a few changes? With only that, we could easily derive the frequency and the offset.
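
To illustrate the frequency and offset derivation mentioned above, here is a rough sketch. It assumes two samples are available, each pairing a raw device tick count with a host time in nanoseconds taken at (nearly) the same instant, and that the counter does not wrap between them. The names clock_cal, clock_cal_init and ticks_to_ns are hypothetical; how the tick samples are obtained from the device is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical calibration state: a reference sample (the offset) plus the rate. */
struct clock_cal {
    uint64_t ref_ticks;    /* device counter at the reference sample */
    uint64_t ref_ns;       /* host time in ns at the reference sample */
    double   ticks_per_ns; /* estimated counter frequency */
};

/* Estimate the tick rate from two (ticks, host ns) samples taken some time apart. */
static void
clock_cal_init(struct clock_cal *cal,
               uint64_t ticks0, uint64_t ns0,
               uint64_t ticks1, uint64_t ns1)
{
    assert(ticks1 > ticks0 && ns1 > ns0);
    cal->ref_ticks = ticks0;
    cal->ref_ns = ns0;
    cal->ticks_per_ns = (double)(ticks1 - ticks0) / (double)(ns1 - ns0);
}

/* Convert a raw tick count to host nanoseconds, rounding to the nearest ns. */
static uint64_t
ticks_to_ns(const struct clock_cal *cal, uint64_t ticks)
{
    /* Signed difference so ticks slightly before the reference also work. */
    int64_t dticks = (int64_t)(ticks - cal->ref_ticks);
    double dns = (double)dticks / cal->ticks_per_ns;
    int64_t rounded = (int64_t)(dns + (dns >= 0 ? 0.5 : -0.5));

    return cal->ref_ns + (uint64_t)rounded;
}
```

The longer the interval between the two samples, the less the sampling jitter affects the estimated rate; a real deployment would likely re-sample periodically to track drift.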



Tom

