[dpdk-dev] [PATCH] app/testpmd: fix txonly mode timestamp intitialization
Phil Yang
Phil.Yang at arm.com
Wed Jul 29 11:37:41 CEST 2020
Hi Slava,
Thanks for your analysis. Some comments inline.
> -----Original Message-----
> From: Slava Ovsiienko <viacheslavo at mellanox.com>
> Sent: Wednesday, July 29, 2020 4:08 PM
> To: Phil Yang <Phil.Yang at arm.com>
> Cc: Matan Azrad <matan at mellanox.com>; Raslan Darawsheh
> <rasland at mellanox.com>; thomas at monjalon.net; ferruh.yigit at intel.com;
> Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>; nd
> <nd at arm.com>; dev at dpdk.org; nd <nd at arm.com>
> Subject: RE: [dpdk-dev] [PATCH] app/testpmd: fix txonly mode timestamp
> intitialization
>
> Hi, Phil
>
> Very nice comment, I found it is useful, I'm on moving mlx5 PMD to use C11
> atomic primitives, thank you.
> But, don't you think it is an overkill for testpmd case and would make code
> wordy and less readable ?
> How does testpmd start the forwarding? Let's have a glance at
> launch_packet_forwarding():
>
> - initialize forwarding with port_fwd_begin() - It is tx_only_begin for txonly
> mode, master core context
> - launch the forwarding cores rte_eal_remote_launch(), the forwarding will
> happen in forward core context
>
> Let's reconsider:
>
> - tx_only_begin() is called once in master core context, forwarding is not
> launched yet, there is NO any
> concurrent access to variables, we can set ones in any way, NO atomic,
> volatile, etc. is needed at all.
> We should do setup and commit all our memory writes - rte_wmb() seems
I thought this barrier in this patch was only for synchronizing the 'timestamp_init_req' between master core and forwarding cores.
But in fact, it is more than that. I am clear now.
I think it might be helpful to add some comments to this barrier.
> to be the very native way
> to do it once for all preceding writes. Performance can be neglected here - it
> is not a datapath.
Agreed. It is not in the datapath and it won't hurt performance.
>
> - pkt_burst_prepare() is being executed in the forwarding core context, and
> perform only READ access
> to the global variables, those are stable while forwarding lasts, and can be
> considered as constants.
> No atomic, barriers, etc. are needed either. I agree - the volatile is
> redundant ("over-reassurance"), let's
> remove it. But atomic access - not needed, would make code complicated.
If we remove the volatile qualifier, then these atomic builtins becomes unnecessary.
Because all these 32-bits READs are atomic.
Thanks,
Phil
>
> What do you think? Do you insist on atomic-barriered access implementing?
>
> With best regards,
> Slava
>
> > -----Original Message-----
> > From: Phil Yang <Phil.Yang at arm.com>
> > Sent: Tuesday, July 28, 2020 19:24
> > To: Slava Ovsiienko <viacheslavo at mellanox.com>
> > Cc: Matan Azrad <matan at mellanox.com>; Raslan Darawsheh
> > <rasland at mellanox.com>; Thomas Monjalon <thomas at monjalon.net>;
> > ferruh.yigit at intel.com; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli at arm.com>; nd <nd at arm.com>; dev at dpdk.org;
> nd
> > <nd at arm.com>
> > Subject: RE: [dpdk-dev] [PATCH] app/testpmd: fix txonly mode timestamp
> > intitialization
> >
> > > -----Original Message-----
> > > From: dev <dev-bounces at dpdk.org> On Behalf Of Viacheslav Ovsiienko
> > > Sent: Monday, July 27, 2020 11:27 PM
> > > To: dev at dpdk.org
> > > Cc: matan at mellanox.com; rasland at mellanox.com;
> > thomas at monjalon.net;
> > > ferruh.yigit at intel.com
> > > Subject: [dpdk-dev] [PATCH] app/testpmd: fix txonly mode timestamp
> > > intitialization
> > >
> > > The testpmd application forwards data in multiple threads.
> > > In the txonly mode the Tx timestamps must be initialized on per thread
> > > basis to provide phase shift for the packet burst being sent. This per
> > > thread initialization was performed on zero value of the variable in
> > > thread local storage and happened only once after testpmd forwarding
> > > start. Executing "start" and "stop" commands did not cause thread
> > > local variables zeroing and wrong timestamp values were used.
> >
> > I think it is too heavy to use rte_wmb() to guarantee the visibility of
> > 'timestamp_init_req' updating for subsequent read operations.
> > We can use C11 atomics with explicit memory ordering instead of
> rte_wmb()
> > to achieve the same goal.
> >
> > >
> > > Fixes: 4940344dab1d ("app/testpmd: add Tx scheduling command")
> > >
> > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo at mellanox.com>
> > > ---
> > > app/test-pmd/txonly.c | 11 ++++++++++-
> > > 1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c index
> > > 97f4a45..415431d 100644
> > > --- a/app/test-pmd/txonly.c
> > > +++ b/app/test-pmd/txonly.c
> > > @@ -55,9 +55,13 @@
> > > static struct rte_udp_hdr pkt_udp_hdr; /**< UDP header of tx packets.
> > > */ RTE_DEFINE_PER_LCORE(uint64_t, timestamp_qskew);
> > > /**< Timestamp offset per queue */
> > > +RTE_DEFINE_PER_LCORE(uint32_t, timestamp_idone); /**< Timestamp
> > init
> > > done. */
> > > +
> > > static uint64_t timestamp_mask; /**< Timestamp dynamic flag mask */
> > > static int32_t timestamp_off; /**< Timestamp dynamic field offset */
> > > static bool timestamp_enable; /**< Timestamp enable */
> > > +static volatile uint32_t timestamp_init_req;
> >
> > If we use C11 atomic builtins for 'timestamp_init_req' accessing, the volatile
> > key word becomes unnecessary.
> > Because they will generate same instructions.
> >
> > > + /**< Timestamp initialization request. */
> > > static uint64_t timestamp_initial[RTE_MAX_ETHPORTS];
> > >
> > > static void
> > > @@ -229,7 +233,8 @@
> > > rte_be64_t ts;
> > > } timestamp_mark;
> > >
> > > - if (unlikely(!skew)) {
> > > + if (unlikely(timestamp_init_req !=
> >
> > if (unlikely(__atomic_load_n(×tamp_init_req,
> __ATOMIC_RELAXED) !=
> >
> > > + RTE_PER_LCORE(timestamp_idone))) {
> > > struct rte_eth_dev *dev = &rte_eth_devices[fs-
> > > >tx_port];
> > > unsigned int txqs_n = dev->data->nb_tx_queues;
> > > uint64_t phase = tx_pkt_times_inter * fs->tx_queue
> /
> > @@ -241,6
> > > +246,7 @@
> > > skew = timestamp_initial[fs->tx_port] +
> > > tx_pkt_times_inter + phase;
> > > RTE_PER_LCORE(timestamp_qskew) = skew;
> > > + RTE_PER_LCORE(timestamp_idone) =
> > > timestamp_init_req;
> >
> >
> > RTE_PER_LCORE(timestamp_idone) =
> > __atomic_load_n(×tamp_init_req, __ATOMIC_RELAXED);
> >
> >
> > > }
> > > timestamp_mark.pkt_idx = rte_cpu_to_be_16(idx);
> > > timestamp_mark.queue_idx = rte_cpu_to_be_16(fs-
> > > >tx_queue);
> > > @@ -426,6 +432,9 @@
> > > timestamp_mask &&
> > > timestamp_off >= 0 &&
> > > !rte_eth_read_clock(pi, ×tamp_initial[pi]);
> > > + if (timestamp_enable)
> > > + timestamp_init_req++;
> >
> >
> > __atomic_add_fetch(×tamp_init_req, 1, __ATOMIC_ACQ_REL);
> >
> >
> > > + rte_wmb();
> >
> > We can remove it now.
> >
> >
> > Thanks,
> > Phil
More information about the dev
mailing list