[dpdk-dev] [PATCH v2 1/2] virtio: one way barrier for split vring used idx
Joyce Kong
Joyce.Kong at arm.com
Thu Apr 16 08:46:11 CEST 2020
<snip>
> > > Subject: Re: [dpdk-dev] [PATCH v2 1/2] virtio: one way barrier for
> > > split vring used idx
> > >
> > > On Thu, 2 Apr 2020 10:57:52 +0800
> > > Joyce Kong <joyce.kong at arm.com> wrote:
> > >
> > > > -(vq)->vq_used_cons_idx))
> > > > +static inline uint16_t
> > > > +virtqueue_nused(struct virtqueue *vq)
> > > vq is unmodified and should be const
> > >
> > > > +{
> > > > +uint16_t idx;
> > > > +if (vq->hw->weak_barriers) {
> > > Put blank line between declaration and if statement
> > Will fix in v3.
> > >
> > > > +/* x86 prefers to use rte_smp_rmb over __atomic_load_n as it
> > > > + * reports a slightly better perf, which comes from the saved
> > > > + * branch by the compiler.
> > > > + * The if and else branches are identical with the smp and cio
> > > > + * barriers both defined as compiler barriers on x86.
> > > > + */
> > >
> > > Do not put comments on left margin (except in function prolog).
> > Will fix in v3.
> > >
> > > > +#ifdef RTE_ARCH_X86_64
> > > > +idx = vq->vq_split.ring.used->idx;
> > > > +rte_smp_rmb();
> > > > +#else
> > > > +idx = __atomic_load_n(&(vq)->vq_split.ring.used->idx,
> > > > +__ATOMIC_ACQUIRE);
> > > > +#endif
> > > > +} else {
> > > > +idx = vq->vq_split.ring.used->idx;
> > > > +rte_cio_rmb();
> > > > +}
> > > > +return (idx - vq->vq_used_cons_idx);
> > >
> > > Parenthesis around arguments to return are unnecessary.
> > > BSD code likes it, Linux style does not.
> > Will fix in v3.
> > >
> > > > +}
> > >
> > > This kind of arch-specific code is hard to maintain.
> > > Does it really make that much difference?
> > Yes, a stronger-than-required barrier is a performance killer,
> > especially in the fast path.
> > The test was conducted with the PVP test case on a ThunderX2 +
> > Intel XL710 testbed.
> > /Gavin
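
For reference, on armv8 the acquire load is a one-way barrier, while
rte_smp_rmb expands to a standalone two-way read barrier. A rough sketch
of what the two variants compile to (assuming GCC on aarch64 and DPDK's
arm64 barrier definitions; the instruction mapping below is illustrative
and not taken from this thread):

    idx = __atomic_load_n(&used->idx, __ATOMIC_ACQUIRE);
            /* aarch64: ldarh w0, [x0] -- load-acquire halfword, a
             * one-way barrier that only orders this load before
             * later accesses.
             */

    idx = used->idx;        /* aarch64: ldrh w0, [x0] (plain load) */
    rte_smp_rmb();          /* aarch64: dmb ishld -- orders all prior
                             * loads against later accesses, stronger
                             * than the acquire semantics needed here.
                             */
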
> I think if the performance difference is not much, we should stay with
> C11 built-ins for x86. How much is the performance difference on x86?
According to Intel's PVP test for packed ring, staying with __atomic_load_n
would cause a ~1.5% perf loss on x86. So rte_smp_rmb is kept for x86 in the
packed ring patches, which have already been merged, and the same change is
made in the split ring patches here.
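
Putting the review comments above together, a sketch of how
virtqueue_nused might look in v3 (const parameter, a blank line after the
declaration, the comment indented off the left margin, and no parentheses
around the return expression). This is a reconstruction for illustration,
not the merged code:

static inline uint16_t
virtqueue_nused(const struct virtqueue *vq)
{
        uint16_t idx;

        if (vq->hw->weak_barriers) {
                /* x86 prefers to use rte_smp_rmb over __atomic_load_n
                 * as it reports a slightly better perf, which comes
                 * from the saved branch by the compiler.
                 * The if and else branches are identical with the smp
                 * and cio barriers both defined as compiler barriers
                 * on x86.
                 */
#ifdef RTE_ARCH_X86_64
                idx = vq->vq_split.ring.used->idx;
                rte_smp_rmb();
#else
                idx = __atomic_load_n(&vq->vq_split.ring.used->idx,
                                __ATOMIC_ACQUIRE);
#endif
        } else {
                idx = vq->vq_split.ring.used->idx;
                rte_cio_rmb();
        }
        return idx - vq->vq_used_cons_idx;
}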