[dpdk-dev] [PATCH 1/3] vhost: pre update used ring for Tx and Rx
Yuanhan Liu
yuanhan.liu at linux.intel.com
Wed Jun 1 08:55:57 CEST 2016
On Wed, Jun 01, 2016 at 06:40:41AM +0000, Xie, Huawei wrote:
> > /* Retrieve all of the head indexes first to avoid caching issues. */
> > for (i = 0; i < count; i++) {
> > - desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) &
> > - (vq->size - 1)];
> > + used_idx = (vq->last_used_idx + i) & (vq->size - 1);
> > + desc_indexes[i] = vq->avail->ring[used_idx];
> > +
> > + vq->used->ring[used_idx].id = desc_indexes[i];
> > + vq->used->ring[used_idx].len = 0;
> > + vhost_log_used_vring(dev, vq,
> > + offsetof(struct vring_used, ring[used_idx]),
> > + sizeof(vq->used->ring[used_idx]));
> > }
> >
> > /* Prefetch descriptor index. */
> > rte_prefetch0(&vq->desc[desc_indexes[0]]);
> > - rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]);
> > -
> > for (i = 0; i < count; i++) {
> > int err;
> >
> > - if (likely(i + 1 < count)) {
> > + if (likely(i + 1 < count))
> > rte_prefetch0(&vq->desc[desc_indexes[i + 1]]);
> > - rte_prefetch0(&vq->used->ring[(used_idx + 1) &
> > - (vq->size - 1)]);
> > - }
> >
> > pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
> > if (unlikely(pkts[i] == NULL)) {
> > @@ -916,18 +920,12 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> > rte_pktmbuf_free(pkts[i]);
> > break;
> > }
> > -
> > - used_idx = vq->last_used_idx++ & (vq->size - 1);
> > - vq->used->ring[used_idx].id = desc_indexes[i];
> > - vq->used->ring[used_idx].len = 0;
> > - vhost_log_used_vring(dev, vq,
> > - offsetof(struct vring_used, ring[used_idx]),
> > - sizeof(vq->used->ring[used_idx]));
> > }
>
> Had tried post-updating the used ring in batch, but forgot the perf change.
I would assume pre-updating gives the better performance gain, as we
touch the avail and used rings together in the same loop, which should
be more cache friendly.
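
To make the access pattern concrete, here is a minimal standalone sketch
of the pre-update loop from the patch. It is not the vhost code itself:
struct toy_vq, RING_SIZE and toy_pre_update_used() are made-up names for
illustration, and the dirty logging call is left out.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical ring size; must be a power of two */

struct used_elem {
	uint32_t id;
	uint32_t len;
};

/* Toy stand-in for the virtqueue; not the real struct vhost_virtqueue. */
struct toy_vq {
	uint16_t avail_ring[RING_SIZE];
	struct used_elem used_ring[RING_SIZE];
	uint16_t last_used_idx;
};

/*
 * Read `count` head indexes from the avail ring and fill the matching
 * used ring slots in the same pass, so both rings are touched together.
 * last_used_idx is deliberately left unchanged here.
 */
static void
toy_pre_update_used(struct toy_vq *vq, uint16_t *desc_indexes, uint16_t count)
{
	uint16_t i;

	assert(count <= RING_SIZE);

	for (i = 0; i < count; i++) {
		/* Same slot index is used for the avail and the used ring. */
		uint16_t used_idx = (uint16_t)((vq->last_used_idx + i) &
					       (RING_SIZE - 1));

		desc_indexes[i] = vq->avail_ring[used_idx];

		vq->used_ring[used_idx].id = desc_indexes[i];
		vq->used_ring[used_idx].len = 0;
	}
}
```

The sketch only fills the slots; when the guest-visible used index is
advanced is outside what it shows.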
> One optimization would be on vhost_log_used_ring.
> I have two ideas,
> a) On the QEMU side, we always assume the used ring will be changed, so that
> we don't need to log the used ring in VHOST.
>
> Michael: feasible in QEMU? comments on this?
>
> b) We could always mark the total used ring modified rather than entry
> by entry.
I doubt it's worthwhile. vhost_log_used_vring is a no-op most of the
time: it takes action only during the short window of a live
migration.
And FYI, I even tried removing all the vhost_log_xxx calls, and it
showed no performance boost at all. So it's not a factor that impacts
performance.
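
To illustrate why the logging is nearly free outside of migration, here
is a rough sketch of an early-return guard of that kind. The names
(struct toy_dev, TOY_F_LOG_ALL, toy_log_write) and the bitmap layout are
assumptions made for the example, not the actual vhost definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_F_LOG_ALL	26	/* hypothetical "log all writes" feature bit */
#define TOY_PAGE_SHIFT	12	/* assume 4 KiB pages */

/* Toy device state; not the real vhost device struct. */
struct toy_dev {
	uint64_t features;
	uint8_t *log_base;	/* dirty page bitmap, NULL when not set up */
};

/*
 * Mark the pages covering [addr, addr + len) as dirty.
 * Returns the number of pages marked, which is 0 on the common path
 * where logging was never requested.
 */
static unsigned int
toy_log_write(struct toy_dev *dev, uint64_t addr, uint64_t len)
{
	uint64_t page, last;
	unsigned int marked = 0;

	/* Fast path: nothing to do unless logging is switched on. */
	if ((dev->features & (1ULL << TOY_F_LOG_ALL)) == 0 ||
	    dev->log_base == NULL || len == 0)
		return 0;

	last = (addr + len - 1) >> TOY_PAGE_SHIFT;
	for (page = addr >> TOY_PAGE_SHIFT; page <= last; page++) {
		dev->log_base[page / 8] |= (uint8_t)(1u << (page % 8));
		marked++;
	}

	return marked;
}
```

With the feature bit clear, each call costs one predictable branch, which
would be consistent with seeing no difference when the calls are removed.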
--yliu