[dpdk-dev] [1/5] vhost: enforce avail index and desc read ordering

Ilya Maximets i.maximets at samsung.com
Thu Dec 6 13:48:31 CET 2018


On 06.12.2018 7:17, Jason Wang wrote:
> 
> On 2018/12/5 7:30 PM, Ilya Maximets wrote:
>> On 05.12.2018 12:49, Maxime Coquelin wrote:
>>> A read barrier is required to ensure the ordering between
>>> available index and the descriptor reads is enforced.
>>>
>>> Fixes: 4796ad63ba1f ("examples/vhost: import userspace vhost application")
>>> Cc: stable at dpdk.org
>>>
>>> Reported-by: Jason Wang <jasowang at redhat.com>
>>> Signed-off-by: Maxime Coquelin <maxime.coquelin at redhat.com>
>>> ---
>>>   lib/librte_vhost/virtio_net.c | 12 ++++++++++++
>>>   1 file changed, 12 insertions(+)
>>>
>>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>>> index 5e1a1a727..f11ebb54f 100644
>>> --- a/lib/librte_vhost/virtio_net.c
>>> +++ b/lib/librte_vhost/virtio_net.c
>>> @@ -791,6 +791,12 @@ virtio_dev_rx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>>       rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
>>>       avail_head = *((volatile uint16_t *)&vq->avail->idx);
>>>   +    /*
>>> +     * The ordering between avail index and
>>> +     * desc reads needs to be enforced.
>>> +     */
>>> +    rte_smp_rmb();
>>> +
>> Hmm. This looks weird to me.
>> Could you please describe the bad scenario here? (It'll be good to have it
>> in commit message too)
>>
>> As I understand, you're enforcing the read of avail->idx to happen before
>> reading the avail->ring[avail_idx]. Is it correct?
>>
>> But we have following code sequence:
>>
>> 1. read avail->idx (avail_head).
>> 2. check that last_avail_idx != avail_head.
>> 3. read from the ring using last_avail_idx.
>>
>> So, there is a strict dependency between all 3 steps and the memory
>> transaction will be finished at the step #2 in any case. There is no
>> way to read the ring before reading the avail->idx.
>>
>> Am I missing something?
> 
> 
> Nope, I think I get what you mean now. And even if we go on to
> 
> 4. read descriptor from descriptor ring using the id read from 3
> 
> 5. read descriptor content according to the address from 4
> 
> these are still dependent memory accesses. So there's no need for rmb.
> 

On second glance, I changed my mind.
The code looks like this:

1. read avail_head = avail->idx
2. read cur_idx    = last_avail_idx
if (cur_idx != avail_head) {
    3. read idx = avail->ring[cur_idx]
    4. read desc[idx]
}

There is an address (data) dependency: 2 -> 3 -> 4.
These reads cannot be reordered.

But there is only a control dependency between 1 and (3, 4), because
'avail_head' is not used to calculate 'cur_idx'. With aggressive speculative
execution, 1 could be reordered with 3, resulting in a read of a not yet
updated 'idx'.

I'm not sure if speculative execution could go that far while 'avail_head'
is not read yet, but it should be possible in theory.

Thoughts?
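
To make the concern concrete, here is a minimal sketch of the read sequence
above, written with C11 atomics rather than the DPDK API. The names
'struct avail_ring', 'RING_SIZE' and 'next_avail()' are hypothetical and exist
only for illustration, and the acquire fence merely stands in for the
rte_smp_rmb() from the patch. It is not the actual vhost code.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8 /* hypothetical ring size, must be a power of two */

/* Hypothetical, simplified stand-in for the split-ring avail structure. */
struct avail_ring {
    _Atomic uint16_t idx;      /* written by the producer (guest driver) */
    uint16_t ring[RING_SIZE];  /* descriptor ids published by the producer */
};

/*
 * Returns 1 and stores the next descriptor id in *out if an entry is
 * available, otherwise returns 0.
 */
static int next_avail(struct avail_ring *avail, uint16_t *last_avail_idx,
                      uint16_t *out)
{
    /* 1. read avail_head = avail->idx */
    uint16_t avail_head = atomic_load_explicit(&avail->idx,
                                               memory_order_relaxed);
    /* 2. read cur_idx = last_avail_idx */
    uint16_t cur_idx = *last_avail_idx;

    if (cur_idx == avail_head)
        return 0;

    /*
     * Only a control dependency links step 1 to step 3, so the ring read
     * could be speculated ahead of the idx read. The acquire fence orders
     * the earlier idx load before the later ring load.
     */
    atomic_thread_fence(memory_order_acquire);

    /* 3. read idx = avail->ring[cur_idx]; the address depends only on cur_idx */
    *out = avail->ring[cur_idx & (RING_SIZE - 1)];
    (*last_avail_idx)++;
    return 1;
}
```

Without the fence, nothing in the address computation of step 3 uses
'avail_head', which is the gap described above.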

>>
>>>       for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
>>>           uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;
>>>           uint16_t nr_vec = 0;
>>> @@ -1373,6 +1379,12 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>>       if (free_entries == 0)
>>>           return 0;
>>>   +    /*
>>> +     * The ordering between avail index and
>>> +     * desc reads needs to be enforced.
>>> +     */
>>> +    rte_smp_rmb();
>>> +
>> This one is strange too.
>>
>>     free_entries = *((volatile uint16_t *)&vq->avail->idx) -
>>             vq->last_avail_idx;
>>     if (free_entries == 0)
>>         return 0;
>>
>> The code reads the value of avail->idx and uses the value on the next
>> line even with any compiler optimizations. There is no way for CPU to
>> postpone the actual read.
> 
> 
> Yes.
> 

The situation here is similar, but 'avail_head' is somewhat involved in the
'cur_idx' calculation because of
	fill_vec_buf_split(..., vq->last_avail_idx + i, ...)
and 'i' depends on 'free_entries'. But we would need to look at the exact asm
code to be sure. I think we may add the barrier here to avoid possible issues.
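
A rough sketch of the dequeue-side pattern, again with C11 atomics and
hypothetical names ('struct tx_avail', 'TX_RING_SIZE', 'tx_collect()') instead
of the real vhost code. It assumes that 'free_entries' only bounds the loop
counter and does not feed into the ring address itself, which is why the fence
is placed after the emptiness check.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TX_RING_SIZE 8 /* hypothetical ring size, must be a power of two */

/* Hypothetical, simplified stand-in for the avail ring on the dequeue path. */
struct tx_avail {
    _Atomic uint16_t idx;
    uint16_t ring[TX_RING_SIZE];
};

/* Copies up to 'count' available descriptor ids into ids[]; returns how many. */
static uint16_t tx_collect(struct tx_avail *avail, uint16_t *last_avail_idx,
                           uint16_t *ids, uint16_t count)
{
    /* 16-bit subtraction handles wrap-around of the free-running indexes. */
    uint16_t free_entries =
        (uint16_t)(atomic_load_explicit(&avail->idx, memory_order_relaxed) -
                   *last_avail_idx);

    if (free_entries == 0)
        return 0;

    /* Order the avail->idx load before the ring reads below. */
    atomic_thread_fence(memory_order_acquire);

    if (count > free_entries)
        count = free_entries;

    /* 'i' is limited by free_entries, but the address uses last_avail_idx + i. */
    for (uint16_t i = 0; i < count; i++)
        ids[i] = avail->ring[(uint16_t)(*last_avail_idx + i) &
                             (TX_RING_SIZE - 1)];

    *last_avail_idx += count;
    return count;
}
```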

> Thanks
> 
> 
>>
>>>       VHOST_LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__);
>>>         count = RTE_MIN(count, MAX_PKT_BURST);
>>>
> 
> 

