|SUCCESS| vhost: fix rte_vhost_vring_call coredump caused by numa_realloc free virtio_net

dpdklab at iol.unh.edu dpdklab at iol.unh.edu
Tue Mar 31 14:47:09 CEST 2026

Previous message (by thread): |SUCCESS| vhost: fix rte_vhost_vring_call coredump caused by numa_realloc free virtio_net
Next message (by thread): |SUCCESS| vhost: fix rte_vhost_vring_call coredump caused by numa_realloc free virtio_net
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/162788

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-03-31

This patch addresses a use-after-free issue in the vhost library, but the proposed fix raises several concerns about correctness and design.

commit: 20260326092835.20567-2-yuanxin123456 at 126.com
Author: YuanXin <yuanxin123456 at 126.com>
Subject: vhost: fix rte_vhost_vring_call coredump caused by numa_realloc free virtio_net

The patch attempts to prevent numa_realloc() from reallocating the device
structure on subsequent vq kicks by skipping reallocation when vq->index > 0.

> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 4bfb13fb98..1312f95cbc 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -681,6 +681,9 @@ numa_realloc(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>  
>  out_dev_realloc:
>  
> +	if (vq->index > 0)
> +		return;
> +

Does this index check correctly handle the case where vq0 is on a different
NUMA node than the device, but vq1 is kicked first? In that scenario, would
the device remain on the wrong NUMA node?

Could there be a race where vq0 and vq1 are kicked concurrently on
different threads, both passing the index check simultaneously and both
attempting device reallocation?

The commit message states "there is no lock protecting dev(virtio_net),
only vq is protected by lock." Does this mean the device reallocation in
numa_realloc() can race with concurrent operations on other queues? If so,
does adding an index-based gate address the fundamental synchronization
issue, or does it merely reduce the probability of the race?

Is the assumption that vq->index 0 will always be kicked first reliable
across all vhost backend implementations? What happens in scenarios where
queues are enabled in non-sequential order?

>  	if (dev->flags & VIRTIO_DEV_RUNNING)
>  		return;

The existing VIRTIO_DEV_RUNNING check already attempts to prevent
reallocation during active operation. Does the new index check interact
correctly with this flag? Could there be a case where both vq0 and vq1
are kicked before VIRTIO_DEV_RUNNING is set, leading to multiple
reallocation attempts?

Would it be more appropriate to use atomic operations or proper locking
to protect the device structure during NUMA reallocation, rather than
relying on queue index ordering?

Previous message (by thread): |SUCCESS| vhost: fix rte_vhost_vring_call coredump caused by numa_realloc free virtio_net
Next message (by thread): |SUCCESS| vhost: fix rte_vhost_vring_call coredump caused by numa_realloc free virtio_net
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the test-report mailing list