|SUCCESS| vhost: fix rte_vhost_vring_call coredump caused by numa_realloc free virtio_net

dpdklab at iol.unh.edu dpdklab at iol.unh.edu
Tue Mar 31 18:54:48 CEST 2026

Previous message (by thread): |SUCCESS| vhost: fix rte_vhost_vring_call coredump caused by numa_realloc free virtio_net
Next message (by thread): |SUCCESS| vhost: fix rte_vhost_vring_call coredump caused by numa_realloc free virtio_net
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/162788

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-03-31

sha: (not provided in patch file, appears to be from patchwork ID 162788)
Author: YuanXin <yuanxin123456 at 126.com>
Subject: [PATCH] vhost: fix rte_vhost_vring_call coredump caused by numa_realloc free virtio_net

This patch attempts to fix a race condition where numa_realloc() frees
and reallocates a virtio_net structure while another thread is accessing
it through a poller.

> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 4bfb13fb98..1312f95cbc 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -681,6 +681,9 @@ numa_realloc(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>  
>  out_dev_realloc:
>  
> +	if (vq->index > 0)
> +		return;
> +

Does this check introduce a NUMA allocation bug? According to the commit
message, the issue is that numa_realloc() can be called multiple times
(once per vq kick), causing the device to be reallocated even when
already on the correct NUMA node. However, this fix only allows
numa_realloc() to proceed for vq->index == 0.

What happens if:
1. Queue 0 is kicked first on NUMA node A
2. Queue 1 is kicked later on NUMA node B
3. The device gets allocated on node A (from queue 0)
4. Queue 1's kick is ignored (vq->index > 0)
5. The device remains on node A instead of node B

Does this break the intended NUMA affinity behavior where the device
should be reallocated to match the NUMA node of the most recently kicked
queue?

Additionally, does the existing check "if (dev->flags &
VIRTIO_DEV_RUNNING)" already provide partial protection? The commit
message states "Why dev->flags & VIRTIO_DEV_RUNNING doesn't prevent that
happening? Because there is no lock protecting dev(virtio_net), only vq
is protected by lock." However, this new check also runs without any
lock protection, so it suffers from the same race condition.

Does this code need proper locking instead of trying to avoid the
reallocation? The fundamental issue appears to be that dev can be freed
while another thread accesses it. Adding a check based on vq->index
doesn't prevent the race - it just makes it less likely by reducing the
window where reallocation can occur.

Should the VIRTIO_DEV_RUNNING flag be set earlier, or should there be a
reference counting mechanism to prevent freeing dev while it's in use?

>  	if (dev->flags & VIRTIO_DEV_RUNNING)
>  		return;
>

Previous message (by thread): |SUCCESS| vhost: fix rte_vhost_vring_call coredump caused by numa_realloc free virtio_net
Next message (by thread): |SUCCESS| vhost: fix rte_vhost_vring_call coredump caused by numa_realloc free virtio_net
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the test-report mailing list