[dpdk-dev] virtio kills qemu VM after stopping/starting ports

Tan, Jianfeng jianfeng.tan at intel.com
Mon Sep 5 05:20:26 CEST 2016


Hi,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Tan, Jianfeng
> Sent: Friday, September 2, 2016 8:54 PM
> To: Kyle Larose; dev at dpdk.org
> Cc: Xie, Huawei; yuanhan.liu at linux.intel.com
> Subject: Re: [dpdk-dev] virtio kills qemu VM after stopping/starting ports
> 
> Hi Kyle,
> 
> 
> On 9/2/2016 8:35 PM, Kyle Larose wrote:
> >
> >> -----Original Message-----
> >> From: Tan, Jianfeng [mailto:jianfeng.tan at intel.com]
> >> Sent: Friday, September 02, 2016 2:56 AM
> >> To: Kyle Larose; dev at dpdk.org
> >> Cc: huawei.xie at intel.com; yuanhan.liu at linux.intel.com
> >> Subject: Re: virtio kills qemu VM after stopping/starting ports
> >>
> >> Hi Kyle,
> >>
> >>
> >> On 9/2/2016 4:53 AM, Kyle Larose wrote:
> >>> Hello everyone,
> >>>
> >>> In my own testing, I recently stumbled across an issue where I could get
> qemu
> >> to exit when sending traffic to my application. To do this, I simply needed
> to do
> >> the following:
> >>> 1) Start my virtio interfaces
> >>> 2) Send some traffic into/out of the interfaces
> >>> 3) Stop the interfaces
> >>> 4) Start the interfaces
> >>> 5) Send some more traffic
> >>>
> >>> At this point, I would lose connectivity to my VM.  Further investigation
> >> revealed qemu exiting with the following log:
> >>> 	2016-09-01T15:45:32.119059Z qemu-kvm: Guest moved used index
> >> from 5
> >>> to 1
> >>>
> >>> I found the following bug report against qemu, reported by a user of
> >>> DPDK: https://bugs.launchpad.net/qemu/+bug/1558175
> >>>
> >>> That thread seems to have stalled out, so I think we probably should
> deal with
> >> the problem within DPDK itself. Either way, later in the bug report chain,
> we
> >> see a link to this patch to DPDK:
> >> http://dpdk.org/browse/dpdk/commit/?id=9a0615af774648. The
> submitter of
> >> the bug report claims that this patch fixes the problem. Perhaps it does.
> >>
> >> I once got a chance to analyze the bug you referred here. The patch
> >> (http://dpdk.org/browse/dpdk/commit/?id=9a0615af774648) does not fix
> that
> >> bug. The root cause of that bug is: when DPDK appilcation gets killed,
> nobody
> >> tells the vhost backend to stop. So when restarting the DPDK app, those
> >> hugepages are reused, and initialized to all zero. And unavoidably, "idx" in
> >> those memory is changed to 0. I have written a patch to notify the vhost
> >> backend to stop when DPDK app gets killed suddenly (more accurate,
> when
> >> /dev/uioX gets closed), and this patch will be sent out recently. And this
> patch
> >> does not fix your problem, either. You did not kill the app. (I should not
> update
> >> info about that bug here, and I mean they are different problems)
> >>
> >>> However, it introduces a new problem: If I remove the patch, I cannot
> >> reproduce the problem. So, that leads me to believe that it has caused a
> >> regression.
> >>> To summarize the patch’s changes, it basically changes the
> virtio_dev_stop
> >> function to flag the device as stopped, and stops the device when
> >> closing/uninitializing it. However, there is a seemingly unintended side-
> effect.
> >>> In virtio_dev_start, we have the following block of code:
> >>>
> >>> 	/* On restart after stop do not touch queues */
> >>> 	if (hw->started)
> >>> 		return 0;
> >>>
> >>> 	/* Do final configuration before rx/tx engine starts */
> >>> 	virtio_dev_rxtx_start(dev);
> >>>
> >>> ….
> >>>
> >>> Prior to the patch, if an interface were stopped then started, without
> >> restarting the application, the queues would be left as-is, because hw-
> >started
> >> would be set to 1.
> >>
> >> Yes, my previous patch did break this behavior (by stopping and re-
> starting the
> >> device, the queues would be left as-is) and leads to the problem here.
> And you
> >> are proposing to recover.
> >>
> >> But is this a right behavior to follow?
> >> On the one side, when we stop the virtio device, should we notify the
> vhost
> >> backend to stop too? Currently, we just flag it as stopped.
> >> On the other side, now in many PMDs, we mix the dev
> >> initialization/configuration code into dev_start functions, that is to day,
> we re-
> >> configure the device every time we start it (may be to make sure that
> changed
> >> configuration will be configured). Then what if we increase the queue
> numbers
> >> of virtio?
> > Hi Jianfeng,
> >
> >
> > Let me see if my understanding is correct. You're saying that there is a
> problem introduced by your patch, but going back to the original behaviour is
> not ideal: it still leaves a gap with respect to changing the number of queues
> with virtio. The ideal solution would have us properly tear down the device
> when it is stopped, and then properly reinitialize it when it is started? That
> seems reasonable. However, I'm not sure what it would take.
> 
> Yes, exactly.
> 
> > Does it make sense to submit a patch like I suggested in the short term, and
> have somebody work on a more robust long term solution? How could we go
> about making sure that happens? (I can't allocate much time to fixing this, so
> I likely won't be able to do anything more complicated than what I've
> proposed).
> 
> Let's see if maintainers have any suggestions here. (And I personally do
> not oppose your proposal for short term).

To add more info:

The previous patch 9a0615af774 was to fix the problem which is described here:
http://dpdk.org/ml/archives/dev/2015-December/030911.html

And it's about reconfiguring queue number after stopping the device and then restart the device.

Thanks,
Jianfeng

> 
> Thanks,
> Jianfeng
> 
> 



More information about the dev mailing list