[dpdk-dev] virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd

Gopakumar Choorakkot Edakkunni gopakumar.c.e at gmail.com
Sat Mar 18 22:32:35 CET 2017


Hi Yuan,

As a "hack"/"workaround", if I call vtpci_reset() in rte_eal_init() just
before rte_eal_memory_init(), that should take care of the problem of the
guest zeroing out the hugepages (and the vrings in them) while the host
vhost is still using them, right? As of today, vtpci_reset() is called from
rte_eal_dev_init(), which comes *after* rte_eal_memory_init().

Rgds,
Gopa.
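
The ordering proposed above can be modeled with a toy C sketch. It is not DPDK code: `struct toy_virtio_dev`, `toy_eal_init()` and the `rings_in_use` flag are hypothetical stand-ins for the device status register and for the host vhost backend still processing the vrings. The virtio detail it relies on is that writing 0 to the device status register resets the device; the assumption behind the proposal is that the backend stops touching the rings once that reset has happened.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Device status values: 0 resets the device, 0x04 is DRIVER_OK. */
#define VIRTIO_STATUS_RESET     0x00
#define VIRTIO_STATUS_DRIVER_OK 0x04

/* Hypothetical toy device: a status register plus a flag modelling whether
 * the host vhost backend is still processing the vrings. */
struct toy_virtio_dev {
    uint8_t status;
    int rings_in_use;
};

/* Assumed host-side reaction: on reset, the backend stops using the rings. */
static void toy_write_status(struct toy_virtio_dev *dev, uint8_t value)
{
    dev->status = value;
    if (value == VIRTIO_STATUS_RESET)
        dev->rings_in_use = 0;
}

static uint8_t toy_read_status(const struct toy_virtio_dev *dev)
{
    return dev->status;
}

/* Write the reset value, then read the status back until it reads 0. */
static void toy_virtio_reset(struct toy_virtio_dev *dev)
{
    toy_write_status(dev, VIRTIO_STATUS_RESET);
    while (toy_read_status(dev) != VIRTIO_STATUS_RESET)
        ;
}

/* Returns 1 if the hugepage memory was zeroed while the host could still
 * see the rings (the "stuck" scenario), 0 if the reset came first. */
static int toy_eal_init(struct toy_virtio_dev *dev, uint8_t *hugepage,
                        size_t len, int reset_first)
{
    int unsafe;

    if (reset_first)
        toy_virtio_reset(dev);      /* proposed order: reset before memory init */
    unsafe = dev->rings_in_use;     /* is the host still walking the rings? */
    memset(hugepage, 0, len);       /* models the EAL zeroing hugepage memory */
    if (!reset_first)
        toy_virtio_reset(dev);      /* current order: reset later, at device init */
    return unsafe;
}
```

Calling `toy_eal_init()` with `reset_first` set to 0 models the current order (memory zeroed, device reset later) and returns 1; with `reset_first` set to 1 it returns 0. A real early reset would also need access to the device registers before the EAL has probed and mapped the device, which this sketch ignores.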

On Thu, Mar 16, 2017 at 10:50 PM, Gopakumar Choorakkot Edakkunni <gopakumar.c.e at gmail.com> wrote:

> Thanks again Yuanhan, you are the true expert!!
>
> Rgds,
> Gopa.
>
> On Thu, Mar 16, 2017 at 10:40 PM, Yuanhan Liu <yuanhan.liu at linux.intel.com> wrote:
>
>> On Thu, Mar 16, 2017 at 10:30:09PM -0700, Gopakumar Choorakkot Edakkunni wrote:
>> > Thanks for the confirmation, glad I reached the person who knows the nuts and
>> > bolts of virtio :-). So if the host is not in our control (i.e. if I am just
>> > running as a VM on a host provided by a third-party vendor), is there any
>> > workaround I can do from the guest side to prevent problems from happening on
>> > a guest restart?
>>
>> Not much. You might want to hack the memory initialization part of the
>> guest DPDK EAL so that it does not reset the hugepage memory on start.
>> But that's so hacky that I would not recommend doing it!
>>
>> > And if there are no workarounds at all and the host has to change, then
>> > instead of asking the third-party vendor to do a wholesale upgrade to 16.04,
>> > are there one or a few commits that can be added to the host OVS-DPDK to take
>> > care of this guest restart (virtio reset before open) case?
>>
>> Yes, backporting the commits I have mentioned should fix it. But please
>> note that I did some code refactoring before those fixes, so they won't
>> apply cleanly to DPDK v2.2.
>>
>> And if you want to upgrade, I'd suggest upgrading to v16.11, which is an
>> LTS release.
>>
>>         --yliu
>> >
>> > Rgds,
>> > Gopa.
>> >
>> > On Thu, Mar 16, 2017 at 10:24 PM, Yuanhan Liu <yuanhan.liu at linux.intel.com> wrote:
>> >
>> >     On Thu, Mar 16, 2017 at 10:20:30PM -0700, Gopakumar Choorakkot Edakkunni wrote:
>> >     > >> When I was saying dpdk version, I meant the DPDK version with OVS.
>> >     >
>> >     > Oh I see! My apologies for the misunderstanding. The DPDK version used by
>> >     > the host OVS should be DPDK 2.2, and the guest process uses DPDK 16.07. The
>> >     > OVS process is not getting restarted; what is getting restarted is the guest
>> >     > process using DPDK 16.07. So does your clarification above, about virtio
>> >     > being reset before it is opened on guest restart, still hold, or does that
>> >     > need the host-side DPDK to be 16.04 or above?
>> >
>> >     Yes, the HOST dpdk should be >= v16.04.
>> >
>> >             --yliu
>> >     >
>> >     > >> And yes, the fixes are not included in the DPDK required for OVS 2.4.
>> >     >
>> >     > Thanks for the info.
>> >     >
>> >     > Rgds,
>> >     > Gopa.
>> >     >
>> >     > On Thu, Mar 16, 2017 at 10:13 PM, Yuanhan Liu <yuanhan.liu at linux.intel.com> wrote:
>> >     >
>> >     >     On Thu, Mar 16, 2017 at 09:56:01PM -0700, Gopakumar Choorakkot Edakkunni wrote:
>> >     >     > Hi Yuanhan,
>> >     >     >
>> >     >     > Thanks for the confirmation about not having to do anything special to
>> >     >     > close the ports when DPDK goes down or comes up.
>> >     >     >
>> >     >     > As for your question about whether I hit any issue with OVS getting
>> >     >     > stuck: yes. My guest process runs DPDK 16.07, as I mentioned earlier, and
>> >     >     > if I kill my guest process, the OVS-DPDK on the host reports a stall! The
>> >     >     > OVS-DPDK and QEMU versions I use are below. But maybe that is because OVS
>> >     >     > is missing the fixes you mentioned?
>> >     >
>> >     >     When I was saying dpdk version, I meant the DPDK version with OVS.
>> >     >
>> >     >     > ~# ovs-vswitchd --version
>> >     >     > ovs-vswitchd (Open vSwitch) 2.4.1
>> >     >
>> >     >     And yes, the fixes are not included in the DPDK required for OVS 2.4.
>> >     >
>> >     >             --yliu
>> >     >
>> >     >     > Compiled Nov 14 2016 06:53:31
>> >     >     > # kvm --version
>> >     >     > QEMU emulator version 2.2.0, Copyright (c) 2003-2008 Fabrice Bellard
>> >     >     > ~#
>> >     >     >
>> >     >     >
>> >     >     > Rgds,
>> >     >     > Gopa.
>> >     >     >
>> >     >     > On Thu, Mar 16, 2017 at 9:35 PM, Yuanhan Liu <yuanhan.liu at linux.intel.com> wrote:
>> >     >     >
>> >     >     >     On Thu, Mar 16, 2017 at 07:48:28PM -0700, Gopakumar Choorakkot Edakkunni wrote:
>> >     >     >     > Thanks a lot for the response, Yuanhan. I am using DPDK v16.07. So what
>> >     >     >     > you are saying is that in 16.07, we don't really need to call
>> >     >     >     > rte_eth_dev_close() on exit,
>> >     >     >
>> >     >     >     It's not about "don't really need"; it's more like "it's hard to".
>> >     >     >     Just consider that it may crash at any time.
>> >     >     >
>> >     >     >     > because DPDK will make sure it does a virtio reset before init when
>> >     >     >     > it comes up, right?
>> >     >     >
>> >     >     >     No, it just handles the abnormal case of a guest app restart well.
>> >     >     >
>> >     >     >     > Regarding the vhost commits you mentioned: do we still need those
>> >     >     >     > fixes if we have the "virtio reset before init" mechanism?
>> >     >     >
>> >     >     >     Yes, we still need them: a malicious guest could also forge data
>> >     >     >     like that.
>> >     >     >
>> >     >     >     I'm a bit confused then. Have you actually hit any issue (like getting
>> >     >     >     stuck) with DPDK v16.07?
>> >     >     >
>> >     >     >             --yliu
>> >     >     >
>> >     >     >     > Or is that a separate problem altogether (and hence we would need
>> >     >     >     > those fixes)?
>> >     >     >     >
>> >     >     >     > Rgds,
>> >     >     >     > Gopa.
>> >     >     >     >
>> >     >     >     > On Thu, Mar 16, 2017 at 7:06 PM, Yuanhan Liu <yuanhan.liu at linux.intel.com> wrote:
>> >     >     >     >
>> >     >     >     >     On Thu, Mar 16, 2017 at 12:39:16PM -0700, Gopakumar Choorakkot Edakkunni wrote:
>> >     >     >     >     > So the doc says we should call rte_eth_dev_close() *before* going
>> >     >     >     >     > down. And I know that, especially with DPDK virtio-net in the guest
>> >     >     >     >     > + OVS-DPDK in the host, OVS ends up getting stalled/stuck (!!) if I
>> >     >     >     >     > don't close the port before starting it when the guest DPDK process
>> >     >     >     >     > comes back up.
>> >     >     >     >
>> >     >     >     >     I'm assuming you were using an old version, something like DPDK
>> >     >     >     >     v2.2? IIRC, DPDK v16.04 should have fixed your issue.
>> >     >     >     >
>> >     >     >     >     > Considering that not doing this properly can screw up the host OVS,
>> >     >     >     >     > and I want to do everything possible to avoid that, I want to be 200%
>> >     >     >     >     > sure that I call close even if my process gets a kill -9. So obviously
>> >     >     >     >     > the only way of doing that is to close the port when the DPDK process
>> >     >     >     >     > comes back up and *before* we init the port. rte_eth_dev_close() is
>> >     >     >     >     > not capable of doing that, as it expects the port parameters etc. to
>> >     >     >     >     > be initialized before it can be called.
>> >     >     >     >
>> >     >     >     >     We do a virtio reset before init, which is mainly what
>> >     >     >     >     rte_eth_dev_close() does. So I see no big issue here.
>> >     >     >     >
>> >     >     >     >     The stuck issue is caused by the guest DPDK application resetting the
>> >     >     >     >     hugepages, which leaves all the virtio vring elements zeroed. The old
>> >     >     >     >     vhost doesn't handle that well and, as a result, it gets stuck. Here
>> >     >     >     >     are some relevant commits:
>> >     >     >     >
>> >     >     >     >         a436f53 vhost: avoid dead loop chain
>> >     >     >     >         c687b0b vhost: check for ring descriptors overflow
>> >     >     >     >         623bc47 vhost: do sanity check for ring descriptor length
>> >     >     >     >
>> >     >     >     >             --yliu
>> >     >     >     >
>> >     >     >     >     > Any other suggestions on what can be done to close on restart
>> >     >     >     >     > rather than on going down? I thought of bouncing this off the alias
>> >     >     >     >     > before I add a version of close myself that can do this
>> >     >     >     >     > close-on-restart.
>> >     >     >     >
>> >     >     >     >
>> >     >     >
>> >     >     >
>> >     >
>> >     >
>> >
>> >
>>
>
>
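
On the wish in the oldest message above to be "200% sure" that close is called even after a kill -9: `kill -9` sends SIGKILL, which a process cannot catch, block, or ignore, so no exit-time cleanup can run at all. The POSIX C check below shows `sigaction()` accepting a handler for SIGTERM but rejecting one for SIGKILL or SIGSTOP with `EINVAL`. `can_install_handler()` and `on_signal()` are hypothetical names used only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

static void on_signal(int sig)
{
    (void)sig; /* a real app would typically just set a flag here and close
                  its ports from the main loop, not from the handler */
}

/* Returns 1 if a handler can be installed for sig, 0 if the signal cannot
 * be caught (EINVAL), and -1 on any other error. */
static int can_install_handler(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) == 0)
        return 1;
    return errno == EINVAL ? 0 : -1;
}
```

This is why the dependable place for the cleanup is the next start, i.e. the reset-before-init behavior discussed in the thread, rather than a handler at exit.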

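To make Yuanhan's explanation of the stall a little more concrete: a vhost backend keeps its own `last_avail_idx` and compares it with the guest-written `avail->idx`; both are free-running 16-bit counters. If the guest zeroes the ring memory, `avail->idx` drops to 0 and the unsigned 16-bit difference becomes huge. The sketch below shows that arithmetic plus one kind of sanity check (rejecting a distance larger than the ring size). It is one plausible illustration under these assumptions, not the code from the commits listed above; `pending_entries()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the number of newly published entries, or -1 if the distance
 * between the two indexes cannot be valid for a ring of ring_size. */
static int pending_entries(uint16_t avail_idx, uint16_t last_avail_idx,
                           uint16_t ring_size)
{
    /* Both indexes are free-running 16-bit counters, so wrap-around is
     * handled by unsigned 16-bit subtraction. */
    uint16_t distance = (uint16_t)(avail_idx - last_avail_idx);

    if (distance > ring_size)
        return -1; /* e.g. the guest zeroed the ring and avail_idx became 0 */
    return distance;
}
```

For example, if the backend had consumed up to index 100 and the guest then zeroes its rings, the raw difference is 65436, far more entries than any ring holds; a backend without such a check would try to treat them as real work.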

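The three commit titles Yuanhan lists name checks a backend can apply while following a descriptor chain: stop on loops, stop on out-of-range indexes, and reject implausible lengths. A minimal sketch of such checks, using the split-virtqueue descriptor layout from the virtio specification (`addr`, `len`, `flags`, `next`, with `VRING_DESC_F_NEXT` equal to 1). `chain_total_len()` and the `max_len` limit are hypothetical, and the actual DPDK commits differ in detail.

```c
#include <assert.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT 1 /* the chain continues via the next field */

struct vring_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* Sums the lengths of the chain starting at head. Returns -1 for a malformed
 * chain instead of looping forever or reading past the descriptor table. */
static int64_t chain_total_len(const struct vring_desc *desc, uint16_t ring_size,
                               uint16_t head, uint32_t max_len)
{
    int64_t total = 0;
    uint16_t idx = head;
    uint32_t hops = 0;

    for (;;) {
        if (idx >= ring_size)          /* index outside the descriptor table */
            return -1;
        if (++hops > ring_size)        /* more hops than descriptors: a loop */
            return -1;
        if (desc[idx].len > max_len)   /* implausible descriptor length */
            return -1;
        total += desc[idx].len;
        if (!(desc[idx].flags & VRING_DESC_F_NEXT))
            return total;
        idx = desc[idx].next;
    }
}
```

Bounding the walk by the ring size is a cheap way to turn a forged or corrupted chain into an error instead of a dead loop, whichever guest produced it.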