[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

Rich Lane rich.lane at bigswitch.com
Thu Dec 24 06:37:29 CET 2015


On Wed, Dec 23, 2015 at 7:09 PM, Tetsuya Mukawa <mukawa at igel.co.jp> wrote:

> On 2015/12/22 13:47, Rich Lane wrote:
> > On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <
> yuanhan.liu at linux.intel.com>
> > wrote:
> >
> >> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
> >>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
> >> in a few
> >>> ways:
> >> Rich, thanks for the info!
> >>
> >>> 1. new_device/destroy_device: Link state change (will be covered by the
> >> link
> >>> status interrupt).
> >>> 2. new_device: Add first queue to datapath.
> >> I'm wondering why vring_state_changed() is not used, as it will also be
> >> triggered at the beginning, when the default queue (the first queue) is
> >> enabled.
> >>
> > Turns out I'd misread the code and it's already using the
> > vring_state_changed callback for the
> > first queue. Not sure if this is intentional but vring_state_changed is
> > called for the first queue
> > before new_device.
> >
> >
> >>> 3. vring_state_changed: Add/remove queue to datapath.
> >>> 4. destroy_device: Remove all queues (vring_state_changed is not called
> >> when
> >>> qemu is killed).
> >> I had a plan to invoke vring_state_changed() to disable all vrings
> >> when destroy_device() is called.
> >>
> > That would be good.
> >
> >
> >>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
> >> You can get the 'struct virtio_net' dev from all above callbacks.
> >
> >
> >> 1. Link status interrupt.
> >>
> >> To vhost pmd, new_device()/destroy_device() equals to the link status
> >> interrupt, where new_device() is a link up, and destroy_device() is link
> >> down().
> >>
> >>
> >>> 2. New queue_state_changed callback. Unlike vring_state_changed this
> >> should
> >>> cover the first queue at new_device and removal of all queues at
> >>> destroy_device.
> >> As stated above, vring_state_changed() should be able to do that, except
> >> the one on destroy_device(), which is not done yet.
> >>
> >>> 3. Per-queue or per-device NUMA node info.
> >> You can query the NUMA node info implicitly by get_mempolicy(); check
> >> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
> >>
> > Your suggestions are exactly how my application is already working. I was
> > commenting on the
> > proposed changes to the vhost PMD API. I would prefer to
> > use RTE_ETH_EVENT_INTR_LSC
> > and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
> > of these vhost-specific
> > hacks. The queue state change callback is the one new API that needs to
> be
> > added because
> > normal NICs don't have this behavior.
> >
> > You could add another rte_eth_event_type for the queue state change
> > callback, and pass the
> > queue ID, RX/TX direction, and enable bit through cb_arg.
>
> Hi Rich,
>
> So far, EAL provides rte_eth_dev_callback_register() for event handling.
> DPDK app can register callback handler and "callback argument".
> And EAL will call callback handler with the argument.
> Anyway, vhost library and PMD cannot change the argument.
>

You're right, I'd mistakenly thought that the PMD controlled the void *
passed to the callback.

Here's a thought:

    struct rte_eth_vhost_queue_event {
        uint16_t queue_id;
        bool rx;
        bool enable;
    };

    int rte_eth_vhost_get_queue_event(uint8_t port_id, struct
rte_eth_vhost_queue_event *event);

On receiving the ethdev event the application could repeatedly call
rte_eth_vhost_get_queue_event
to find out what happened.

An issue with having the application dig into struct virtio_net is that it
can only be safely accessed from
a callback on the vhost thread. A typical application running its control
plane on lcore 0 would need to
copy all the relevant info from struct virtio_net before sending it over.

As you mentioned, queues for a single vhost port could be located on
different NUMA nodes. I think this
is an uncommon scenario but if needed you could add an API to retrieve the
NUMA node for a given port
and queue.


More information about the dev mailing list