[dpdk-dev] [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework

Pankaj Chauhan pankaj.chauhan at nxp.com
Thu Sep 15 11:00:55 CEST 2016


Hi Jianfeng,

On 9/13/2016 12:21 PM, Tan, Jianfeng wrote:

>>>
>>> On 9/5/2016 6:54 PM, Pankaj Chauhan wrote:
>>>> Introduce support for a generic framework for handling of switching
>>>> between
>>>> physical and vhost devices. The vswitch framework introduces the
>>>> following
>>>> concept:
>>>>
>>>> 1. vswitch_dev: Vswitch device is a logical switch which can have
>>>> physical and
>>>> virtio devices. The devices are operated/used using standard rte_eth
>>>> API for
>>>> physical devices and lib_vhost API for vhost devices, PMD API is not
>>>> used
>>>> for vhost devices.
>>>>
>>>> 2. vswitch_port: Any physical or virtio device that is added to
>>>> vswitch. The
>>>> port can have its own tx/rx functions for doing data transfer, which
>>>> are exposed
>>>> to the framework using generic function pointers
>>>> (vs_port->do_tx/do_rx). This way
>>>> the generic code can do tx/rx without understanding the type of device
>>>> (Physical or
>>>> virtio). Similarly each port has its own functions to select tx/rx
>>>> queues. The framework
>>>> provides default tx/rx queue selection functions to the port when port
>>>> is added
>>>> (for both physical and vhost devices). But the framework allows the
>>>> switch implementation
>>>> to override the queue selection functions (vs_port->get_txq/rxq) if
>>>> required.
>>>>
>>>> 3. vswitch_ops: The ops is set of function pointers which are used to
>>>> do operations
>>>> like learning, unlearning, add/delete port, lookup_and_forward. The
>>>> user of vswitch
>>>> framework (vhost/main.[c,h]) uses these function pointers to perform
>>>> above mentioned
>>>> operations, thus it remains agnostic of the underlying implementation.
>>>>
>>>> Different switching logics can implement their vswitch_device and
>>>> vswitch_ops, and
>>>> register with the framework. This framework makes vhost-switch
>>>> application scalable
>>>> in terms of:
>>>>
>>>> 1. Different switching logics (one of them is vmdq, vhost/vmdq.[c,h])
>>>> 2. Number of ports.
>>>> 3. Policies of selecting ports for rx and tx.
>>>>
>>>> Signed-off-by: Pankaj Chauhan <pankaj.chauhan at nxp.com>
>>>
>>> After this patch set, what is the mapping between cores and
>>> vswitch_dev? The old vhost example does not have the concept of a
>>> switch, so each core is bound to some VDEVs. Do we still keep the
>>> original logic?
>>>
>>> (a) If yes, provided that a phys device may have no direct relationship
>>> with vdevs in other switching logics, we may need to bind those physical
>>> devices to cores too? In that case, switch_worker() will receive pkts
>>> from all devices (phys or virtual) and dispatch them, which looks like:
>>>
>>> switch_worker() {
>>>     FOR each port binding to this core {
>>>          rx(port, pkts[], count);
>>>          vs_lookup_n_fwd( information_needed );
>>>     }
>>> }
>>
>> Since we support only one switch device running at a time, we bind
>> the ports of the switchdev to the cores. But the switch might have its
>> own logic to bind ports to cores. For example,
>> VMDQ only supports one physical port, while other switches can support
>> more than one physical port. In order to take care of that I have added
>> two ops in switchdev_ops:
>>
>> 1. vs_sched_rx_port:
>>
>> struct vswitch_port *vs_sched_rx_port(struct vswitch_dev *vs_dev, enum
>>                                       vswitch_port_type ptype,
>> uint16_t core_id)
>>
>> 2. vs_sched_tx_port:
>>
>> struct vswitch_port *vs_sched_tx_port(struct vswitch_dev *vs_dev, enum
>>                                       vswitch_port_type ptype, uint16_t
>> core_id)
>>
>> The idea of providing these functions is that vhost/main requests the
>> underlying switch implementation to schedule a port for rx or tx; the
>> currently running core_id is also passed as a parameter. So the switch
>> can decide which port to do rx or tx on based on the core id and
>> possibly some other custom policy.
>>
>> For example, VMDQ always returns its one single physical port in
>> response to these functions, called from any core. Other switches
>> can return the ports bound to the cores.
>>
>> Similarly, there are two port operations (vs_port->get_rxq and
>> vs_port->get_txq); here too we pass core_id as a parameter so that
>> the underlying switch implementation can schedule the rx or tx queue
>> based on the core_id.
>>
>> The above-mentioned ops are currently used in drain_eth_rx() and
>> do_drain_mbuf_table() (main.c), and they leave the binding of
>> physical ports and queues to the underlying implementation. This
>> way we can accommodate VMDQ, which uses only one physical port and
>> VMDQ-based rx queues, or any other switch which uses multiple
>> physical ports and some other rx/tx queue scheduling logic.
>>
>> Please suggest whether this way of scheduling ports and tx/rx queues
>> is fine or not.
>
> According to the above explanation, with the VMDQ switch we cannot
> schedule two queues (belonging to the same port) on the same core, right?
>

Yes. I believe this would be a limitation which will not be acceptable.
The method that you suggested further down in the email (treat port +
queue_id as a vs_port) can solve this.

>
>>>
>>> (b) If no, we may bind each core with n switches? If so, switch_worker()
>>> also needs to be reworked to keep receiving pkts from all ports of the
>>> switch and dispatch them.
>>>
>>> switch_worker() {
>>>     FOR each switch binding to this core {
>>>          FOR each port of switch {
>>>              rx(port, pkts[], count);
>>>              vs_lookup_n_fwd( information_needed );
>>>         }
>>>     }
>>> }
>> Since we currently support only one switch_dev in a running instance of
>> the vhost-switch application, we can bind ports to cores as you suggested
>> in #(a).
>
> OK.
>
>>
>>>
>>> In all, (1) we'd better not use vdev to find phys dev in switch_worker
>>> any more;
>>
>> I agree with you. I have tried to do the following in drain_eth_rx():
>>
>>         core_id = rte_lcore_id();
>>         rx_port = vs_sched_rx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
>>                                    core_id);
>>         if (unlikely(!rx_port))
>>                 goto out;
>>
>>         rxq = rx_port->get_rxq(rx_port, vdev, core_id);
>>         rx_count = rx_port->do_rx(rx_port, rxq, NULL, pkts,
>>                                   MAX_PKT_BURST);
>>
>> Here we don't assume any relation between the vdev and the physical
>> device or rxq, but pass the vdev to the underlying switch so that it
>> can choose the rxq for this vdev. In case of VMDQ it will return the
>> VMDQ queue number for this vdev; any other switch implementation can
>> have its own logic to decide the rxq.
>
> Thanks for explanation.
>
>>
>>> (2) we'd better not differentiate phys device and virtual
>>> device in generic framework (it's just an attribute of vswitch_port).
>>>
>>> What do you think?
>>
>> I agree with your thought that, given the current API in this patchset,
>> we should aim for making switch_worker agnostic of the port type. Ideally
>> it should look something like this:
>>
>> switch_worker() {
>>
>>         rx_port_mask = VSWITCH_PTYPE_PHYS | VSWITCH_PTYPE_VIRTIO;
>>
>>         rx_port = vs_sched_rx_port(vswitch_dev_g, rx_port_mask, core_id);
>>         rxq = rx_port->get_rxq(rx_port, vdev, core_id);
>>         rx_count = rx_port->do_rx(rx_port, rxq, NULL, pkts, MAX_PKT_BURST);
>
> Can we hide queues inside struct vswitch_port? I mean:
> For the VMDQ switch, treat (port_id, queue_id) as a vswitch_port; so far
> you've already stored "struct vhost_dev *" into vswitch_port.priv when
> it's a virtual port, so how about storing queue_id in vswitch_port.priv
> when it's a physical port?
> For the arp_learning switch, make (port_id, all_enabled_queues) a
> vswitch_port.
> To summarize the two cases: we treat (port_id, all_enabled_queues[]) as a
> vswitch_port.
>
> How about it?

Thanks for the wonderful suggestion! Yes, it should be possible: instead
of having one vs_port per physical device we could create one vs_port for
each phys device + queue_id combination.

Then we can bind the vs_ports to the cores, and do away with binding 
vdevs to the cores.
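
To make that concrete, here is a minimal sketch of the physical side. The
vswitch_port fields (priv, port_id) and the do_rx prototype are assumed
from this discussion; everything else is hypothetical:

/* Sketch only: one vs_port per (phys port, queue) pair, with the queue id
 * stored in vs_port->priv as suggested above. Struct layout and the do_rx
 * prototype are assumptions based on this thread, not the final code.
 */
struct vs_phys_queue_priv {
        uint16_t queue_id;      /* VMDQ (or plain) rx queue behind this vs_port */
};

static uint16_t
phys_queue_do_rx(struct vswitch_port *vs_port, uint16_t rxq __rte_unused,
                 void *arg __rte_unused, struct rte_mbuf **pkts, uint16_t count)
{
        struct vs_phys_queue_priv *pq = vs_port->priv;

        /* The queue is implicit in the vs_port, so switch_worker() need not
         * know whether it is a VMDQ queue or a plain rx queue.
         */
        return rte_eth_rx_burst(vs_port->port_id, pq->queue_id, pkts, count);
}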

In order to add extra vs_ports (port + queue_id) there are two methods:

1. vhost/main.c (or vswitch_common.c) queries the underlying switch 
about how many rx queues to support through a new accessor; VMDQ will 
return the number of VMDQ queues in this case. After getting the number 
of rx queues, vhost/main.c creates the extra ports, i.e. one vs_port for 
each physical port and queue id combination.

2. The second method is that the switch implementation, for example 
vmdq.c, creates the extra ports (port + queue_id) as part of the 
vs_port->add_port operation for the physical port.

Which method do you think we should follow? I think #1 should be done so 
that the same logic can be used for other switches as well; a rough 
sketch of how #1 could look is below.
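
Something like the following, where vs_get_phys_rxq_count() is a
hypothetical name for the new accessor and vs_add_port() stands in for
whatever the add-port path ends up being:

/* Rough sketch of method #1 (all names here are illustrative, not final
 * API): ask the switch implementation how many rx queues it wants (VMDQ
 * would return the number of VMDQ queues) and register one vs_port per
 * (phys port, queue_id) pair.
 */
uint16_t q, nb_rxq;

nb_rxq = vs_get_phys_rxq_count(vswitch_dev_g, phys_port_id);
for (q = 0; q < nb_rxq; q++) {
        struct vswitch_port *vs_port;

        vs_port = vs_add_port(vswitch_dev_g, phys_port_id,
                              VSWITCH_PTYPE_PHYS, &q);
        if (!vs_port)
                rte_exit(EXIT_FAILURE, "cannot add port %u queue %u\n",
                         phys_port_id, q);
        /* bind the new (port, queue) vs_port to an lcore here, in place
         * of the old vdev-to-core binding
         */
}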
>
>>
>>         vs_lookup_n_fwd(rx_port, pkts, rx_count, rxq);
>>
>> }
>>
>> Here I have two problems in achieving switch_worker as mentioned above:
>>
>> 1. While doing rx from a physical port, we need the associated vdev to
>> know the VMDQ rxq. That was the reason I kept the current logic of
>> keeping vdevs bound to the core in switch_worker. If we don't keep a
>> list of vdevs assigned to the core, how will we get the rxq on the
>> physical device in case of the VMDQ switch implementation?
>
> I think my proposal above can solve this problem.

Yes thanks!
>
>>
>> 2. drain_mbuf_table() currently assumes that there is only one
>> physical device on which we have to transmit. We may have to rethink
>> where to keep the tx queues (&lcore_tx_queue[core_id]); in case we have
>> multiple physical ports we might have to move the tx queues to each port.
>
> This function, drain_mbuf_table(), and its related data structures are
> used as a cache in the tx direction. But it's not only useful for
> physical ports, but also for virtual ports (vhost_dev). I suggest making
> such a cache a field of each vswitch_port. And again, since all
> vswitch_ports are connected to a core, each core will periodically drain
> those vswitch_ports.
>

Okay. So we add similar tx_q structures (one per core) to each 
vswitch_port, correct? Something along the lines of the sketch below?
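
(A minimal sketch, assuming struct mbuf_table from main.c is reused and
the do_tx/get_txq prototypes are as discussed above; all names are
illustrative.)

/* Sketch only: keep a per-core tx cache inside each vswitch_port, e.g. by
 * adding
 *         struct mbuf_table tx_q[RTE_MAX_LCORE];
 * to struct vswitch_port, so the same drain logic works for both phys and
 * vhost ports.
 */
static void
vs_port_drain_tx(struct vswitch_port *vs_port, uint16_t core_id)
{
        struct mbuf_table *txq = &vs_port->tx_q[core_id];
        uint16_t txq_id, count;

        if (txq->len == 0)
                return;

        txq_id = vs_port->get_txq(vs_port, NULL, core_id);
        count = vs_port->do_tx(vs_port, txq_id, NULL, txq->m_table, txq->len);

        /* free whatever the port could not transmit, as main.c does today */
        while (unlikely(count < txq->len))
                rte_pktmbuf_free(txq->m_table[count++]);
        txq->len = 0;
}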

Thanks,
Pankaj



