[dpdk-dev] [RFC] vhost: new rte_vhost API proposal

Stojaczyk, DariuszX dariuszx.stojaczyk at intel.com
Fri May 11 07:55:45 CEST 2018


Hi,

> -----Original Message-----
> From: Stefan Hajnoczi [mailto:stefanha at redhat.com]
> Sent: Friday, May 11, 2018 12:37 AM
> On Thu, May 10, 2018 at 03:22:53PM +0200, Dariusz Stojaczyk wrote:
> > rte_virtio would offer both vhost and virtio driver APIs. These two
> > have a lot of common code for vhost-user handling or PCI access for
> > initiator/virtio-vhost-user (and possibly vDPA) so there's little
> > sense to keep target and initiator code separated between different
> > libs. Of course, the APIs would be separate - only some parts of
> > the code would be shared.
> 
> The API below seems to be for vhost backends (aka slaves).  rte_virtio_*
> is a misnomer because vhost and virtio are two different things.  This
> is not for implementing virtio devices, it's specifically for vhost
> devices.

I agree it's named a bit off if we're talking about vhost. My idea was to introduce a generic library for userspace Virtio processing and that's where the name came from. Even when you use just the vhost API that's introduced here, you are only required to implement vring processing, config access, and possibly interrupt handling, all of which are typical Virtio things. The vhost logic is hidden inside.

> 
> Vhost does not offer the full virtio device model - otherwise it would
> be just another transport in the VIRTIO specification.  Instead vhost is
> a protocol for vhost devices, which are subsets of virtio devices.
> 
> I suggest calling it rte_vhost2 since it's basically a new, incompatible
> rte_vhost API.

Rte_vhost2 sounds better for what we have now, but would that name be still valid once we add a true Virtio driver functionality? (I believe it's called Virtio PMD in DPDK right now). That driver would reuse a lot of the vhost code for PCI and vhost-user, so it makes some sense to put these two together. 

I don't think rte_vhost2 is a permanent name anyway, so maybe we could call it like so for now, and rename it later once I introduce that additional Virtio functionality? Would that work?

> 
> Also, the initiator/target terminology does not match the vhost-user
> specification.  It uses master/client and slave/backend/server.  Adding
> another pair of words makes things more confusing.  Please stick to the
> words used by the spec.

Ack.

> 
> > +/**
> > + * Device/queue related callbacks, all optional. Provided callback
> > + * parameters are guaranteed not to be NULL until explicitly specified.
> 
> s/until/unless/ ?

Ack.

> > + /**
> > + * Stop processing vq. It shouldn't be accessed after this callback
> > + * completes (via tgt_cb_complete). This can be called prior to
> shutdown
> 
> s/tgt_cb_complete/rte_virtio_tgt_cb_complete/

Ack.

> 
> > + * or before actions that require changing vhost device/vq state.
> > + */
> > + void (*queue_stop)(struct rte_virtio_dev *vdev, struct rte_virtio_vq
> *vq);
> > + /** Device disconnected. All queues are guaranteed to be stopped by
> now */
> > + void (*device_destroy)(struct rte_virtio_dev *vdev);
> > + /**
> > + * Custom message handler. `vdev` and `vq` can be NULL. This is called
> > + * for backend-specific actions. The `id` should be prefixed by the
> 
> Since vdev can be NULL, does this mean custom_msg() may be invoked at
> any time during the lifecycle and even before/after
> device_create()/device_destroy()?

Theoretically. I was thinking of some poorly-written backends notifying they're out of internal resources, but I agree it's just poor. I'll remove the `vdev can be NULL` part.

> > + */
> > + void (*custom_msg)(struct rte_virtio_dev *vdev, struct rte_virtio_vq
> *vq,
> > +   char *id, void *ctx);
> 
> What is the purpose of id and why is it char* instead of const char*?

Ack, It should be const. (same thing goes to every other char* in this patch)

For example vhost-crypto introduces two new vhost-user messages for initializing and destroying crypto session. The underlying vhost-crypto vhost-user backend after receiving such message could execute this callback as follows:

struct my_crypto_data *data = calloc();
[...]
Ops->custom_msg(vdev, NULL, "crypto_sess_init", data);

> 
> Is ctx the "message"?  If ctx is untrusted message data from an external
> process, why is there no size argument?  Who validates the message size?

Ack. Will add size parameter.

> 
> > +
> > + /**
> > + * Interrupt handler, synchronous. If this callback is set to NULL,
> > + * rte_virtio will hint the initiators not to send any interrupts.
> > + */
> > + void (*queue_kick)(struct rte_virtio_dev *vdev, struct rte_virtio_vq
> *vq);
> 
> Devices often have multiple types of queues.  Some of them may be
> suitable for polling, others may be suitable for an interrupt-driven
> model.  Is there a way to enable/disable interrupts for specific queues?

Thanks, I didn't think of that. I'll need to move the responsibility of setting vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT to the user.

> 
> > + /** Device config read, synchronous. */
> 
> What is the meaning of the return value?

How about the following:
\return 0 if `config` has been successfully set, -1 otherwise.

An error ( -1 ) is propagated all the way to the master so he can handle it his way.

> 
> > + int (*get_config)(struct rte_virtio_dev *vdev, uint8_t *config,
> > +  uint32_t config_len);
> > + /** Device config changed by the driver, synchronous. */
> 
> What is the meaning of the return value?
> 
> What is the meaning of the flags?

Good call. I actually can't find any doc/usage of this API.
Changpeng (the original get/set_config author, +CC'ed), could you document this function briefly here?

> 
> > + int (*set_config)(struct rte_virtio_dev *vdev, uint8_t *config,
> > +  uint32_t offset, uint32_t len, uint32_t flags);
> > +};
> > +
> > +/**
> > + * Registers a new vhost target accepting remote connections. Multiple
> > + * available transports are available. It is possible to create a Vhost-
> user
> > + * Unix domain socket polling local connections or connect to a
> physical
> > + * Virtio device and install an interrupt handler .
> > + * \param trtype type of the transport used, e.g. "PCI", "PCI-vhost-
> user",
> > + * "PCI-vDPA", "vhost-user".
> > + * \param trid identifier of the device. For PCI this would be the BDF
> address,
> > + * for vhost-user the socket name.
> > + * \param trctx additional data for the specified transport. Can be
> NULL.
> > + * \param tgt_ops callbacks to be called upon reaching specific
> initialization
> > + * states.
> > + * \param features supported Virtio features. To be negotiated with
> the
> > + * driver ones. rte_virtio will append a couple of generic feature bits
> > + * which are required by the Virtio spec. TODO list these features here
> > + * \return 0 on success, negative errno otherwise
> > + */
> > +int rte_virtio_tgt_register(char *trtype, char *trid, void *trctx,
> > +   struct rte_virtio_tgt_ops *tgt_ops,
> > +   uint64_t features);
> > +
> > +/**
> > + * Finish async device tgt ops callback. Unless a tgt op has been
> documented
> > + * as 'synchronous' this function must be called at the end of the op
> handler.
> > + * It can be called either before or after the op handler returns.
> rte_virtio
> > + * won't call any callbacks while another one hasn't been finished yet.
> > + * \param vdev vhost device
> > + * \param rc 0 on success, negative errno otherwise.
> > + */
> > +int rte_virtio_tgt_cb_complete(struct rte_virtio_dev *vdev, int rc);
> 
> How can this function fail and how is the caller supposed to handle
> failure?

If -1 is returned, the current callback will be perceived as failed. So if ` device_create` is completed with rc != 0, this lib will teardown the device and no subsequent `device_destroy` will be called. Similar thing goes with queues - if a queue failed to start, it won't need to be stopped. Since you pointed it out - I'll mention it somewhere in the doc. I didn't do so in the first place because it's analogous to how rte_vhost works now.

> 
> Are there any thread-safety rules regarding this API?  Can it be called
> from a different thread than the callback?

Yes. I should mention it.

> 
> > +
> > +/**
> > + * Unregisters a vhost target asynchronously.
> 
> How are existing device instances affected?

Ack. How about:
All active queues will be stopped and all devices destroyed.

This is analogous to what rte_vhost has now.

> 
> > + * \param cb_fn callback to be called on finish
> > + * \param cb_arg argument for \c cb_fn
> > + */
> > +void rte_virtio_tgt_unregister(char *trid,
> 
> One of the rte_vhost API limitations is that the ID namespace is shared
> between transports.  The same seems to be the case here.
> 
> It assumes that "PCI", "PCI-vhost-user", "PCI-vDPA", and "vhost-user"
> are never instantiated with the same trid.  UNIX domain sockets can have
> arbitrary filenames (that resemble a PCI BDF).  And who knows what
> other
> transports will be added in the future.
> 
> I think namespace collisions could be a problem.

Ack, I'll add `char *trtype` param to the unregister func.

> 
> > +      void (*cb_fn)(void *arg), void *cb_arg);
> 
> Which thread is the callback invoked from?

It'll be called from the same thread that calls rte_virtio_tgt_ops. I'll mention it in the doc, thanks.

rte_virtio_tgt_unregister should also return an error code for cases where a device with given trtype/trid couldn't be found. It'll prevent some implementations from waiting endlessly for cb_fn to be called.

Thanks,
D.


More information about the dev mailing list