[dpdk-dev] [PATCH v4 09/10] net/ifc: support SW assisted VDPA live migration

Maxime Coquelin maxime.coquelin at redhat.com
Mon Dec 17 12:08:44 CET 2018



On 12/17/18 10:12 AM, Wang, Xiao W wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
>> Sent: Sunday, December 16, 2018 1:35 AM
>> To: Wang, Xiao W <xiao.w.wang at intel.com>; Bie, Tiwei <tiwei.bie at intel.com>
>> Cc: alejandro.lucero at netronome.com; dev at dpdk.org; Wang, Zhihong
>> <zhihong.wang at intel.com>; Ye, Xiaolong <xiaolong.ye at intel.com>
>> Subject: Re: [PATCH v4 09/10] net/ifc: support SW assisted VDPA live migration
>>
>>
>>
>> On 12/14/18 10:16 PM, Xiao Wang wrote:
>>> In SW assisted live migration mode, driver will stop the device and
>>> setup a mediate virtio ring to relay the communication between the
>>> virtio driver and the VDPA device.
>>>
>>> This data path intervention will allow SW to help on guest dirty page
>>> logging for live migration.
>>>
>>> This SW fallback is event driven relay thread, so when the network
>>> throughput is low, this SW fallback will take little CPU resource, but
>>> when the throughput goes up, the relay thread's CPU usage will goes up
>>> accordinly.
>>
>> s/accordinly/accordingly/
>>
> 
> Will fix it in next version.
> 
>>>
>>> User needs to take all the factors including CPU usage, guest perf
>>> degradation, etc. into consideration when selecting the live migration
>>> support mode.
>>>
>>> Signed-off-by: Xiao Wang <xiao.w.wang at intel.com>
>>> ---
>>>    drivers/net/ifc/base/ifcvf.h |   1 +
>>>    drivers/net/ifc/ifcvf_vdpa.c | 346 ++++++++++++++++++++++++++++++++++++++++++-
>>>    2 files changed, 344 insertions(+), 3 deletions(-)
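The commit message above captures the core idea: once the device is switched over to the mediate ring, software sits in the datapath and can mark the guest memory it writes as dirty for the migration process. As a rough standalone illustration of that logging step (this is not the patch's code; the helper name relay_one_used_elem is made up, and it leans on the vhost library's existing rte_vhost_log_used_vring() API, with payload buffers logged separately via rte_vhost_log_write()):

    #include <stddef.h>
    #include <stdint.h>
    #include <linux/virtio_ring.h>
    #include <rte_atomic.h>
    #include <rte_vhost.h>

    /*
     * Sketch: publish one completed element from the mediate ring into the
     * guest-visible used ring and log the touched bytes as dirty so QEMU
     * migrates those pages.  Assumes a power-of-two ring size, as virtio
     * split rings require.
     */
    static void
    relay_one_used_elem(int vid, uint16_t qid, struct rte_vhost_vring *vring,
                        struct vring_used_elem *elem)
    {
            uint16_t pos = vring->used->idx & (vring->size - 1);

            /* copy the completed element into the guest used ring */
            vring->used->ring[pos] = *elem;
            rte_smp_wmb();
            vring->used->idx++;

            /* mark the written used-ring entry and index field dirty */
            rte_vhost_log_used_vring(vid, qid,
                            offsetof(struct vring_used, ring) +
                                    pos * sizeof(struct vring_used_elem),
                            sizeof(struct vring_used_elem));
            rte_vhost_log_used_vring(vid, qid,
                            offsetof(struct vring_used, idx), sizeof(uint16_t));
    }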
>>>
>>> diff --git a/drivers/net/ifc/base/ifcvf.h b/drivers/net/ifc/base/ifcvf.h
>>> index c15c69107..e8a30d2c6 100644
>>> --- a/drivers/net/ifc/base/ifcvf.h
>>> +++ b/drivers/net/ifc/base/ifcvf.h
>>> @@ -50,6 +50,7 @@
>>>    #define IFCVF_LM_ENABLE_VF		0x1
>>>    #define IFCVF_LM_ENABLE_PF		0x3
>>>    #define IFCVF_LOG_BASE			0x100000000000
>>> +#define IFCVF_MEDIATE_VRING		0x200000000000
>>
>> MEDIATED?
> 
> "mediate" is used as adjective here.
> 
>>
>>>
>>>    #define IFCVF_32_BIT_MASK		0xffffffff
>>>
>>> diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
>>> index f181c5a6e..61757d0b4 100644
>>> --- a/drivers/net/ifc/ifcvf_vdpa.c
>>> +++ b/drivers/net/ifc/ifcvf_vdpa.c
>>> @@ -63,6 +63,9 @@ struct ifcvf_internal {
>>>    	rte_atomic32_t running;
>>>    	rte_spinlock_t lock;
>>>    	bool sw_lm;
> 
> [...]
> 
>>> +static void *
>>> +vring_relay(void *arg)
>>> +{
>>> +	int i, vid, epfd, fd, nfds;
>>> +	struct ifcvf_internal *internal = (struct ifcvf_internal *)arg;
>>> +	struct rte_vhost_vring vring;
>>> +	struct rte_intr_handle *intr_handle;
>>> +	uint16_t qid, q_num;
>>> +	struct epoll_event events[IFCVF_MAX_QUEUES * 4];
>>> +	struct epoll_event ev;
>>> +	int nbytes;
>>> +	uint64_t buf;
>>> +
>>> +	vid = internal->vid;
>>> +	q_num = rte_vhost_get_vring_num(vid);
>>> +	/* prepare the mediate vring */
>>> +	for (qid = 0; qid < q_num; qid++) {
>>> +		rte_vhost_get_vring_base(vid, qid,
>>> +				&internal->m_vring[qid].avail->idx,
>>> +				&internal->m_vring[qid].used->idx);
>>> +		rte_vdpa_relay_vring_avail(vid, qid, &internal->m_vring[qid]);
>>> +	}
>>> +
>>> +	/* add notify fd and interrupt fd to epoll */
>>> +	epfd = epoll_create(IFCVF_MAX_QUEUES * 2);
>>> +	if (epfd < 0) {
>>> +		DRV_LOG(ERR, "failed to create epoll instance.");
>>> +		return NULL;
>>> +	}
>>> +	internal->epfd = epfd;
>>> +
>>> +	for (qid = 0; qid < q_num; qid++) {
>>> +		ev.events = EPOLLIN | EPOLLPRI;
>>> +		rte_vhost_get_vhost_vring(vid, qid, &vring);
>>> +		ev.data.u64 = qid << 1 | (uint64_t)vring.kickfd << 32;
>>> +		if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) {
>>> +			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
>>> +			return NULL;
>>> +		}
>>> +	}
>>> +
>>> +	intr_handle = &internal->pdev->intr_handle;
>>> +	for (qid = 0; qid < q_num; qid++) {
>>> +		ev.events = EPOLLIN | EPOLLPRI;
>>> +		ev.data.u64 = 1 | qid << 1 |
>>> +			(uint64_t)intr_handle->efds[qid] << 32;
>>> +		if (epoll_ctl(epfd, EPOLL_CTL_ADD, intr_handle->efds[qid], &ev)
>>> +				< 0) {
>>> +			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
>>> +			return NULL;
>>> +		}
>>> +	}
>>> +
>>> +	/* start relay with a first kick */
>>> +	for (qid = 0; qid < q_num; qid++)
>>> +		ifcvf_notify_queue(&internal->hw, qid);
>>> +
>>> +	/* listen to the events and react accordingly */
>>> +	for (;;) {
>>> +		nfds = epoll_wait(epfd, events, q_num * 2, -1);
>>> +		if (nfds < 0) {
>>> +			if (errno == EINTR)
>>> +				continue;
>>> +			DRV_LOG(ERR, "epoll_wait return fail\n");
>>> +			return NULL;
>>> +		}
>>> +
>>> +		for (i = 0; i < nfds; i++) {
>>> +			fd = (uint32_t)(events[i].data.u64 >> 32);
>>> +			do {
>>> +				nbytes = read(fd, &buf, 8);
>>> +				if (nbytes < 0) {
>>> +					if (errno == EINTR ||
>>> +					    errno == EWOULDBLOCK ||
>>> +					    errno == EAGAIN)
>>> +						continue;
>>> +					DRV_LOG(INFO, "Error reading "
>>> +						"kickfd: %s",
>>> +						strerror(errno));
>>> +				}
>>> +				break;
>>> +			} while (1);
>>> +
>>> +			qid = events[i].data.u32 >> 1;
>>> +
>>> +			if (events[i].data.u32 & 1)
>>> +				update_used_ring(internal, qid);
>>> +			else
>>> +				update_avail_ring(internal, qid);
>>> +		}
>>> +	}
>>> +
>>> +	return NULL;
>>> +}
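A note on the event encoding used in the loop above: ev.data.u64 packs the eventfd in the upper 32 bits and (qid << 1) | direction in the lower bits, where bit 0 set means the event came from the device interrupt eventfd (used-ring direction) and bit 0 clear means it came from the guest's kickfd (avail-ring direction). A small standalone illustration of the same packing (helper names are made up, not part of the patch):

    #include <stdint.h>

    /* pack: fd in bits 63..32, queue id in bits 31..1, direction in bit 0 */
    static inline uint64_t
    relay_ev_pack(uint16_t qid, int is_device_intr, int fd)
    {
            return ((uint64_t)(uint32_t)fd << 32) |
                   (uint32_t)(qid << 1) | (is_device_intr ? 1u : 0u);
    }

    /* unpack: mirrors the fd/qid/direction extraction done in vring_relay() */
    static inline void
    relay_ev_unpack(uint64_t data, uint16_t *qid, int *is_device_intr, int *fd)
    {
            *fd = (int)(uint32_t)(data >> 32);
            *qid = (uint16_t)((uint32_t)data >> 1);
            *is_device_intr = (int)(data & 1);
    }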
>>> +
>>> +static int
>>> +setup_vring_relay(struct ifcvf_internal *internal)
>>> +{
>>> +	int ret;
>>> +
>>> +	ret = pthread_create(&internal->tid, NULL, vring_relay,
>>> +			(void *)internal);
>>
>> So it will be scheduled without any affinity?
>> Shouldn't it use a pmd thread instead?
> 
> The new thread will inherit the thread affinity from its parent thread. As you know, vdpa is trying
> to minimize CPU usage for virtio HW acceleration, and we assign just one core to the vdpa daemon
> (doc/guides/sample_app_ug/vdpa.rst), so there's no dedicated pmd worker core.

OK, thanks for the clarification!
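
For reference, a thread created with pthread_create() inherits the CPU affinity mask of the thread that created it on Linux, which is why the relay thread ends up on the single core the vdpa daemon is pinned to. A small standalone demonstration of that behaviour (unrelated to the patch itself):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static void *
    worker(void *arg)
    {
            cpu_set_t set;

            (void)arg;
            /* the child sees the same affinity mask as its creator */
            CPU_ZERO(&set);
            pthread_getaffinity_np(pthread_self(), sizeof(set), &set);
            printf("worker allowed on %d CPU(s)\n", CPU_COUNT(&set));
            return NULL;
    }

    int
    main(void)
    {
            cpu_set_t set;
            pthread_t tid;

            /* pin the parent to CPU 0 before spawning the worker */
            CPU_ZERO(&set);
            CPU_SET(0, &set);
            pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

            pthread_create(&tid, NULL, worker, NULL);
            pthread_join(tid, NULL);
            return 0;
    }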

> Thanks,
> Xiao
> 

