[dpdk-dev] [RFC 2/2] drivers/net: use sleep delay by default for intel NICs

Wiles, Keith keith.wiles at intel.com
Mon Sep 3 18:04:34 CEST 2018



> On Sep 3, 2018, at 4:50 PM, Ilya Maximets <i.maximets at samsung.com> wrote:
> 
> On 03.09.2018 18:39, Wiles, Keith wrote:
>> 
>> 
>>> On Sep 3, 2018, at 4:33 PM, Ilya Maximets <i.maximets at samsung.com> wrote:
>>> 
>>> On 03.09.2018 18:14, Wiles, Keith wrote:
>>>> 
>>>> 
>>>>> On Aug 31, 2018, at 1:45 PM, Ilya Maximets <i.maximets at samsung.com> wrote:
>>>>> 
>>>>> NICs uses different delays up to a second during their
>>>>> configuration. It makes no sense to busy-wait so long wasting
>>>>> CPU cycles and preventing any other threads to execute on the
>>>>> same CPU core. These busy polling are the rudiments that came
>>>>> from the kernel drivers where you can not sleep in interrupt
>>>>> context, but as we're in userspace, we're able and should
>>>>> sleep to allow other threads to run.
>>>>> Delays never called on rx/tx path, so this should not affect
>>>>> performance.
>>>> 
>>>> I have a question, the only thread being effected by the busy wait is the thread assigned to the core or the master lcore in the case of rte_eal_init(). It should not effect other threads running on other cores, right? If you do have other threads running in the same code then I see the need, please help me understand the use and how it is effecting DPDK.
>>> 
>>> Originally, this patch comes from this issue:
>>>  http://mails.dpdk.org/archives/dev/2018-August/110640.html
>>> 
>>> non-EAL threads is one of the answers.
>>> For example, main thread in OVS periodically checks the link statuses
>>> and hangs there busy waiting on different NIC configuration operations.
>> 
>> We have link state get no wait why is that not working instead of wait polling the line state?
> 
> Yes, and OVS uses it, but ixgbe driver contains the workaround for a
> fiber links configuration issue that calls ixgbe_setup_link, that
> busy waits trying to configure multispeed fiber. You may found
> details in the patch I mentioned. I moved this code to the separate
> alarm thread, but it will eat CPU cycles anyway preventing others
> from working.

OK. I see.

I think using nanosleep is ok as long as that works on all of the platforms FreeBSD, Linux, Windows will be coming I assume later. This should not be a show stopper unless it does not work on FreeBSD, but I am pretty it does as I do not have access to look.

I will look at the rest of the patch.

> 
>> 
>>> Also, main OVS tread is responsible for port configuration and
>>> re-configuration. There are few other in-dpdk threads like interrupts
>>> handling thread (alarm handling thread). vhost_thread is working on
>>> the same lcores and wants some time to work too.
>>> 
>>> In case of hyperthreading busy-waiting will significantly affect
>>> threads on the siblings.
>>> 
>>> Best regards, Ilya Maximets.
>>> 
>>>>> 
>>>>> Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
>>>>> ---
>>>>> drivers/net/avf/Makefile             | 1 +
>>>>> drivers/net/avf/base/avf_osdep.h     | 4 ++--
>>>>> drivers/net/e1000/Makefile           | 1 +
>>>>> drivers/net/e1000/base/e1000_osdep.h | 2 +-
>>>>> drivers/net/i40e/base/i40e_osdep.h   | 6 +++---
>>>>> drivers/net/ifc/base/ifcvf_osdep.h   | 2 +-
>>>>> drivers/net/ixgbe/base/ixgbe_osdep.h | 2 +-
>>>>> 7 files changed, 10 insertions(+), 8 deletions(-)
>>>>> 
>>>>> diff --git a/drivers/net/avf/Makefile b/drivers/net/avf/Makefile
>>>>> index 3f815bbc4..8ee707529 100644
>>>>> --- a/drivers/net/avf/Makefile
>>>>> +++ b/drivers/net/avf/Makefile
>>>>> @@ -9,6 +9,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
>>>>> LIB = librte_pmd_avf.a
>>>>> 
>>>>> CFLAGS += -O3
>>>>> +CFLAGS += -DALLOW_EXPERIMENTAL_API
>>>>> LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
>>>>> LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs -lrte_hash
>>>>> LDLIBS += -lrte_bus_pci
>>>>> diff --git a/drivers/net/avf/base/avf_osdep.h b/drivers/net/avf/base/avf_osdep.h
>>>>> index 9ef45968e..442a5acd0 100644
>>>>> --- a/drivers/net/avf/base/avf_osdep.h
>>>>> +++ b/drivers/net/avf/base/avf_osdep.h
>>>>> @@ -93,8 +93,8 @@ typedef uint64_t        u64;
>>>>> #define avf_memset(a, b, c, d) memset((a), (b), (c))
>>>>> #define avf_memcpy(a, b, c, d) rte_memcpy((a), (b), (c))
>>>>> 
>>>>> -#define avf_usec_delay(x) rte_delay_us(x)
>>>>> -#define avf_msec_delay(x) rte_delay_us(1000*(x))
>>>>> +#define avf_usec_delay(x) rte_delay_us_sleep(x)
>>>>> +#define avf_msec_delay(x) avf_usec_delay(1000 * (x))
>>>>> 
>>>>> #define AVF_PCI_REG(reg)		rte_read32(reg)
>>>>> #define AVF_PCI_REG_ADDR(a, reg) \
>>>>> diff --git a/drivers/net/e1000/Makefile b/drivers/net/e1000/Makefile
>>>>> index 9c87e883b..0ed627656 100644
>>>>> --- a/drivers/net/e1000/Makefile
>>>>> +++ b/drivers/net/e1000/Makefile
>>>>> @@ -10,6 +10,7 @@ LIB = librte_pmd_e1000.a
>>>>> 
>>>>> CFLAGS += -O3
>>>>> CFLAGS += $(WERROR_FLAGS)
>>>>> +CFLAGS += -DALLOW_EXPERIMENTAL_API
>>>>> LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
>>>>> LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
>>>>> LDLIBS += -lrte_bus_pci
>>>>> diff --git a/drivers/net/e1000/base/e1000_osdep.h b/drivers/net/e1000/base/e1000_osdep.h
>>>>> index b8868049f..5958ea157 100644
>>>>> --- a/drivers/net/e1000/base/e1000_osdep.h
>>>>> +++ b/drivers/net/e1000/base/e1000_osdep.h
>>>>> @@ -48,7 +48,7 @@
>>>>> 
>>>>> #include "../e1000_logs.h"
>>>>> 
>>>>> -#define DELAY(x) rte_delay_us(x)
>>>>> +#define DELAY(x) rte_delay_us_sleep(x)
>>>>> #define usec_delay(x) DELAY(x)
>>>>> #define usec_delay_irq(x) DELAY(x)
>>>>> #define msec_delay(x) DELAY(1000*(x))
>>>>> diff --git a/drivers/net/i40e/base/i40e_osdep.h b/drivers/net/i40e/base/i40e_osdep.h
>>>>> index 8e5c593c9..a6072e153 100644
>>>>> --- a/drivers/net/i40e/base/i40e_osdep.h
>>>>> +++ b/drivers/net/i40e/base/i40e_osdep.h
>>>>> @@ -233,9 +233,9 @@ struct i40e_spinlock {
>>>>> #define i40e_memcpy(a, b, c, d) rte_memcpy((a), (b), (c))
>>>>> 
>>>>> #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
>>>>> -#define DELAY(x) rte_delay_us(x)
>>>>> -#define i40e_usec_delay(x) rte_delay_us(x)
>>>>> -#define i40e_msec_delay(x) rte_delay_us(1000*(x))
>>>>> +#define DELAY(x) rte_delay_us_sleep(x)
>>>>> +#define i40e_usec_delay(x) DELAY(x)
>>>>> +#define i40e_msec_delay(x) DELAY(1000 * (x))
>>>>> #define udelay(x) DELAY(x)
>>>>> #define msleep(x) DELAY(1000*(x))
>>>>> #define usleep_range(min, max) msleep(DIV_ROUND_UP(min, 1000))
>>>>> diff --git a/drivers/net/ifc/base/ifcvf_osdep.h b/drivers/net/ifc/base/ifcvf_osdep.h
>>>>> index cf151ef52..6aef25ea4 100644
>>>>> --- a/drivers/net/ifc/base/ifcvf_osdep.h
>>>>> +++ b/drivers/net/ifc/base/ifcvf_osdep.h
>>>>> @@ -17,7 +17,7 @@
>>>>> #define DEBUGOUT(S, args...)    RTE_LOG(DEBUG, PMD, S, ##args)
>>>>> #define STATIC                  static
>>>>> 
>>>>> -#define msec_delay	rte_delay_ms
>>>>> +#define msec_delay(x)	rte_delay_us_sleep(1000 * (x))
>>>>> 
>>>>> #define IFCVF_READ_REG8(reg)		rte_read8(reg)
>>>>> #define IFCVF_WRITE_REG8(val, reg)	rte_write8((val), (reg))
>>>>> diff --git a/drivers/net/ixgbe/base/ixgbe_osdep.h b/drivers/net/ixgbe/base/ixgbe_osdep.h
>>>>> index bb5dfd2af..94ede9bc2 100644
>>>>> --- a/drivers/net/ixgbe/base/ixgbe_osdep.h
>>>>> +++ b/drivers/net/ixgbe/base/ixgbe_osdep.h
>>>>> @@ -51,7 +51,7 @@
>>>>> 
>>>>> #define ASSERT(x) if(!(x)) rte_panic("IXGBE: x")
>>>>> 
>>>>> -#define DELAY(x) rte_delay_us(x)
>>>>> +#define DELAY(x) rte_delay_us_sleep(x)
>>>>> #define usec_delay(x) DELAY(x)
>>>>> #define msec_delay(x) DELAY(1000*(x))
>>>>> 
>>>>> -- 
>>>>> 2.17.1
>>>>> 
>>>> 
>>>> Regards,
>>>> Keith
>> 
>> Regards,
>> Keith

Regards,
Keith



More information about the dev mailing list