[dpdk-dev] [PATCH] net/ena: Fix admin cq polling for 32-bit apps
David Harton (dharton)
dharton at cisco.com
Mon Jul 1 14:00:57 CEST 2019
> -----Original Message-----
> From: Michał Krawczyk <mk at semihalf.com>
> Sent: Monday, July 01, 2019 3:24 AM
> To: David Harton (dharton) <dharton at cisco.com>
> Cc: dev at dpdk.org; Marcin Wojtas <mw at semihalf.com>; Tzalik, Guy
> <gtzalik at amazon.com>; Schmeilin, Evgeny <evgenys at amazon.com>; Belgazal,
> Netanel <netanel at amazon.com>; Kiyanovski, Arthur <akiyano at amazon.com>;
> Chauskin, Igor <igorch at amazon.com>; Matushevsky, Alexander
> <matua at amazon.com>; sameehj at amazon.com
> Subject: Re: [PATCH] net/ena: Fix admin cq polling for 32-bit apps
>
> + folks responsible for ENA on other platforms as this code touches
> every ENA target
>
> pt., 28 cze 2019 o 17:46 David Harton (dharton) <dharton at cisco.com>
> napisał(a):
> >
> >
> >
> > > -----Original Message-----
> > > From: Michał Krawczyk <mk at semihalf.com>
> > > Sent: Friday, June 28, 2019 11:03 AM
> > > To: David Harton (dharton) <dharton at cisco.com>
> > > Cc: dev at dpdk.org; Marcin Wojtas <mw at semihalf.com>; Tzalik, Guy
> > > <gtzalik at amazon.com>; Schmeilin, Evgeny <evgenys at amazon.com>
> > > Subject: Re: [PATCH] net/ena: Fix admin cq polling for 32-bit apps
> > >
> > > Hi,
> > >
> > > sorry for the late reply.
> > >
> > > śr., 29 maj 2019 o 23:01 David Harton <dharton at cisco.com> napisał(a):
> > > >
> > > > Recent modifications to admin command queue polling logic did not
> > > > support 32-bit applications. Updated the driver to work for 32 or
> > > > 64 bit applications as well as avoiding roll-over possibility.
> > > >
> > > > Fixes: 3adcba9a89 ("net/ena: update HAL to the newer version")
> > > >
> > > > Signed-off-by: David Harton <dharton at cisco.com>
> > > > ---
> > > > drivers/net/ena/base/ena_com.c | 10 +++++++---
> > > > drivers/net/ena/base/ena_plat_dpdk.h | 6 +-----
> > > > 2 files changed, 8 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/drivers/net/ena/base/ena_com.c
> > > > b/drivers/net/ena/base/ena_com.c index b688067f7..b96adde3c 100644
> > > > --- a/drivers/net/ena/base/ena_com.c
> > > > +++ b/drivers/net/ena/base/ena_com.c
> > > > @@ -547,10 +547,13 @@ static int
> > > ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx
> > > *comp_c
> > > > struct
> > > > ena_com_admin_queue *admin_queue) {
> > > > unsigned long flags = 0;
> > > > - unsigned long timeout;
> > > > + u32 timeout_ms;
> > > > int ret;
> > > >
> > > > - timeout = ENA_GET_SYSTEM_TIMEOUT(admin_queue-
> > > >completion_timeout);
> > > > + /* Calculate ms granularity timeout from us
> completion_timeout
> > > > + * making sure we retry once if we have at least 1ms
> > > > + */
> > > > + timeout_ms = (admin_queue->completion_timeout / 1000) +
> > > > + (ENA_POLL_MS - 1);
> > > >
> > > > while (1) {
> > > > ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); @@
> > > > -560,7 +563,7 @@ static int
> > > ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx
> > > *comp_c
> > > > if (comp_ctx->status != ENA_CMD_SUBMITTED)
> > > > break;
> > > >
> > > > - if (ENA_TIME_EXPIRE(timeout)) {
> > > > + if (timeout_ms < ENA_POLL_MS) {
> > > > ena_trc_err("Wait for completion (polling)
> > > timeout\n");
> > > > /* ENA didn't have any completion */
> > > > ENA_SPINLOCK_LOCK(admin_queue->q_lock,
> > > > flags); @@ -573,6 +576,7 @@ static int
> > > ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx
> > > *comp_c
> > > > }
> > > >
> > > > ENA_MSLEEP(ENA_POLL_MS);
> > > > + timeout_ms -= ENA_POLL_MS;
> > >
> > > This part can be problematic at the very overloaded systems - in
> > > that case the ENA_MSLEEP can take a much longer than ENA_POLL_MS and
> > > in this situation the time spent in this function can't be determined.
> > > That's why we were checking time spent in sleep every
> > > ENA_TIME_EXPIRE macro.
> > > The issue can be observed especially in the kernel drivers, and
> > > ena_com is common file for all ENA drivers.
> >
> > I don't understand the comment/concern.
> >
> > The previous macros calculate the future cycle count based on a us
> timeout value (assuming 64 bit apps) and repeat the loop until the command
> is "submitted" or the current cycle count is greater than the calculated
> cycle count value sleeping ENA_POLL_MS between each iteration.
> >
> >
> > The new method accomplishes the same thing but instead of using a "cycle
> count" it uses the number of ms which the poll and sleep actions are based
> upon.
> >
> > The differences with the new method are:
> > - it uses less instructions
> > - not susceptible to cycle count overrun (admittedly highyl unlikely)
> > - (most importantly) works equally well for 32 or 64 bit apps
> >
> > Can you elaborate on your concern?
>
> The problem with this solution is that you are assuming that ENA_MSLEEP
> will always sleep for ENA_POLL_MS which is not true. It can sleep much
> more in busy systems.
> The behavior of this function before your changes is minimizing that time
> by getting current cycles in the ENA_TIME_EXPIRE. In the above solution,
> we can not determine how much time we've sleepped. It could be ENA_POLL_MS
> or even 10 second.
Thanks, I understand your concern now. It's true that what I added is much more coarse than what was there because any call to rte_delay_ms() is only guaranteed to wait at lease those ms and can wait longer.
Kinda scary to think though I ask to wait say 10ms and the context be switched out for 10s. Not really a high-performing or low latency system in that case and if so I'm not sure I see the harm waiting for the coarse amount of time either.
However, if new approach is truly not desired then the original approach can preserved if it uses uint64_t to track clock cycles instead of the architecture dependent type currently used.
Regards,
Dave
>
> Thanks,
> Michal
>
> > Thanks,
> > Dave
> >
More information about the dev
mailing list