[dpdk-dev] Admin Queue ENA

kumaraparameshwaran rathinavel kumaraparamesh92 at gmail.com
Fri Nov 8 07:02:26 CET 2019


Hi Michał,

Please look at the function below:

static int
ena_com_wait_and_process_admin_cq_polling(
        struct ena_comp_ctx *comp_ctx,
        struct ena_com_admin_queue *admin_queue)
{
    unsigned long flags = 0;
    u64 start_time;
    int ret;

    start_time = ENA_GET_SYSTEM_USECS();

    while (comp_ctx->status == ENA_CMD_SUBMITTED) {
        if ((ENA_GET_SYSTEM_USECS() - start_time) >
            ADMIN_CMD_TIMEOUT_US) {
            ena_trc_err("Wait for completion (polling) timeout\n");
            /* ENA didn't have any completion */
            ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
            admin_queue->stats.no_completion++;
            admin_queue->running_state = false;
            ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags);

            ret = ENA_COM_TIMER_EXPIRED;
            goto err;
        }

        ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
        ena_com_handle_admin_completion(admin_queue);
        ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags);
    }

    if (unlikely(comp_ctx->status == ENA_CMD_ABORTED)) {
        ena_trc_err("Command was aborted\n");
        ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
        admin_queue->stats.aborted_cmd++;
        ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags);
        ret = ENA_COM_NO_DEVICE;
        goto err;
    }

    ENA_ASSERT(comp_ctx->status == ENA_CMD_COMPLETED,
           "Invalid comp status %d\n", comp_ctx->status);

    ret = ena_com_comp_status_to_errno(comp_ctx->comp_status);
err:
    comp_ctxt_release(admin_queue, comp_ctx);
    return ret;
}

This is the case where two threads are executing admin commands.

The occupied flag is set to false in comp_ctxt_release. Suppose there are
two consumers of completion contexts, C1 and C2. C1 holds a completion
context, and the same completion context can be handed to C2 before C1 has
reset the occupied flag.

This is because ena_com_handle_admin_completion is called under the spin
lock, while comp_ctxt_release is not.
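
To make the idea concrete, here is a minimal sketch of releasing the
context under the same lock that guards acquisition. It is not the driver
code: the structures are simplified stand-ins, a C11 atomic_flag stands in
for ENA_SPINLOCK, and the names QUEUE_DEPTH, admin_queue_init,
ctx_get_locked and ctx_release_locked are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_DEPTH 4 /* hypothetical depth, for illustration only */

/* Simplified stand-ins; the real driver structures have more fields. */
struct comp_ctx {
    bool occupied;
};

struct admin_queue {
    atomic_flag q_lock; /* stands in for the ENA spin lock */
    struct comp_ctx ctx[QUEUE_DEPTH];
};

static void q_lock(struct admin_queue *q)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&q->q_lock,
                                             memory_order_acquire))
        ;
}

static void q_unlock(struct admin_queue *q)
{
    atomic_flag_clear_explicit(&q->q_lock, memory_order_release);
}

static void admin_queue_init(struct admin_queue *q)
{
    atomic_flag_clear(&q->q_lock);
    for (size_t i = 0; i < QUEUE_DEPTH; i++)
        q->ctx[i].occupied = false;
}

/* Claim slot idx; returns NULL if it is still marked occupied. */
static struct comp_ctx *ctx_get_locked(struct admin_queue *q, size_t idx)
{
    struct comp_ctx *ctx = NULL;

    assert(idx < QUEUE_DEPTH);
    q_lock(q);
    if (!q->ctx[idx].occupied) {
        q->ctx[idx].occupied = true;
        ctx = &q->ctx[idx];
    }
    q_unlock(q);
    return ctx;
}

/* Clear the flag under the same lock that guards acquisition. */
static void ctx_release_locked(struct admin_queue *q, struct comp_ctx *ctx)
{
    q_lock(q);
    ctx->occupied = false;
    q_unlock(q);
}
```

With this arrangement, a second consumer asking for the same slot gets
NULL until the first consumer has called ctx_release_locked, because the
flag is both tested and cleared under the lock.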

Thanks,
Param

On Thu, Oct 24, 2019 at 2:09 PM Michał Krawczyk <mk at semihalf.com> wrote:

> Sat, 19 Oct 2019 at 20:26 kumaraparameshwaran rathinavel
> <kumaraparamesh92 at gmail.com> wrote:
> >
> > Hi All,
> >
> > In the ENA poll mode driver I see that every request in the admin queue
> > is associated with a completion context, which is preallocated during
> > device initialisation. When a completion context is taken, we check
> > whether its occupied flag is already true: the 16.X version asserts in
> > that case, and the latest version logs an error. But there is a time
> > window in which the completion context is available to another consumer
> > while the old consumer has not yet set occupied to false. The new
> > consumer holds the admin queue lock to get the completion context, but
> > the old consumer's update of the occupied flag is not done under the
> > lock. So should we make sure that the new consumer gets the completion
> > context only when the occupied flag is set to false? Any thoughts on
> > this?
>
> Hi Param,
>
> Both the producer and the consumer are holding the spinlock while
> getting the completion context. If you see any situation where it
> isn't (besides the release function), please let me know.
> As it is protected by the lock, returning an error while the completion
> context is occupied (and it shouldn't be) is fine, as it will stop the
> admin queue and allow the DPDK user application to execute the reset
> of the device.
>
> Thanks,
> Michal
>
> > If required I can try to make a patch where the completion context would
> > be available only after setting the occupied flag to false.
> >
> > Thanks,
> > Param.
>

