[dpdk-dev] [PATCH v3 2/2] mempool: pktmbuf pool default fallback for mempool ops error

Olivier Matz olivier.matz at 6wind.com
Tue Nov 22 10:24:46 CET 2016


Hi Hemant,

Back on this topic, please see some comments below.

On 11/07/2016 01:30 PM, Hemant Agrawal wrote:
> Hi Olivier,
> 	
>> -----Original Message-----
>> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
>> Sent: Friday, October 14, 2016 5:41 PM
>>> On 9/22/2016 6:42 PM, Hemant Agrawal wrote:
>>>> Hi Olivier
>>>>
>>>> On 9/19/2016 7:27 PM, Olivier Matz wrote:
>>>>> Hi Hemant,
>>>>>
>>>>> On 09/16/2016 06:46 PM, Hemant Agrawal wrote:
>>>>>> In the rte_pktmbuf_pool_create, if the default external mempool is
>>>>>> not available, the implementation can default to "ring_mp_mc",
>>>>>> which is an software implementation.
>>>>>>
>>>>>> Signed-off-by: Hemant Agrawal <hemant.agrawal at nxp.com>
>>>>>> ---
>>>>>> Changes in V3:
>>>>>> * adding warning message to say that falling back to default sw
>>>>>> pool
>>>>>> ---
>>>>>>  lib/librte_mbuf/rte_mbuf.c | 8 ++++++++
>>>>>>  1 file changed, 8 insertions(+)
>>>>>>
>>>>>> diff --git a/lib/librte_mbuf/rte_mbuf.c
>>>>>> b/lib/librte_mbuf/rte_mbuf.c index 4846b89..8ab0eb1 100644
>>>>>> --- a/lib/librte_mbuf/rte_mbuf.c
>>>>>> +++ b/lib/librte_mbuf/rte_mbuf.c
>>>>>> @@ -176,6 +176,14 @@ rte_pktmbuf_pool_create(const char *name,
>>>>>> unsigned n,
>>>>>>
>>>>>>      rte_errno = rte_mempool_set_ops_byname(mp,
>>>>>>              RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL);
>>>>>> +
>>>>>> +    /* on error, try falling back to the software based default
>>>>>> pool */
>>>>>> +    if (rte_errno == -EOPNOTSUPP) {
>>>>>> +        RTE_LOG(WARNING, MBUF, "Default HW Mempool not supported. "
>>>>>> +            "falling back to sw mempool \"ring_mp_mc\"");
>>>>>> +        rte_errno = rte_mempool_set_ops_byname(mp, "ring_mp_mc",
>>>>>> NULL);
>>>>>> +    }
>>>>>> +
>>>>>>      if (rte_errno != 0) {
>>>>>>          RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
>>>>>>          return NULL;
>>>>>>
>>>>>
>>>>> Without adding a new method ".supported()", the first call to
>>>>> rte_mempool_populate() could return the same error ENOTSUP. In this
>>>>> case, it is still possible to fallback.
>>>>>
>>>> It will be bit late.
>>>>
>>>> On failure, than we have to set the default ops and do a goto before
>>>> rte_pktmbuf_pool_init(mp, &mbp_priv);
>>
>> I still think we can do the job without adding the .supported() method.
>> The following code is just an (untested) example:
>>
>> struct rte_mempool *
>> rte_pktmbuf_pool_create(const char *name, unsigned n,
>>     unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
>>     int socket_id)
>> {
>>     struct rte_mempool *mp;
>>     struct rte_pktmbuf_pool_private mbp_priv;
>>     unsigned elt_size;
>>     int ret;
>>     const char *ops[] = {
>>         RTE_MBUF_DEFAULT_MEMPOOL_OPS, "ring_mp_mc", NULL,
>>     };
>>     const char **op;
>>
>>     if (RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) != priv_size) {
>>         RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n",
>>             priv_size);
>>         rte_errno = EINVAL;
>>         return NULL;
>>     }
>>     elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size +
>>         (unsigned)data_room_size;
>>     mbp_priv.mbuf_data_room_size = data_room_size;
>>     mbp_priv.mbuf_priv_size = priv_size;
>>
>>     for (op = &ops[0]; *op != NULL; op++) {
>>         mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
>>             sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
>>         if (mp == NULL)
>>             return NULL;
>>
>>         ret = rte_mempool_set_ops_byname(mp, *op, NULL);
>>         if (ret != 0) {
>>             RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
>>             rte_mempool_free(mp);
>>             if (ret == -ENOTSUP)
>>                 continue;
>>             rte_errno = -ret;
>>             return NULL;
>>         }
>>         rte_pktmbuf_pool_init(mp, &mbp_priv);
>>
>>         ret = rte_mempool_populate_default(mp);
>>         if (ret < 0) {
>>             rte_mempool_free(mp);
>>             if (ret == -ENOTSUP)
>>                 continue;
>>             rte_errno = -ret;
>>             return NULL;
>>         }
>>     }
>>
>>     rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
>>
>>     return mp;
>> }
>>
>>
> [Hemant]  This look fine to me. Please submit a patch for the same. 
> 
>>>>> I've just submitted an RFC, which I think is quite linked:
>>>>> http://dpdk.org/ml/archives/dev/2016-September/046974.html
>>>>> Assuming a new parameter "mempool_ops" is added to
>>>>> rte_pktmbuf_pool_create(), would it make sense to fallback to
>>>>> "ring_mp_mc"? What about just returning ENOTSUP? The application
>>>>> could do the job and decide which sw fallback to use.
>>>>
>>>> We ran into this issue when trying to run the standard DPDK examples
>>>> (l3fwd) in VM. Do you think, is it practical to add fallback handling
>>>> in each of the DPDK examples?
>>
>> OK. What is still unclear for me, is how the software is aware of the different
>> hardware-assisted handlers. Moreover, we could imagine more software
>> handlers, which could be used depending on the use case.
>>
>> I think this choice has to be made by the user or the application:
>>
>> - the application may want to use a specific (sw or hw) handler: in
>>   this case, it want to be notified if it fails, instead of having
>>   a quiet fallback to ring_mp_mc
>> - if several handlers are available, the application may want to
>>   try them in a specific order
>> - maybe some handlers will have some limitations with some
>>   configurations or driver? The application could decide to use
>>   a different handler according the configuration.
>>
>> So that's why I think this is an application decision.
>>
> [Hemant]  We should simplify it:  if the application has supplied the handler, it is application's responsibility to take care of failure. Only if the application want to use the default handler, the implementation can fallback.  The fallback handler can again be configurable. 

Honestly, I'm not very convinced that having a quiet fallback is
the proper solution: the application won't be notified that the
fallback occured.

I understand your patch solves your use-case, but my fear is that
we just integrate this minimal patch and never do the harder work
of solving all the use-cases. I think we need to give the tools to
the applications to control what occurs because we also need to solve
these issues:
- you have a hardware handler, but you want to use the software
  handler
- you have several software handlers, and you (as an administrator) or
  the application knows which one is the best to use in your case.

An alternative (which I don't like either) to solve your issue without
modifying the mbuf code is to have your own handler fallbacking to
ring_mp_mc.


>>>> Typically when someone is writing a application on host, he need not
>>>> worry non-availability of the hw offloaded mempool. He may also want
>>>> to run the same binary in virtual machine. In VM, it is not
>>>> guaranteed that hw offloaded mempools will be available.
>>
>> Running the same binary is of course a need. But if your VM does not provide
>> the same virtualized hardware than the host, I think the command line option
>> makes sense.
>>
>> AFAIU, on the host, you can use a hw mempool handler or a sw one, and on the
>> guest, only a sw one. So you need an option to select the behavior you want on
>> the host, without recompiling.
>>
>> I understand that modifying all the applications is not a good option either. I'm
>> thinking a a EAL parameter that would allow to configure a library (mbuf in this
>> case). Something like kernel boot options.
>> Example: testpmd -l 2,4 --opt mbuf.handler="ring_mp_mc" -- [app args]
>>
>> I don't know if this is feasible, this has to be discussed first, but what do you
>> think on the principle? The value could also be an ordered list if we want a
>> fallback, and this option could be overriden by the application before creating
>> the mbuf pool. There was some discussions about a kind of dpdk global
>> configuration some time ago.
>>
> [Hemant] I agree that command line option provide a better control in this case. 
> On the flipside, We need to be careful that we do not end up having too many command line options. 

Yes, we should avoid having too many command line arguments.
I think we should go in the direction of having a sort of key/value
database in EAL. It could be set by:
- a generic command line argument
- a config file (maybe later)
- the application through an API

Then, it would be used by:
- dpdk libraries
- the application (maybe)

This has been discussed some times, the latest was probably:
http://dpdk.org/ml/archives/dev/2016-June/040079.html

I think it would be a good tool for dpdk configurability, and it would
help to remove some compile-time options and maybe some eal options.
Unfortunately, I don't have the time to work on this at the moment.

Regards,
Olivier



More information about the dev mailing list