[dpdk-dev] DCA

Vlad Zolotarov vladz at cloudius-systems.com
Tue Apr 21 11:47:22 CEST 2015



On 04/21/15 12:27, Bruce Richardson wrote:
> On Tue, Apr 21, 2015 at 11:51:40AM +0300, Vlad Zolotarov wrote:
>>
>> On 04/20/15 13:50, Bruce Richardson wrote:
>>> On Mon, Apr 20, 2015 at 01:07:59PM +0300, Vlad Zolotarov wrote:
>>>> Hi,
>>>> I would like to ask if there is any reason why DPDK doesn't have support for
>>>> DCA feature?
>>>>
>>>> thanks,
>>>> vlad
>>> With modern platforms with DDIO the data written by the NIC automatically goes
>>> into the cache of the CPU without us needing to use DCA.
>> Thanks for a reply, Bruce.
>> One question though. According to DDIO documentation it only affects the
>> CPUs "local" relatively to the NIC. DCA, on the other hand may be configured
>> to work with any CPU. Modern platforms usually have a few NUMA nodes and
>> requirement of binding network handling threads only to CPUs "local" to the
>> NIC is very limiting.
>>
>> Could u, pls., comment on this?
>>
>> thanks in advance,
>> vlad
>>
> My main comment is that yes, you are correct. DDIO only works with the local
> socket, while DCA can be made to work with remote sockets. If you need to do
> polling on a device from a remote socket you may need to look at DCA.
>
> Can you perhaps comment on the use-case where you find this binding limiting?
> Modern platforms have multiple NUMA nodes, but they generally also have PCI
> slots connected to those multiple NUMA nodes, so that you can have your NIC
> ports similarly NUMA partitioned?

The immediate example where this could be problematic is an AWS guest
with Enhanced Networking: in a c3.8xlarge instance you get 2 NUMA nodes
and 32 CPU cores, and you can bind as many Intel 82599 VFs as you need,
each providing 4 Rx and 4 Tx queues. AFAIR nothing is promised about
the locality of the PFs the VFs belong to. To utilize all CPUs we'll
need 4 or 8 VFs depending on the queue layout we choose (a separate CPU
for each queue, or a separate CPU for each Rx + Tx queue pair,
respectively). In this case you may get completely different NUMA
layouts:
     - all VFs reside on the same PF: half of the queues will be remote
to one of the NUMA nodes, or all of them will be remote to all CPUs.
     - VFs come from two PFs, which may or may not reside in the same
NUMA nodes as the CPUs...
     - VFs come from more than two different PFs...
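
The "4 or 8 VFs" figure above is plain arithmetic; a minimal sketch of
it follows. The helper names are hypothetical and only illustrate the
calculation, they are not part of DPDK or any driver.

```c
#include <assert.h>

/* Hypothetical helper: number of VFs needed so that every CPU gets its
 * own "slot", where a slot is either a single queue or an Rx + Tx pair.
 * Rounds up so that no CPU is left without a slot. */
static unsigned vfs_needed(unsigned cpus, unsigned slots_per_vf)
{
    assert(slots_per_vf > 0);
    return (cpus + slots_per_vf - 1) / slots_per_vf;
}

/* Layout A: a separate CPU for each queue.
 * A VF with 4 Rx + 4 Tx queues then serves 8 CPUs. */
static unsigned vfs_cpu_per_queue(unsigned cpus, unsigned rxq, unsigned txq)
{
    return vfs_needed(cpus, rxq + txq);
}

/* Layout B: a separate CPU for each Rx + Tx queue pair.
 * A VF with 4 Rx + 4 Tx queues then serves 4 CPUs. */
static unsigned vfs_cpu_per_pair(unsigned cpus, unsigned rxq, unsigned txq)
{
    return vfs_needed(cpus, rxq < txq ? rxq : txq);
}
```

For the c3.8xlarge numbers (32 CPUs, 4 Rx + 4 Tx queues per VF),
vfs_cpu_per_queue(32, 4, 4) gives 4 and vfs_cpu_per_pair(32, 4, 4)
gives 8.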

So, in the above example DCA would cover all our needs, while DDIO
would not cover them in most cases.
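
To see which of the layouts above one actually got, one could compare
each VF's NUMA node with the node of the CPU polling it. Below is a
rough sketch, assuming a Linux guest where the kernel exposes the node
in /sys/bus/pci/devices/<BDF>/numa_node; that file may simply report -1
when the locality is not known, which is exactly the "nothing is
promised" case. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Read a NUMA node id from a sysfs-style file such as
 * /sys/bus/pci/devices/<BDF>/numa_node.
 * Returns -1 if the file is missing, unparsable, or itself says -1. */
static int read_numa_node(const char *path)
{
    FILE *f;
    int node = -1;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &node) != 1)
        node = -1;
    fclose(f);
    return node;
}

/* Returns 1 if the VF is local to the CPU's node, 0 if it is remote,
 * and -1 if either node is unknown. */
static int vf_is_local(int vf_node, int cpu_node)
{
    if (vf_node < 0 || cpu_node < 0)
        return -1;
    return vf_node == cpu_node;
}
```

With DDIO only the "local" result helps; the "remote" and "unknown"
results are the cases where DCA would be needed.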

>
> /Bruce


