[dpdk-dev] generic load balancing

Prashant Upadhyaya prashant.upadhyaya at aricent.com
Thu Dec 5 15:29:34 CET 2013


Hi,

Well, GTP is the main use case.
We end up with a GTP tunnel between the two machines, so the outer IP addresses (and typically the UDP ports) are identical for every packet, and RSS hashes everything to the same queue.
So ordinarily with the 82599, all the traffic lands on a single queue and therefore must be polled by a single core. Bottleneck.

But in general, if I want to employ all the CPU cores' horsepower simultaneously to pick up packets from the NIC, then it is natural to give the NIC one queue per core; if the NIC round-robins across those queues, the traffic naturally fans out and I can use all the cores to lift packets from the NIC in a load-balanced fashion.
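
To make that concrete, a minimal sketch of the one-queue-per-core setup is below (the port id, descriptor count and mempool are placeholders, and a plain zeroed rte_eth_conf is used because the round-robin spreading across those queues is exactly the NIC feature being requested here):

/* Sketch: configure one RX queue per worker lcore.  How traffic is spread
 * across those queues is up to the NIC; the round-robin mode asked for in
 * this thread does not exist on the 82599, so this only shows the queue
 * plumbing. */
#include <string.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mempool.h>

#define NB_RX_DESC 512

static int
setup_one_queue_per_core(uint8_t port_id, struct rte_mempool *mb_pool)
{
        uint16_t nb_queues = rte_lcore_count();   /* one RX queue per core */
        struct rte_eth_conf conf;
        uint16_t q;
        int ret;

        memset(&conf, 0, sizeof(conf));           /* default config, no RSS */

        ret = rte_eth_dev_configure(port_id, nb_queues, 1, &conf);
        if (ret < 0)
                return ret;

        for (q = 0; q < nb_queues; q++) {
                ret = rte_eth_rx_queue_setup(port_id, q, NB_RX_DESC,
                                             rte_eth_dev_socket_id(port_id),
                                             NULL, mb_pool);
                if (ret < 0)
                        return ret;
        }
        return rte_eth_dev_start(port_id);
}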

Imagine a theoretical use case where I have to lift the packets from the NIC, inspect them myself in the application, and then switch them to the right core for further processing. So each core has two jobs: poll the NIC, and switch packets to the right core. Here I would simply love to poll both the RX queue and the inter-core ring from each core. No single core becomes a bottleneck as far as polling the NIC is concerned. You might argue about the basis on which I switch packets to the relevant core for further processing, but that's _my_ use case, and my headache to distribute the work equally amongst the cores.
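
A bare-bones sketch of that per-core loop follows (worker_rings, pick_target_core() and process_packet() are hypothetical names standing in for the application's own pieces, and the assumption that lcore i owns RX queue i is just for illustration):

/* Sketch: every worker lcore polls its own RX queue and its own ring.
 * pick_target_core() stands in for whatever application-specific rule
 * decides which core should process a given packet. */
#include <rte_ethdev.h>
#include <rte_ring.h>
#include <rte_mbuf.h>
#include <rte_lcore.h>

#define BURST 32

extern struct rte_ring *worker_rings[RTE_MAX_LCORE];   /* one ring per core */
extern unsigned pick_target_core(struct rte_mbuf *m);  /* app-specific rule */
extern void process_packet(struct rte_mbuf *m);

static int
worker_loop(void *arg)
{
        uint8_t port_id = *(uint8_t *)arg;
        unsigned lcore = rte_lcore_id();
        uint16_t queue_id = (uint16_t)lcore;   /* assumes lcore i owns queue i */
        struct rte_mbuf *pkts[BURST];
        uint16_t n, i;
        void *m;

        for (;;) {
                /* Job 1: lift packets from this core's own NIC queue. */
                n = rte_eth_rx_burst(port_id, queue_id, pkts, BURST);
                for (i = 0; i < n; i++) {
                        unsigned target = pick_target_core(pkts[i]);

                        if (target == lcore)
                                process_packet(pkts[i]);
                        else if (rte_ring_enqueue(worker_rings[target],
                                                  pkts[i]) != 0)
                                rte_pktmbuf_free(pkts[i]);  /* ring full: drop */
                }
                /* Job 2: drain packets that other cores handed to us. */
                while (rte_ring_dequeue(worker_rings[lcore], &m) == 0)
                        process_packet((struct rte_mbuf *)m);
        }
        return 0;
}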

Imagine an LTE use case where I am on the core network side (the SGW), and the packets come over GTP from thousands of mobiles (via the eNB). I can employ all the cores to pick up the GTP packets (if the NIC gives me round robin) and then, based on the inner IP packet's source address (the mobile's IP address), take each packet to the relevant core for processing. This way I get complete load balancing, not only for polling the NIC but also for processing the inner IP packets.
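
For the SGW case, the "relevant core" rule can be as cheap as hashing the inner source IP. A sketch is below; it assumes an untagged IPv4 outer header with no options, UDP, a basic 8-byte GTP-U header with no extension headers, and an IPv4 inner packet, so real code would have to validate all of that first:

/* Sketch: pick a worker core from the inner IP source address (the
 * mobile's address) of a GTP-U packet.  Offsets assume untagged Ethernet,
 * an outer IPv4 header without options, UDP, and a basic 8-byte GTP-U
 * header with no extension headers. */
#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_udp.h>
#include <rte_byteorder.h>
#include <rte_lcore.h>

#define GTPU_HDR_LEN 8   /* mandatory part of the GTP-U header */

unsigned
pick_target_core(struct rte_mbuf *m)
{
        unsigned offset = sizeof(struct ether_hdr) +   /* 14 */
                          sizeof(struct ipv4_hdr) +    /* 20, no options */
                          sizeof(struct udp_hdr) +     /*  8 */
                          GTPU_HDR_LEN;                /*  8 */
        struct ipv4_hdr *inner_ip =
                (struct ipv4_hdr *)(rte_pktmbuf_mtod(m, char *) + offset);
        uint32_t mobile_ip = rte_be_to_cpu_32(inner_ip->src_addr);

        /* The same mobile always maps to the same worker core. */
        return mobile_ip % rte_lcore_count();
}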

I have also worked a lot on Cavium processors. Those of you who are familiar with them will know that the POW scheduler hands packets to whichever core asks for work, so on a Cavium Octeon the packets can go to any core. The only way to achieve similar functionality in DPDK is to give the NIC one queue per core and let the NIC round-robin across those queues blindly. What's the harm in adding this feature? Let those who want it use it, and let those who hate it or think it is useless ignore it.

Regards
-Prashant

-----Original Message-----
From: François-Frédéric Ozog [mailto:ff at ozog.com]
Sent: Thursday, December 05, 2013 2:16 PM
To: Prashant Upadhyaya
Cc: 'Michael Quicquaro'; 'Stephen Hemminger'; dev at dpdk.org
Subject: RE: [dpdk-dev] generic load balancing

Hi,

If the traffic you manage is above MPLS or GTP encapsulations, then you can use cards that provide flexible hash functions. The Chelsio cxgb5 provides combinations of offset, length, and tuple that may help.

The only reason I would have loved to get a pure round-robin feature was to pass certain "Breaking Point" (http://www.ixiacom.com/breakingpoint) tests where the traffic at issue was multicast from a single source... But that is not real-life traffic.

If you could share the use case...

François-Frédéric

> -----Original Message-----
> From: Prashant Upadhyaya [mailto:prashant.upadhyaya at aricent.com]
> Sent: Thursday, 5 December 2013 06:30
> To: Stephen Hemminger
> Cc: François-Frédéric Ozog; Michael Quicquaro; dev at dpdk.org
> Subject: RE: [dpdk-dev] generic load balancing
>
> Hi Stephen,
>
> The awfulness depends upon the use case.
> I have, for example, a use case where I want this round-robin behaviour.
>
> I just want the NIC to give me a facility to use this.
>
> Regards
> -Prashant
>
>
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Thursday, December 05, 2013 10:25 AM
> To: Prashant Upadhyaya
> Cc: François-Frédéric Ozog; Michael Quicquaro; dev at dpdk.org
> Subject: Re: [dpdk-dev] generic load balancing
>
> Round robin would actually be awful for any protocol because it would
> cause out-of-order packets.
> That is why flow-based algorithms like Flow Director and RSS work much
> better.
>
> On Wed, Dec 4, 2013 at 8:31 PM, Prashant Upadhyaya
> <prashant.upadhyaya at aricent.com> wrote:
> > Hi,
> >
> > It's a real pity that the Intel 82599 NIC (and possibly others) doesn't
> > have simple round-robin scheduling of packets across the configured
> > queues.
> >
> > I have requested this from Intel earlier, and using this forum I am
> > requesting it again -- please, please put this facility in the NIC: if
> > I give it N queues and configure the NIC for round-robin scheduling on
> > those queues, then the NIC should simply put the received packets one
> > by one on queue 1, then on queue 2, ..., then on queue N, and then back
> > on queue 1.
> > The above is very useful in a lot of load-balancing cases.
> >
> > Regards
> > -Prashant
> >
> >
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of
> > François-Frédéric Ozog
> > Sent: Thursday, December 05, 2013 2:35 AM
> > To: 'Michael Quicquaro'
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] generic load balancing
> >
> > Hi,
> >
> > As far as I can tell, this is really hardware dependent. Some hash
> > functions allow uplink and downlink packets of the same "session" to
> > go to the same queue (I know Chelsio can do this).
> >
> > For the Intel card, you may find what you want in:
> > http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html
> >
> > Other cards require NDA or other agreements to get details of RSS.
> >
> > If you have a performance problem, may I suggest you use kernel 3.10 and
> > then monitor system activity with the "perf" command. For instance, you
> > can start with "perf top -a"; this will give you nice information. Then
> > your creativity will do the rest ;-) You may be surprised at what comes
> > out on top of the hot spots...
> > (the most unexpected hot function I found here was the Linux syscall
> > gettimeofday!!!)
> >
> > François-Frédéric
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michael Quicquaro
> >> Sent: Wednesday, 4 December 2013 18:53
> >> To: dev at dpdk.org
> >> Subject: [dpdk-dev] generic load balancing
> >>
> >> Hi all,
> >> I am writing a dpdk application that will receive packets from one
> >> interface and process them.  It does not forward packets in the
> >> traditional sense.  However, I do need to process them at full line
> >> rate and therefore need more than one core.  The packets can be
> >> somewhat generic in nature and can be nearly identical (especially at
> >> the beginning of the packet).  I've used the rxonly function of
> >> testpmd as a model.
> >>
> >> I've run into problems processing a full line rate of data since the
> >> nature of the data causes all the data to be presented to only one
> >> core.  I get a large percentage of dropped packets (shows up as
> >> Rx-Errors in "port stats") because of this.  I've tried modifying the
> >> data so that packets have different UDP ports, and that seems to work
> >> when I use --rss-udp.
> >>
> >> My questions are:
> >> 1) Is there a way to configure RSS so that it alternates packets to
> >> all configured cores regardless of the packet data?
> >>
> >> 2)  Where is the best place to learn more about RSS and how to
> >> configure it? I have not found much in the DPDK documentation.
> >>
> >> Thanks for the help,
> >> - Mike




