[dpdk-dev] generic load balancing

Michael Quicquaro michael.quicquaro at gmail.com
Thu Dec 5 16:42:49 CET 2013


This is a good discussion and I hope Intel can see and benefit from it.
For my use case, I don't necessarily need round robin at the per-packet
level, just some even distribution among core queues that depends on
nothing inside the packet.  A good solution perhaps would be to allow the
NIC to switch to another core's queue after a certain number of packets
have been received... perhaps something like the burst size.  I see this
as one of the most fundamental pieces of functionality, and it is missing
from the DPDK.  I'm sure there are many use cases that don't involve
routing/forwarding/switching/etc. but of course still need to maximize
throughput.
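
In the meantime, the closest software approximation I can think of is to
burn one core as a distributor that deals whole bursts to per-worker rings
in round-robin order.  A rough sketch only -- ring creation and error
handling are omitted, and the names (work_ring, NB_WORKERS) are made up:

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_ring.h>

    #define BURST      32
    #define NB_WORKERS 4

    /* one ring per worker, created elsewhere with rte_ring_create() */
    static struct rte_ring *work_ring[NB_WORKERS];

    static void
    rx_distributor_loop(uint8_t port_id)
    {
        struct rte_mbuf *pkts[BURST];
        unsigned int next = 0;

        for (;;) {
            uint16_t n = rte_eth_rx_burst(port_id, 0, pkts, BURST);
            if (n == 0)
                continue;
            /* hand the whole burst to the next worker, round robin */
            unsigned int sent = rte_ring_enqueue_burst(work_ring[next],
                                                       (void **)pkts, n);
            while (sent < n)        /* free whatever the ring refused */
                rte_pktmbuf_free(pkts[sent++]);
            next = (next + 1) % NB_WORKERS;
        }
    }

Of course this costs a core and an extra ring hop, which is exactly why a
NIC-level knob would be nicer.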

- Mike


On Thu, Dec 5, 2013 at 9:29 AM, Prashant Upadhyaya <
prashant.upadhyaya at aricent.com> wrote:

> Hi,
>
> Well, GTP is the main use case.
> We end up with a GTP tunnel between the two machines.
> And ordinarily, with the 82599, all the data lands on a single queue and
> therefore must be polled from a single core. Bottleneck.
>
> But in general, if I want to employ the horsepower of all the CPU cores
> simultaneously to pick up packets from the NIC, then it is natural to
> give each core its own queue on the NIC; if the NIC does round robin,
> the traffic naturally fans out and I can use all the cores to lift
> packets from the NIC in a load-balanced fashion.
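>
> (For what it's worth, the queue-per-core setup side of this is already
> easy in DPDK; it is only the distribution policy that the hardware pins
> to RSS. A minimal init sketch, assuming one RX queue per worker core and
> the usual EAL/mempool boilerplate elsewhere; RSS flag names differ
> between DPDK versions:)
>
>     #include <rte_ethdev.h>
>
>     static struct rte_eth_conf port_conf = {
>         .rxmode = { .mq_mode = ETH_MQ_RX_RSS }, /* the only policy on offer */
>         .rx_adv_conf = { .rss_conf = { .rss_hf = ETH_RSS_IPV4 } },
>     };
>
>     static void
>     setup_queue_per_core(uint8_t port_id, uint16_t nb_cores,
>                          struct rte_mempool *mbuf_pool)
>     {
>         uint16_t q;
>
>         rte_eth_dev_configure(port_id, nb_cores /* RX queues */, 1,
>                               &port_conf);
>         for (q = 0; q < nb_cores; q++)
>             rte_eth_rx_queue_setup(port_id, q, 128,
>                                    rte_eth_dev_socket_id(port_id),
>                                    NULL, mbuf_pool);
>         rte_eth_dev_start(port_id);
>     }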
>
> Imagine a theoretical use case where I have to lift the packets from
> the NIC, inspect them myself in the application, and then switch them
> to the right core for further processing. So my cores have two jobs:
> one is to poll the NIC, the other is to switch the packets to the right
> core. Here I would simply love to poll both the NIC queue and the
> intercore ring from each core. No single core becomes the bottleneck as
> far as polling the NIC is concerned. You might ask on what basis I
> switch packets to the relevant core, but that's _my_ use case, and
> distributing the work equally amongst the cores from there is _my_
> headache.
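>
> (A minimal sketch of what such a per-core loop might look like;
> dispatch_to_owner_core() and process_packet() are stand-ins for the
> application logic, and the queue id simply mirrors the lcore index:)
>
>     #include <rte_ethdev.h>
>     #include <rte_lcore.h>
>     #include <rte_mbuf.h>
>     #include <rte_ring.h>
>
>     #define BURST 32
>
>     extern struct rte_ring *core_ring[RTE_MAX_LCORE]; /* created elsewhere */
>     extern uint8_t port_id;
>     extern void dispatch_to_owner_core(struct rte_mbuf *m); /* app-specific */
>     extern void process_packet(struct rte_mbuf *m);         /* app-specific */
>
>     static int
>     worker_loop(void *arg)
>     {
>         unsigned int lcore = rte_lcore_id();
>         struct rte_mbuf *pkts[BURST];
>         uint16_t i, n;
>
>         (void)arg;
>         for (;;) {
>             /* job 1: drain our own NIC queue, switch packets out */
>             n = rte_eth_rx_burst(port_id, lcore, pkts, BURST);
>             for (i = 0; i < n; i++)
>                 dispatch_to_owner_core(pkts[i]);
>
>             /* job 2: drain our intercore ring, do the real work */
>             n = rte_ring_dequeue_burst(core_ring[lcore],
>                                        (void **)pkts, BURST);
>             for (i = 0; i < n; i++)
>                 process_packet(pkts[i]);
>         }
>         return 0;
>     }
>
> Launched on every lcore with rte_eal_remote_launch(), this keeps all the
> cores symmetric: nobody is special, nobody is the bottleneck.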
>
> Imagine an LTE use case where I am on the core network side (SGW) and
> the packets come in over GTP from thousands of mobiles (via the eNB). I
> can employ all the cores to pick up the GTP packets (if the NIC gives
> me round robin) and then, based on the inner IP packet's source address
> (the mobile's IP address), take each one to the relevant core for
> further processing. This way I get complete load balancing: not only
> the polling from the NIC, but also the processing of the inner IP
> packets.
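>
> (For illustration, a very rough classifier for that inner-IP dispatch;
> it assumes untagged Ethernet, a plain outer IPv4 header with no options,
> and a fixed 8-byte GTP-U header -- real GTP parsing must honour the
> E/S/PN flag bits and extension headers:)
>
>     #include <rte_byteorder.h>
>     #include <rte_ip.h>
>     #include <rte_mbuf.h>
>
>     /* Eth(14) + outer IPv4(20) + UDP(8) + GTP-U(8) = inner IPv4 offset */
>     #define INNER_IP_OFS (14 + 20 + 8 + 8)
>
>     static unsigned int
>     inner_src_to_core(struct rte_mbuf *m, unsigned int nb_cores)
>     {
>         struct ipv4_hdr *inner = (struct ipv4_hdr *)
>             (rte_pktmbuf_mtod(m, char *) + INNER_IP_OFS);
>         /* spread mobiles across cores by inner source (mobile) address */
>         return rte_be_to_cpu_32(inner->src_addr) % nb_cores;
>     }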
>
> I have also worked a lot on Cavium processors. Those of you who are
> familiar with them will know that the POW scheduler hands packets to
> whichever core is requesting work, so a packet can go to any core on a
> Cavium Octeon. The only way to achieve similar functionality in DPDK is
> to give the NIC one queue per core and then let the NIC round-robin
> across those queues blindly. What's the harm in adding this feature?
> Let those who want it use it, and let those who think it is useless
> ignore it.
>
> Regards
> -Prashant
>
> -----Original Message-----
> From: François-Frédéric Ozog [mailto:ff at ozog.com]
> Sent: Thursday, December 05, 2013 2:16 PM
> To: Prashant Upadhyaya
> Cc: 'Michael Quicquaro'; 'Stephen Hemminger'; dev at dpdk.org
> Subject: RE: [dpdk-dev] generic load balancing
>
> Hi,
>
> If the traffic you manage is above MPLS or GTP encapsulations, then you
> can use cards that provide flexible hash functions. The Chelsio cxgb5
> provides combinations of offset, length and tuple that may help.
>
> The only reason I would have loved a pure round-robin feature was to
> pass certain "Breaking Point" (http://www.ixiacom.com/breakingpoint)
> tests where the generated traffic was multicast from a single source...
> But that is not real-life traffic.
>
> If you could share the use case...
>
> François-Frédéric
>
> > -----Original Message-----
> > From: Prashant Upadhyaya [mailto:prashant.upadhyaya at aricent.com]
> > Sent: Thursday, December 5, 2013 06:30
> > To: Stephen Hemminger
> > Cc: François-Frédéric Ozog; Michael Quicquaro; dev at dpdk.org
> > Subject: RE: [dpdk-dev] generic load balancing
> >
> > Hi Stephen,
> >
> > The awfulness depends upon the use case.
> > I have, for example, a use case where I want this round-robin
> > behaviour.
> >
> > I just want the NIC to give me a facility to use this.
> >
> > Regards
> > -Prashant
> >
> >
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> > Sent: Thursday, December 05, 2013 10:25 AM
> > To: Prashant Upadhyaya
> > Cc: François-Frédéric Ozog; Michael Quicquaro; dev at dpdk.org
> > Subject: Re: [dpdk-dev] generic load balancing
> >
> > Round robin would actually be awful for any protocol, because it would
> > cause out-of-order packets.
> > That is why flow-based algorithms like Flow Director and RSS work much
> > better.
> >
> > On Wed, Dec 4, 2013 at 8:31 PM, Prashant Upadhyaya
> > <prashant.upadhyaya at aricent.com> wrote:
> > > Hi,
> > >
> > > It's a real pity that the Intel 82599 NIC (and possibly others)
> > > doesn't have a simple round-robin scheduling of packets across the
> > > configured queues.
> > >
> > > I have requested this from Intel earlier, and I am using this forum
> > > to request it again -- please, please put this facility in the NIC:
> > > if I give it N queues and configure it for round-robin scheduling on
> > > those queues, then the NIC should simply place the received packets
> > > one by one on queue 1, then on queue 2, ..., then on queue N, and
> > > then back on queue 1.
> > > The above is very useful in a lot of load-balancing cases.
> > >
> > > Regards
> > > -Prashant
> > >
> > >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of
> > > François-Frédéric Ozog
> > > Sent: Thursday, December 05, 2013 2:35 AM
> > > To: 'Michael Quicquaro'
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] generic load balancing
> > >
> > > Hi,
> > >
> > > As far as I can tell, this is really hardware dependent. Some hash
> > > functions allow uplink and downlink packets of the same "session" to
> > > go to the same queue (I know Chelsio can do this).
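> > >
> > > (One trick worth knowing, sketched below: with a 40-byte RSS key made
> > > of the repeating byte pair 0x6d,0x5a, the Toeplitz hash becomes
> > > symmetric, so (src,dst) and (dst,src) land on the same queue. Whether
> > > rte_eth_dev_rss_hash_update() is available depends on your DPDK
> > > version; the same key can equally be set in rte_eth_conf at
> > > configure time:)
> > >
> > >     #include <rte_ethdev.h>
> > >
> > >     static uint8_t sym_rss_key[40];
> > >
> > >     static void
> > >     set_symmetric_rss(uint8_t port_id)
> > >     {
> > >         int i;
> > >         struct rte_eth_rss_conf rss = {
> > >             .rss_key     = sym_rss_key,
> > >             .rss_key_len = sizeof(sym_rss_key),
> > >             .rss_hf      = ETH_RSS_IPV4,
> > >         };
> > >
> > >         for (i = 0; i < 40; i += 2) {  /* repeating 0x6d5a key */
> > >             sym_rss_key[i]     = 0x6d;
> > >             sym_rss_key[i + 1] = 0x5a;
> > >         }
> > >         rte_eth_dev_rss_hash_update(port_id, &rss);
> > >     }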
> > >
> > > For the Intel card, you may find what you want in:
> > > http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html
> > >
> > > Other cards require NDA or other agreements to get details of RSS.
> > >
> > > If you have a performance problem, may I suggest you use kernel 3.10
> > > and then monitor system activity with the "perf" command. For
> > > instance, you can start with "perf top -a"; this will give you nice
> > > information. Then your creativity will do the rest ;-) You may be
> > > surprised by what comes out as the top hot spots...
> > > (the most unexpected hot function I found here was the Linux syscall
> > > gettimeofday!!!)
> > >
> > > François-Frédéric
> > >
> > >> -----Original Message-----
> > >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michael Quicquaro
> > >> Sent: Wednesday, December 4, 2013 18:53
> > >> To: dev at dpdk.org
> > >> Subject: [dpdk-dev] generic load balancing
> > >>
> > >> Hi all,
> > >> I am writing a dpdk application that will receive packets from one
> > >> interface and process them.  It does not forward packets in the
> > >> traditional sense.  However, I do need to process them at full line
> > >> rate and therefore need more than one core.  The packets can be
> > >> somewhat generic in nature and can be nearly identical (especially
> > >> at the beginning of the packet).  I've used the rxonly function of
> > >> testpmd as a model.
> > >>
> > >> I've run into problems processing a full line rate of data, since
> > >> the nature of the data causes all of it to be presented to only one
> > >> core.  I get a large percentage of dropped packets (they show up as
> > >> Rx-Errors in "port stats") because of this.  I've tried modifying
> > >> the data so that packets have different UDP ports, and that seems to
> > >> work when I use --rss-udp.
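> > >>
> > >> (Roughly what --rss-udp turns on under the hood, as far as I can
> > >> tell; flag names have moved around between DPDK releases, e.g. the
> > >> older ETH_RSS_IPV4_UDP versus the later ETH_RSS_UDP:)
> > >>
> > >>     struct rte_eth_conf port_conf = {
> > >>         .rxmode = { .mq_mode = ETH_MQ_RX_RSS },
> > >>         .rx_adv_conf = {
> > >>             /* include UDP ports in the hash input so packets that
> > >>              * differ only in UDP port spread across queues */
> > >>             .rss_conf = { .rss_hf = ETH_RSS_IPV4 | ETH_RSS_IPV4_UDP },
> > >>         },
> > >>     };
> > >>     rte_eth_dev_configure(port_id, nb_rx_queues, nb_tx_queues,
> > >>                           &port_conf);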
> > >>
> > >> My questions are:
> > >> 1) Is there a way to configure RSS so that it alternates packets to
> > >> all configured cores regardless of the packet data?
> > >>
> > >> 2) Where is the best place to learn more about RSS and how to
> > >> configure it? I have not found much in the DPDK documentation.
> > >>
> > >> Thanks for the help,
> > >> - Mike


More information about the dev mailing list