[dpdk-users] Lcore impact

Alexander Kiselev kiselev99 at gmail.com
Thu Apr 14 23:32:52 CEST 2016


I've done my homework with perf, and the results show that the iTLB-load-misses
value is very high. In the test without socket operations the processing
lcore shows an iTLB-load-miss rate of 0.87% (perf reports it as a percentage
of all iTLB cache hits) and there is no packet loss. In the test WITH socket
operations the processing lcore shows 31.09% and there is about 10% packet
loss. How should I interpret these results? Google turns up little about the
iTLB. So far some web pages suggest the following:
"Try to minimize the size of the source code and improve locality so that
instructions span a minimum number of pages, and so that the instruction
span is less than the number of ITLB entries."
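
For reference, the numbers above came from something like this (the core
number and the duration are placeholders for whatever your processing lcore
is pinned to):

  # count iTLB activity on the CPU running the processing lcore
  perf stat -e iTLB-loads,iTLB-load-misses -C 2 -- sleep 10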

Any ideas?





2016-04-14 23:43 GMT+03:00 Hu, Xuekun <xuekun.hu at intel.com>:

> Perf could. Or PCM, which is also a good tool.
> https://software.intel.com/en-us/articles/intel-performance-counter-monitor-a-better-way-to-measure-cpu-utilization
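>
> (After building it from the sources on that page, a typical invocation is
> something like the following; if I remember right the binary is named
> pcm.x and the argument is the refresh interval in seconds:)
>
>   ./pcm.x 1    # prints per-core IPC and L3 cache hit/miss statistics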
>
>
>
> From: Alexander Kiselev [mailto:kiselev99 at gmail.com]
> Sent: Friday, April 15, 2016 3:31 AM
> To: Hu, Xuekun
> Cc: Shawn Lewis; users at dpdk.org
>
> Subject: Re: [dpdk-users] Lcore impact
>
> 2016-04-14 20:49 GMT+03:00 Hu, Xuekun <xuekun.hu at intel.com>:
>
> Are the two lcores on one processor, or two processors? What is the memory
> footprint of the system-call thread? If the memory footprint is big (>LLC
> size) and the two lcores are on the same processor, then it could impact
> the packet-processing thread.
>
> Those two lcores belong to one processor, and it's a single-processor
> machine.
>
> Both cores allocate a lot of memory and use the full DPDK arsenal: LPM,
> mempools, hashes, etc. But during the test the core doing the socket data
> transfer uses only a small 16k buffer for sending, and sending is all it
> does during the test. It doesn't touch any of the other allocated memory
> structures. The processing core in turn uses rte_lpm, which is big, but in
> my test there are only about 10 routes in it, so I think the amount of
> "hot" memory is not very big. But I can't say whether it's bigger than the
> L3 CPU cache or not. Should I use some profiler and see if the socket
> operations cause a lot of cache misses on the processing lcore? Is there
> some tool that lets me do that? perf, maybe?
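>
> Something like this, perhaps, with -C pointing at the core the processing
> lcore is pinned to (the core number below is just an example)?
>
>   perf stat -e LLC-loads,LLC-load-misses -C 2 -- sleep 10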
>
> -----Original Message-----
> From: users [mailto:users-bounces at dpdk.org] On Behalf Of Alexander Kiselev
> Sent: Friday, April 15, 2016 1:19 AM
> To: Shawn Lewis
> Cc: users at dpdk.org
> Subject: Re: [dpdk-users] Lcore impact
>
> I've already seen this document and have used these tricks many times. But
> this time I'm sending data locally over localhost. There aren't even any
> NICs bound to Linux on my machine, so there are no NIC interrupts I could
> pin to a CPU. So what do you propose?
>
> > On Apr 14, 2016, at 20:06, Shawn Lewis <smlsr at tencara.com> wrote:
> >
> > You have to work with IRQBalancer as well
> >
> >
> http://www.intel.com/content/dam/doc/application-note/82575-82576-82598-82599-ethernet-controllers-interrupts-appl-note.pdf
> >
> > This is just an example document that discusses it (not so much DPDK
> > related)...  But the OS will attempt to balance the interrupts when you
> > actually want to remove or pin them down...
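> >
> > For example (a sketch; the IRQ number and CPU mask are placeholders for
> > whatever /proc/interrupts shows on your box):
> >
> >   service irqbalance stop              # stop the balancer moving IRQs
> >   grep eth0 /proc/interrupts           # find the NIC's IRQ number, say 24
> >   echo 4 > /proc/irq/24/smp_affinity   # bitmask 0x4 pins IRQ 24 to core 2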
> >
> >> On Thu, Apr 14, 2016 at 1:02 PM, Alexander Kiselev <kiselev99 at gmail.com>
> wrote:
> >>
> >>
> >>> On Apr 14, 2016, at 19:35, Shawn Lewis <smlsr at tencara.com> wrote:
> >>>
> >>> Lots of things...
> >>>
> >>> For one, just because you have a process running on an lcore does not
> >>> mean that's all that runs on it.  Unless you have told the kernel at
> >>> boot NOT to use those specific cores, those cores will be used for many
> >>> OS-related things.
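> >>>
> >>> (For example, the classic knob is something along these lines on the
> >>> kernel command line; the core list is a placeholder:)
> >>>
> >>>   isolcpus=2,3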
> >>
> >> Generally yes, but unless I start sending data to the socket there is no
> >> packet loss.  I did about 10 test runs in a row and everything was OK.
> >> And there is no other application running on that test machine that uses
> >> the CPU cores.
> >>
> >> So the question is: why do these socket operations influence the other
> >> lcore?
> >>
> >>>
> >>> IRQBlance
> >>> System OS operations.
> >>> Other Applications.
> >>>
> >>> So by doing file I/O you are generating interrupts, and where those
> >>> interrupts get serviced is up to IRQBalancer.  So it could be any one
> >>> of your cores.
> >>
> >> That is a good point. I could use the CPU affinity feature to bind the
> >> interrupt handler to a core not used in my test. But I send data locally
> >> over localhost. Is it possible to use CPU affinity in that case?
> >>
> >>>
> >>>
> >>>
> >>>> On Thu, Apr 14, 2016 at 12:31 PM, Alexander Kiselev <
> >>>> kiselev99 at gmail.com> wrote:
> >>>> Could someone give me any hints about what could cause performance
> >>>> issues in a situation where one lcore doing a lot of Linux system
> >>>> calls (read/write on a socket) slows down the other lcore doing packet
> >>>> forwarding? In my test the forwarding lcore doesn't share any memory
> >>>> structures with the other lcore that sends test data to the socket.
> >>>> Both lcores are pinned to different processor cores. So theoretically
> >>>> they shouldn't have any impact on each other, but they do: once one
> >>>> lcore starts sending data to the socket, the other lcore starts
> >>>> dropping packets. Why?
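> >>>>
> >>>> The setup is essentially this (a stripped-down sketch, not the real
> >>>> test code; port and lcore numbers are placeholders):
> >>>>
> >>>>   #include <stdint.h>
> >>>>   #include <unistd.h>
> >>>>   #include <rte_eal.h>
> >>>>   #include <rte_ethdev.h>
> >>>>   #include <rte_mbuf.h>
> >>>>
> >>>>   #define BURST 32
> >>>>
> >>>>   /* lcore 1: the forwarding loop (TX-full handling omitted) */
> >>>>   static int fwd_loop(void *arg)
> >>>>   {
> >>>>       struct rte_mbuf *bufs[BURST];
> >>>>       for (;;) {
> >>>>           uint16_t n = rte_eth_rx_burst(0 /* port */, 0, bufs, BURST);
> >>>>           if (n > 0)
> >>>>               rte_eth_tx_burst(1 /* port */, 0, bufs, n);
> >>>>       }
> >>>>       return 0;
> >>>>   }
> >>>>
> >>>>   /* lcore 2: plain Linux socket writes over loopback, no DPDK */
> >>>>   static int sock_loop(void *arg)
> >>>>   {
> >>>>       int fd = *(int *)arg;        /* connected TCP socket */
> >>>>       static char buf[16 * 1024];  /* the 16k send buffer */
> >>>>       for (;;)
> >>>>           write(fd, buf, sizeof(buf));
> >>>>       return 0;
> >>>>   }
> >>>>
> >>>>   /* in main(), after rte_eal_init() and port setup:
> >>>>    *     rte_eal_remote_launch(fwd_loop, NULL, 1);
> >>>>    *     rte_eal_remote_launch(sock_loop, &fd, 2);
> >>>>    */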
> >>>
> >
>
> --
>
> Best regards,
> Alexander Kiselev
>



-- 
Best regards,
Alexander Kiselev

