[dpdk-dev] Re: DPDK & QPI performance issue in Romley platform.

Bob Chen beef9999 at qq.com
Tue Sep 3 18:19:28 CEST 2013


QPI bandwidth is definitely large enough, but QPI only carries the traffic between the two CPU sockets. What your test actually does is access memory attached to the other socket, so you are probably not even close to the link's bandwidth limit. The slowdown is mostly latency: a remote (NUMA) memory access goes through many more steps than a local one.
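If you want to see the effect directly, you can pin a buffer to one socket or the other and compare access cost. A minimal sketch against the DPDK allocator (the type string, size, and alignment are placeholders, the "^ 1" assumes a two-socket box like this Romley system, and rte_malloc_socket() is the call in the DPDK headers I know; older releases may differ):

    #include <rte_lcore.h>
    #include <rte_malloc.h>

    /* Sketch: allocate a packet buffer on an explicit NUMA socket.
     * Touching a buffer that lives on the other socket pays a QPI
     * round-trip on every cache miss, even at tiny bandwidth. */
    static void *alloc_on_socket(size_t size, int socket_id)
    {
        /* rte_malloc_socket(type, size, align, socket) */
        return rte_malloc_socket("pkt_buf", size, 64, socket_id);
    }

    /* Local vs. remote allocation from the calling lcore's view. */
    static void *alloc_local(size_t size)
    {
        return alloc_on_socket(size, (int)rte_socket_id());
    }

    static void *alloc_remote(size_t size)
    {
        return alloc_on_socket(size, (int)(rte_socket_id() ^ 1));
    }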

/Bob


------------------ Original Message ------------------
From: "Zachary" <zachary.jen at cas-well.com>
Sent: Monday, September 2, 2013, 11:22 AM
To: "dev" <dev at dpdk.org>
Cc: "Yannic.Chou (周哲正) : 6808" <yannic.chou at cas-well.com>; "Alan Yu 俞亦偉 : 6632" <Alan.Yu at cas-well.com>
Subject: [dpdk-dev] DPDK & QPI performance issue in Romley platform.



 Hi~
 
 I have a question about a DPDK & QPI performance issue on the Romley platform.
 Recently, I used the DPDK example l2fwd to test DPDK's performance on my Romley platform.
 When I run the test across CPU sockets, the performance decreases dramatically.
 Is this expected? Is there any way to verify or explain the phenomenon?
 
 In my opinion, there should be no such issue, since QPI has enough bandwidth to handle this kind of case.
 Thus, I am quite surprised by our results and cannot explain them.
 Could someone help me solve this problem?
 
 Thanks a lot!
 
 
 My testing environment is described below:
 
 Platform:         Romley
 CPU:                E5-2643 * 2
 RAM:               Transcend  8GB PC3-1600 DDR3 * 8
 OS:                 Fedora Core 14
 DPDK:            v1.3.1r2, example/l2fwd
 Slot setting:
                       Slot A is controlled by CPU1 directly.
                       Slot B is controlled by CPU0 directly.
 
 DPDK pre-setting:
 a. BIOS setting:
     HT=disable
 b. Kernel parameters
     isolcpus=2,3,6,7
     default_hugepagesz=1024M
     hugepagesz=1024M
     hugepages=16
 c. OS setting:
     service avahi-daemon stop
     service NetworkManager stop
     service iptables stop
     service acpid stop
     selinux disable
 
 
 Example program commands:
 a. SlotB(CPU0) -> CPU1
     #>./l2fwd -c 0xc -n 4 -- -q 1 -p 0xc
 
 b. SlotA(CPU1) -> CPU0
     #>./l2fwd -c 0xc0 -n 4 -- -q 1 -p 0xc0 
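
 In both cases the forwarding lcores sit on the opposite socket from the NIC, so every RX/TX crosses QPI. One way to confirm the placement from inside an l2fwd-style init path is to compare each port's NUMA node with the polling lcore's socket; a minimal sketch (rte_eth_dev_socket_id() is the call in newer DPDK releases and may not exist in v1.3.1, and the port/lcore ids are whatever your setup uses):

    #include <stdio.h>
    #include <rte_ethdev.h>
    #include <rte_lcore.h>

    /* Sketch: warn when an lcore polls a port on the other socket.
     * A negative return from rte_eth_dev_socket_id() means the
     * device's NUMA node is unknown. */
    static void check_affinity(uint8_t port_id, unsigned lcore_id)
    {
        int port_socket  = rte_eth_dev_socket_id(port_id);
        int lcore_socket = (int)rte_lcore_to_socket_id(lcore_id);

        if (port_socket >= 0 && port_socket != lcore_socket)
            printf("lcore %u (socket %d) polls port %u (socket %d): "
                   "all packet I/O crosses QPI\n",
                   lcore_id, lcore_socket,
                   (unsigned)port_id, port_socket);
    }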
 
 Results (frame size: 128 bytes):
 
     CPU Affinity | Slot A (CPU1) | Slot B (CPU0)
     -------------+---------------+--------------
     CPU0         |     15.9%     |    96.49%
     CPU1         |    90.88%     |    24.78%

