[dpdk-dev] ways to generate 40Gbps with two NICs x two ports?
jinho hwang
hwang.jinho at gmail.com
Wed Nov 20 21:58:07 CET 2013
On Tue, Nov 19, 2013 at 4:38 PM, Wiles, Roger Keith
<keith.wiles at windriver.com> wrote:
>
> I do not think a newer version will affect the performance, but you can try it.
>
> git clone git://github.com/Pktgen/Pktgen-DPDK
>
> This one is Pktgen 2.2.5 with DPDK 1.5.0.
>
>
> Keith Wiles, Principal Technologist for Networking, member of the CTO office, Wind River
> mobile 940.213.5533
>
> On Nov 19, 2013, at 3:33 PM, jinho hwang <hwang.jinho at gmail.com> wrote:
>
> On Tue, Nov 19, 2013 at 4:18 PM, Wiles, Roger Keith
> <keith.wiles at windriver.com> wrote:
>
> Give this a try; if that does not work, then something else is going on here.
> I am trying to make sure we do not cross the QPI for any reason, by keeping
> the RX/TX queues related to a port on the same socket.
>
> sudo ./app/build/pktgen -c 3ff -n 3 $BLACK_LIST -- -p 0xf0 -P -m
> "[2:4].0, [6:8].1, [3:5].2, [7:9].3" -f test/forward.lua
>
> sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0xf0 -P -m
> "[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua
>
>
> cores = [0, 1, 2, 8, 9, 10]
> sockets = [1, 0]
> Socket 1 Socket 0
> --------- ---------
> Core 0 [0, 12] [1, 13]
> Core 1 [2, 14] [3, 15]
> Core 2 [4, 16] [5, 17]
> Core 8 [6, 18] [7, 19]
> Core 9 [8, 20] [9, 21]
> Core 10 [10, 22] [11, 23]
>
>
> Keith Wiles, Principal Technologist for Networking, member of the CTO office,
> Wind River
> mobile 940.213.5533
>
> On Nov 19, 2013, at 11:35 AM, jinho hwang <hwang.jinho at gmail.com> wrote:
>
> On Tue, Nov 19, 2013 at 12:24 PM, Wiles, Roger Keith
> <keith.wiles at windriver.com> wrote:
>
> Normally when I see this problem, it means the lcores are not mapped
> correctly. What can happen is you have an RX and a TX on the same physical
> core, or two RX/TX pairs on the same physical core.
>
> Make sure each RX or TX is running on its own physical core; look at the
> cpu_layout.py output and verify the configuration is correct. If you have 8
> physical cores in the system, then you need to make sure only one of the
> lcores on each physical core is being used.
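>
> For example, if lcores 2 and 14 are hyper-thread siblings of the same
> physical core, a mapping like "[2:14].0" would put port 0's RX and TX on one
> physical core, while "[2:4].0" keeps them on separate physical cores.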
>
> Let me know what happens.
>
> Keith Wiles, Principal Technologist for Networking, member of the CTO office,
> Wind River
> mobile 940.213.5533
>
> On Nov 19, 2013, at 11:04 AM, jinho hwang <hwang.jinho at gmail.com> wrote:
>
> On Tue, Nov 19, 2013 at 11:54 AM, Wiles, Roger Keith
> <keith.wiles at windriver.com> wrote:
>
>
> BTW, the configuration looks fine, but you need to make sure the lcores are
> not split between two different CPU sockets. You can use
> dpdk/tools/cpu_layout.py to dump out the system configuration.
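>
> For example, from the top of the DPDK source tree:
>
>   $ python tools/cpu_layout.py
>
> prints the socket/core/lcore layout.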
>
>
> Keith Wiles, Principal Technologist for Networking, member of the CTO office,
> Wind River
> mobile 940.213.5533
>
>
> On Nov 19, 2013, at 10:42 AM, jinho hwang <hwang.jinho at gmail.com> wrote:
>
> On Tue, Nov 19, 2013 at 11:31 AM, Wiles, Roger Keith
> <keith.wiles at windriver.com> wrote:
>
> How do you have Pktgen configured in this case?
>
> On my Westmere dual-socket 3.4GHz machine I can send 20G on a single 82599
> NIC x two ports. My machine has a PCIe bug that does not allow me to send
> on more than 3 ports at wire rate. I get close to 40G with 64-byte packets,
> but the fourth port runs at about 70% of wire rate because of the PCIe
> hardware bottleneck problem.
>
> Keith Wiles, Principal Technologist for Networking, member of the CTO office,
> Wind River
> direct 972.434.4136 mobile 940.213.5533 fax 000.000.0000
>
> On Nov 19, 2013, at 10:09 AM, jinho hwang <hwang.jinho at gmail.com> wrote:
>
> Hi All,
>
> I have two NICs (82599) x two ports that are used as packet generators. I
> want to generate full line-rate packets (40Gbps), but Pktgen-DPDK does not
> seem to be able to do it when two ports on a NIC are used simultaneously.
> Does anyone know how to generate 40Gbps without replicating packets in the
> switch?
>
> Thank you,
>
> Jinho
>
>
>
> Hi Keith,
>
> Thank you for the e-mail. I am not sure how to figure out whether my
> PCIe also has any problems that prevent me from sending full line rate.
> I use an Intel(R) Xeon(R) CPU E5649 @ 2.53GHz. It is hard for
> me to figure out where the bottleneck is.
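>
> One thing I am thinking of trying is to check the negotiated PCIe link
> width/speed of each 82599 with something like (the bus address below is a
> placeholder for the NIC address reported by lspci):
>
>   sudo lspci -vvv -s <bus:dev.fn> | grep -E 'LnkCap|LnkSta'
>
> to see whether each slot actually trained at Gen2 x8.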
>
> My configuration is:
>
> sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0xf0 -P -m
> "[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua
>
>
> === port to lcore mapping table (# lcores 9) ===
>
> lcore: 0 1 2 3 4 5 6 7 8
>
> port 0: D: T 1: 0 0: 1 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 = 1: 1
>
> port 1: D: T 0: 0 0: 0 1: 0 0: 1 0: 0 0: 0 0: 0 0: 0 = 1: 1
>
> port 2: D: T 0: 0 0: 0 0: 0 0: 0 1: 0 0: 1 0: 0 0: 0 = 1: 1
>
> port 3: D: T 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 1: 0 0: 1 = 1: 1
>
> Total : 0: 0 1: 0 0: 1 1: 0 0: 1 1: 0 0: 1 1: 0 0: 1
>
> Display and Timer on lcore 0, rx:tx counts per port/lcore
>
>
> Configuring 4 ports, MBUF Size 1984, MBUF Cache Size 128
>
> Lcore:
>
> 1, type RX , rx_cnt 1, tx_cnt 0 private (nil), RX (pid:qid): (
> 0: 0) , TX (pid:qid):
>
> 2, type TX , rx_cnt 0, tx_cnt 1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 0: 0)
>
> 3, type RX , rx_cnt 1, tx_cnt 0 private (nil), RX (pid:qid): (
> 1: 0) , TX (pid:qid):
>
> 4, type TX , rx_cnt 0, tx_cnt 1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 1: 0)
>
> 5, type RX , rx_cnt 1, tx_cnt 0 private (nil), RX (pid:qid): (
> 2: 0) , TX (pid:qid):
>
> 6, type TX , rx_cnt 0, tx_cnt 1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 2: 0)
>
> 7, type RX , rx_cnt 1, tx_cnt 0 private (nil), RX (pid:qid): (
> 3: 0) , TX (pid:qid):
>
> 8, type TX , rx_cnt 0, tx_cnt 1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 3: 0)
>
>
> Port :
>
> 0, nb_lcores 2, private 0x6fd5a0, lcores: 1 2
>
> 1, nb_lcores 2, private 0x700208, lcores: 3 4
>
> 2, nb_lcores 2, private 0x702e70, lcores: 5 6
>
> 3, nb_lcores 2, private 0x705ad8, lcores: 7 8
>
>
>
> Initialize Port 0 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:2f:f2:a4
>
> Create: Default RX 0:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
>
> Create: Default TX 0:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Range TX 0:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Sequence TX 0:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Special TX 0:0 - Memory used (MBUFs 64 x (size 1984 +
> Hdr 64)) + 395392 = 515 KB
>
>
>
> Port memory used = 10251 KB
>
> Initialize Port 1 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:2f:f2:a5
>
> Create: Default RX 1:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
>
> Create: Default TX 1:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Range TX 1:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Sequence TX 1:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Special TX 1:0 - Memory used (MBUFs 64 x (size 1984 +
> Hdr 64)) + 395392 = 515 KB
>
>
>
> Port memory used = 10251 KB
>
> Initialize Port 2 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:4a:e6:1c
>
> Create: Default RX 2:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
>
> Create: Default TX 2:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Range TX 2:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Sequence TX 2:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Special TX 2:0 - Memory used (MBUFs 64 x (size 1984 +
> Hdr 64)) + 395392 = 515 KB
>
>
>
> Port memory used = 10251 KB
>
> Initialize Port 3 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:4a:e6:1d
>
> Create: Default RX 3:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
>
> Create: Default TX 3:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Range TX 3:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Sequence TX 3:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Special TX 3:0 - Memory used (MBUFs 64 x (size 1984 +
> Hdr 64)) + 395392 = 515 KB
>
>
>
> Port memory used = 10251 KB
>
>
> Total memory used = 41003 KB
>
> Port 0: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port 1: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port 2: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port 3: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
>
> === Display processing on lcore 0
>
> === RX processing on lcore 1, rxcnt 1, port/qid, 0/0
>
> === TX processing on lcore 2, txcnt 1, port/qid, 0/0
>
> === RX processing on lcore 3, rxcnt 1, port/qid, 1/0
>
> === TX processing on lcore 4, txcnt 1, port/qid, 1/0
>
> === RX processing on lcore 5, rxcnt 1, port/qid, 2/0
>
> === TX processing on lcore 6, txcnt 1, port/qid, 2/0
>
> === RX processing on lcore 7, rxcnt 1, port/qid, 3/0
>
> === TX processing on lcore 8, txcnt 1, port/qid, 3/0
>
>
> Please, advise me if you have time.
>
> Thank you always for your help!
>
> Jinho
>
>
>
> The phenomenon is that when I start one port on one NIC, it reaches
> 10Gbps. Also, when I start one port on each NIC, they achieve 10Gbps
> each = 20Gbps. But when I start two ports on one NIC, each drops to
> 5.8Gbps. This persists when cores are assigned differently, across
> sockets and within the same socket. Since the size of the huge pages is
> fixed, that should not be the problem. Should we say this is a
> limitation of the NIC or the bus? The reason I think this may be a hw
> limitation is that regardless of packet size, two ports on one NIC
> can each send only 5.8Gbps maximum.
>
> Do you know of any way that I can calculate the hw limitation?
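>
> My own rough attempt so far, assuming the 82599 cards sit in PCIe Gen2 x8
> slots (which I have not confirmed yet): 5 GT/s x 8 lanes with 8b/10b encoding
> gives roughly 32 Gbps of raw bandwidth per direction, which looks like enough
> headroom for 2 x 10Gbps of frame data. For 64-byte frames, line rate is
> (64-byte frame + 20 bytes of preamble/IFG) x 8 = 672 bits per frame on the
> wire, i.e. about 14.88 Mpps per port, or ~29.8 Mpps for two ports, and the
> per-packet descriptor and TLP overhead on the bus comes on top of the 64
> bytes of data, so small packets could hit a PCIe limit well before 20Gbps.
> But I am not sure this kind of estimate explains the 5.8Gbps ceiling I see
> even with large packets.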
>
> Jinho
>
>
>
> My cpu configuration is as follows:
>
> ============================================================
>
> Core and Socket Information (as reported by '/proc/cpuinfo')
>
> ============================================================
> cores = [0, 1, 2, 8, 9, 10]
> sockets = [1, 0]
> Socket 1 Socket 0
> --------- ---------
> Core 0 [0, 12] [1, 13]
> Core 1 [2, 14] [3, 15]
> Core 2 [4, 16] [5, 17]
> Core 8 [6, 18] [7, 19]
> Core 9 [8, 20] [9, 21]
> Core 10 [10, 22] [11, 23]
>
> When I use just two ports for testing, I use this configuration.
>
> sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0x30 -P -m
> "[2:4].0, [6:8].1" -f test/forward.lua
>
> As you can see from the core numbers, 2, 4, 6, and 8 are all on different
> physical cores and are assigned separately, so I am not sure this is caused
> by the core configuration. Do you have any other thoughts we might try?
>
> Thanks,
>
> Jinho
>
>
> Actually, I tried it before, and to make sure, I tried it again just
> now. Still, it only shows me 5.8Gbps for each port. What other
> possibilities do you think I should try? I am losing hope now. Does
> the version matter? I am using Pktgen Ver:2.1.3 (DPDK-1.3.0) on my
> system.
>
Keith,
Yes, the newer version did not work either. Since I am able to send
close to 24Gbps from two NICs, I do not think the limitation comes
from the bus or memory. It may be because of how I use the NICs. I am
sticking to this hypothesis for now and am trying to use more cores/queues
for TX (see the mapping sketch after the questions below). The problem is
that when I tried this:

sudo ./app/build/pktgen -c 3ff -n 3 $BLACK_LIST -- -p 0x30 -P -m
"[2-5].0, [6-9].1" -f test/forward.lua

the log shows 4 TX/RX queues are assigned, but it seems that only 1/4
of the traffic (2.5Gbps) is transmitted. My questions are:
1. It seems to me the 4 queues/cores each have 1/4 of the work, but only one
queue/core is actually working. Do you know the reason for this, and how to
fix it?
2. Can I make a configuration with one core x multiple queues?
3. Is there any way to see more statistics from the commands?
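
As a next step, I plan to try an explicit RX/TX core split per port. My guess
at the syntax, extrapolated from the earlier "[rx:tx].port" examples (so it
may not be exactly right):

sudo ./app/build/pktgen -c 3ff -n 3 $BLACK_LIST -- -p 0x30 -P -m
"[2-3:4-5].0, [6-7:8-9].1" -f test/forward.lua

which should give each port two dedicated RX lcores and two dedicated TX
lcores instead of four lcores doing both RX and TX.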
Thank you,
Jinho