[dpdk-dev] ways to generate 40Gbps with two NICs x two ports?
jinho hwang
hwang.jinho at gmail.com
Tue Nov 19 18:04:29 CET 2013
On Tue, Nov 19, 2013 at 11:54 AM, Wiles, Roger Keith
<keith.wiles at windriver.com> wrote:
>
> BTW, the configuration looks fine, but you need to make sure the lcores are not split between two different CPU sockets. You can use dpdk/tools/cpu_layout.py to dump out the system configuration.
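A rough way to cross-check the socket placement by hand is sketched below. It assumes a Linux host and assumes that DPDK lcore N runs on CPU N; the helper names are made up for illustration and are not part of DPDK or Pktgen. It decodes a hex coremask such as the `-c 1ff` used in this thread and groups the selected lcores by the socket id that sysfs reports.

```python
from pathlib import Path


def lcores_from_coremask(mask_hex):
    """Return the lcore ids selected by a hex coremask such as '1ff'."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def read_socket_id(cpu):
    """Read the physical package (socket) id of one CPU from Linux sysfs."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/physical_package_id")
    return int(path.read_text().strip())


def group_by_socket(lcores, socket_of=read_socket_id):
    """Map socket id -> list of lcores that sit on that socket.

    socket_of is any callable taking a CPU id and returning its socket id;
    it defaults to the sysfs reader above but can be swapped for a lookup
    table when the layout is already known.
    """
    groups = {}
    for lcore in lcores:
        groups.setdefault(socket_of(lcore), []).append(lcore)
    return groups
```

Calling `group_by_socket(lcores_from_coremask("1ff"))` on the generator machine would show which of lcores 0 to 8 share a socket, so the RX/TX pair given to each port can be kept on one socket.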
>
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office, Wind River
> mobile 940.213.5533
>
>
> On Nov 19, 2013, at 10:42 AM, jinho hwang <hwang.jinho at gmail.com> wrote:
>
> On Tue, Nov 19, 2013 at 11:31 AM, Wiles, Roger Keith
> <keith.wiles at windriver.com> wrote:
>
> How do you have Pktgen configured in this case?
>
> On my Westmere dual-socket 3.4 GHz machine I can send 20G on a single
> 82599 NIC with two ports. My machine has a PCIe bug that does not allow me to
> send on more than 3 ports at wire rate. I get close to 40G with 64-byte packets,
> but the fourth port only reaches about 70% of wire rate because of the PCIe
> hardware bottleneck.
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office,
> Wind River
> direct 972.434.4136 mobile 940.213.5533 fax 000.000.0000
>
> On Nov 19, 2013, at 10:09 AM, jinho hwang <hwang.jinho at gmail.com> wrote:
>
> Hi All,
>
> I have two NICs (82599) x two ports that are used as packet generators. I
> want to generate packets at full line rate (40Gbps), but Pktgen-DPDK does not
> seem to be able to do it when two ports on a NIC are used simultaneously.
> Does anyone know how to generate 40Gbps without replicating packets in the
> switch?
>
> Thank you,
>
> Jinho
>
>
>
> Hi Keith,
>
> Thank you for the e-mail. I am not sure how to figure out whether my
> PCIe also has a problem that prevents me from sending at full line rate.
> I use an Intel(R) Xeon(R) CPU E5649 @ 2.53GHz. It is hard for
> me to figure out where the bottleneck is.
>
> My configuration is:
>
> sudo ./app/build/pktgen -c 1ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
> "[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua
>
>
> === port to lcore mapping table (# lcores 9) ===
>
> lcore: 0 1 2 3 4 5 6 7 8
>
> port 0: D: T 1: 0 0: 1 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 = 1: 1
>
> port 1: D: T 0: 0 0: 0 1: 0 0: 1 0: 0 0: 0 0: 0 0: 0 = 1: 1
>
> port 2: D: T 0: 0 0: 0 0: 0 0: 0 1: 0 0: 1 0: 0 0: 0 = 1: 1
>
> port 3: D: T 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 1: 0 0: 1 = 1: 1
>
> Total : 0: 0 1: 0 0: 1 1: 0 0: 1 1: 0 0: 1 1: 0 0: 1
>
> Display and Timer on lcore 0, rx:tx counts per port/lcore
>
>
> Configuring 4 ports, MBUF Size 1984, MBUF Cache Size 128
>
> Lcore:
>
> 1, type RX , rx_cnt 1, tx_cnt 0 private (nil), RX (pid:qid): (
> 0: 0) , TX (pid:qid):
>
> 2, type TX , rx_cnt 0, tx_cnt 1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 0: 0)
>
> 3, type RX , rx_cnt 1, tx_cnt 0 private (nil), RX (pid:qid): (
> 1: 0) , TX (pid:qid):
>
> 4, type TX , rx_cnt 0, tx_cnt 1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 1: 0)
>
> 5, type RX , rx_cnt 1, tx_cnt 0 private (nil), RX (pid:qid): (
> 2: 0) , TX (pid:qid):
>
> 6, type TX , rx_cnt 0, tx_cnt 1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 2: 0)
>
> 7, type RX , rx_cnt 1, tx_cnt 0 private (nil), RX (pid:qid): (
> 3: 0) , TX (pid:qid):
>
> 8, type TX , rx_cnt 0, tx_cnt 1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 3: 0)
>
>
> Port :
>
> 0, nb_lcores 2, private 0x6fd5a0, lcores: 1 2
>
> 1, nb_lcores 2, private 0x700208, lcores: 3 4
>
> 2, nb_lcores 2, private 0x702e70, lcores: 5 6
>
> 3, nb_lcores 2, private 0x705ad8, lcores: 7 8
>
>
>
> Initialize Port 0 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:2f:f2:a4
>
> Create: Default RX 0:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
>
> Create: Default TX 0:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Range TX 0:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Sequence TX 0:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Special TX 0:0 - Memory used (MBUFs 64 x (size 1984 +
> Hdr 64)) + 395392 = 515 KB
>
>
>
> Port memory used = 10251 KB
>
> Initialize Port 1 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:2f:f2:a5
>
> Create: Default RX 1:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
>
> Create: Default TX 1:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Range TX 1:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Sequence TX 1:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Special TX 1:0 - Memory used (MBUFs 64 x (size 1984 +
> Hdr 64)) + 395392 = 515 KB
>
>
>
> Port memory used = 10251 KB
>
> Initialize Port 2 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:4a:e6:1c
>
> Create: Default RX 2:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
>
> Create: Default TX 2:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Range TX 2:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Sequence TX 2:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Special TX 2:0 - Memory used (MBUFs 64 x (size 1984 +
> Hdr 64)) + 395392 = 515 KB
>
>
>
> Port memory used = 10251 KB
>
> Initialize Port 3 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:4a:e6:1d
>
> Create: Default RX 3:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
>
> Create: Default TX 3:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Range TX 3:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Sequence TX 3:0 - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 = 2435 KB
>
> Create: Special TX 3:0 - Memory used (MBUFs 64 x (size 1984 +
> Hdr 64)) + 395392 = 515 KB
>
>
>
> Port memory used = 10251 KB
>
>
> Total memory used = 41003 KB
>
> Port 0: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port 1: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port 2: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port 3: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
>
> === Display processing on lcore 0
>
> === RX processing on lcore 1, rxcnt 1, port/qid, 0/0
>
> === TX processing on lcore 2, txcnt 1, port/qid, 0/0
>
> === RX processing on lcore 3, rxcnt 1, port/qid, 1/0
>
> === TX processing on lcore 4, txcnt 1, port/qid, 1/0
>
> === RX processing on lcore 5, rxcnt 1, port/qid, 2/0
>
> === TX processing on lcore 6, txcnt 1, port/qid, 2/0
>
> === RX processing on lcore 7, rxcnt 1, port/qid, 3/0
>
> === TX processing on lcore 8, txcnt 1, port/qid, 3/0
>
>
> Please, advise me if you have time.
>
> Thank you always for your help!
>
> Jinho
>
>
What I observe is that when I start one port on one NIC, it reaches
10Gbps. Likewise, when I start one port on each NIC, each achieves
10Gbps, 20Gbps in total. But when I start both ports on one NIC, each
drops to 5.8Gbps. This persists no matter how the cores are assigned,
whether across sockets or within the same socket. Since the size of the
huge pages is fixed, I do not think that is the problem. Should we say
this is a limitation of the NIC or of the bus? The reason I think this
may be a hw limitation is that, regardless of packet size, two ports on
one NIC can only send 5.8Gbps each at most.
Do you have any way that I can calculate the hw limitation?
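
As a starting point, here is a back-of-the-envelope sketch of the two ceilings involved: the Ethernet line rate in packets per second, and the raw one-direction bit rate of a PCIe link. The PCIe figures are the standard ones (2.5 GT/s per lane for Gen1, 5 GT/s for Gen2, both with 8b/10b encoding). Which generation and lane width the slots on this machine actually negotiated is an assumption to be checked, for example in the LnkSta field of `lspci -vv`. The function names are illustrative only, and TLP header and descriptor traffic is not modelled, so the real usable PCIe rate is lower than the raw number.

```python
# Per-lane signalling rate in GT/s. PCIe Gen1 and Gen2 both use 8b/10b
# encoding, so only 8 of every 10 transferred bits carry data.
PCIE_GT_PER_LANE = {1: 2.5, 2: 5.0}
ENCODING_EFFICIENCY = 8 / 10


def pcie_raw_gbps(gen, lanes):
    """Raw one-direction PCIe bit rate in Gbit/s, before protocol overhead."""
    return PCIE_GT_PER_LANE[gen] * lanes * ENCODING_EFFICIENCY


def line_rate_mpps(frame_bytes, link_gbps=10.0):
    """Maximum Ethernet frame rate in Mpps.

    Every frame occupies 20 extra bytes on the wire:
    7 preamble + 1 start-of-frame delimiter + 12 inter-frame gap.
    """
    return link_gbps * 1e3 / ((frame_bytes + 20) * 8)


needed_gbps = 2 * 10.0    # two 10G ports on one NIC at line rate
observed_gbps = 2 * 5.8   # what the two ports reach together in this thread

print(f"64-byte line rate per port: {line_rate_mpps(64):.2f} Mpps")
for gen, lanes in [(1, 8), (2, 4), (2, 8)]:
    raw = pcie_raw_gbps(gen, lanes)
    verdict = "enough" if raw > needed_gbps else "too small"
    print(f"Gen{gen} x{lanes}: raw {raw:.0f} Gbit/s, {verdict} for "
          f"{needed_gbps:.0f} Gbit/s; observed {observed_gbps:.1f} Gbit/s "
          f"is {observed_gbps / raw:.0%} of raw")
```

If the link turns out to be narrower or slower than Gen2 x8, the raw rate falls to 16 Gbit/s, which is below the 20 Gbit/s that two ports need. That would be one possible explanation for a cap that does not depend on packet size, but it is only a hypothesis until the negotiated link state is checked.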
Jinho