[dpdk-users] Query on handling packets
thadodaharsh10 at gmail.com
Thu Jan 3 19:12:16 CET 2019
We applied your suggestion of removing the `IsLinkUp()` call, but the
performance is even worse: we could only get around 340 kbit/s.
The Top Hotspots are:
Function                    Module                               CPU Time
eth_em_recv_pkts            librte_pmd_e1000.so                  15.106s
rte_delay_us_block          librte_eal.so.6.1                    7.372s
ns3::DpdkNetDevice::Read    libns3.28.1-fd-net-device-debug.so   5.080s
rte_eth_rx_burst            libns3.28.1-fd-net-device-debug.so   3.558s
On checking the callers of `rte_delay_us_block`, we found that most of the
time spent in this function (92%) is spent during initialization, in
`e1000_enable_ulp_lpt_lp`. So it does not cost us processing time during
communication, which is a good start for our optimization.
Callers                                 CPU Time: Total   CPU Time: Self
rte_delay_us_block                      100.0%            7.372s
e1000_enable_ulp_lpt_lp                 92.3%             6.804s
e1000_write_phy_reg_mdic                1.8%              0.136s
e1000_reset_hw_ich8lan                  1.7%              0.128s
e1000_read_phy_reg_mdic                 1.4%              0.104s
eth_em_link_update                      1.4%              0.100s
e1000_get_cfg_done_generic              0.7%              0.052s
e1000_post_phy_reset_ich8lan.part.18    0.7%              0.048s
Effective CPU Utilization: 21.4% (0.856 out of 4)
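
The percentages above can be cross-checked from the raw times. The following is only a sketch: the figures are copied from this message, the variable and function names are made up for illustration, and it relies on our observation that `e1000_enable_ulp_lpt_lp` runs only during initialization.

```python
# Figures copied from the VTune output quoted in this message.
DELAY_TOTAL_S = 7.372  # total CPU time of rte_delay_us_block

# CPU time of rte_delay_us_block attributed to each caller, in seconds.
callers = {
    "e1000_enable_ulp_lpt_lp": 6.804,
    "e1000_write_phy_reg_mdic": 0.136,
    "e1000_reset_hw_ich8lan": 0.128,
    "e1000_read_phy_reg_mdic": 0.104,
    "eth_em_link_update": 0.100,
    "e1000_get_cfg_done_generic": 0.052,
    "e1000_post_phy_reset_ich8lan.part.18": 0.048,
}


def share_percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of rte_delay_us_block time per caller.
shares = {name: share_percent(sec, DELAY_TOTAL_S) for name, sec in callers.items()}

# Delay time left over once the (assumed init-only) ULP call is excluded.
non_init_delay_s = round(DELAY_TOTAL_S - callers["e1000_enable_ulp_lpt_lp"], 3)

# Effective CPU utilization: 0.856 busy logical CPUs out of 4 available.
utilization = share_percent(0.856, 4)

for name, pct in shares.items():
    print(f"{name:40s} {pct:5.1f}%")
print(f"delay time outside e1000_enable_ulp_lpt_lp: {non_init_delay_s}s")
print(f"effective CPU utilization: {utilization}%")
```

Under that assumption, only about 0.57s of the `rte_delay_us_block` time falls outside `e1000_enable_ulp_lpt_lp`.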
Here is the link to the VTune profiling results.
On Sun, Dec 30, 2018, 06:00 Wiles, Keith <keith.wiles at intel.com> wrote:
> > On Dec 29, 2018, at 4:03 PM, Harsh Patel <thadodaharsh10 at gmail.com> wrote:
> > Hello,
> > As suggested, we tried profiling the application using Intel VTune
> Amplifier. We aren't sure how to use these results, so we are attaching
> them to this email.
> > The things we understood were 'Top Hotspots' and 'Effective CPU
> utilization'. Following are some of our understandings:
> > Top Hotspots
> > Function                           Module                               CPU Time
> > rte_delay_us_block                 librte_eal.so.6.1                    15.042s
> > eth_em_recv_pkts                   librte_pmd_e1000.so                  9.544s
> > ns3::DpdkNetDevice::Read           libns3.28.1-fd-net-device-debug.so
> > ns3::DpdkNetDeviceReader::DoRead   libns3.28.1-fd-net-device-debug.so   2.470s
> > rte_eth_rx_burst                   libns3.28.1-fd-net-device-debug.so   2.456s
> > [Others]                                                                6.656s
> > We recognized all of these functions except `rte_delay_us_block`, so we
> investigated its callers:
> > Callers                                Effective Time   Spin Time   Overhead Time   Effective Time   Spin Time   Overhead Time   Wait Time: Total   Wait Time: Self
> > e1000_enable_ulp_lpt_lp                45.6%            0.0%        0.0%            6.860s           0usec       0usec
> > e1000_write_phy_reg_mdic               32.7%            0.0%        0.0%            4.916s           0usec
> > e1000_read_phy_reg_mdic                19.4%            0.0%        0.0%            2.922s           0usec       0usec
> > e1000_reset_hw_ich8lan                 1.0%             0.0%        0.0%            0.143s           0usec       0usec
> > eth_em_link_update                     0.7%             0.0%        0.0%            0.100s           0usec       0usec
> > e1000_post_phy_reset_ich8lan.part.18   0.4%             0.0%        0.0%            0.064s           0usec       0usec
> > e1000_get_cfg_done_generic             0.2%             0.0%        0.0%            0.037s           0usec
> > We lack sufficient knowledge to investigate more than this.
> > Effective CPU utilization
> > Interestingly, the effective CPU utilization was 20.8% (0.832 out of 4
> logical CPUs). We thought this was low, so we compared it with the
> raw-socket version of the code. Its utilization was even lower, 8.0%
> (0.318 out of 4 logical CPUs), and yet it performs far better.
> > It would be helpful if you could give us some insight into how to use
> these results, or point us to some resources for doing so.
> > Thank you
> BTW, I was able to build ns3 with DPDK 18.11. It required a couple of
> changes in the DPDK init code in ns3, plus one hack in the rte_mbuf.h file.
> I did have a problem including the rte_mbuf.h file in your code. It appears
> the g++ compiler did not like the reference to struct rte_mbuf_sched inside
> the rte_mbuf structure. The rte_mbuf_sched struct was inside the big union.
> As a hack, I moved the struct outside of the rte_mbuf structure and replaced
> the struct in the union with 'struct rte_mbuf_sched sched;', but I am
> guessing you are missing some compiler options in your build system, as DPDK
> builds just fine without that hack.
> The next place was the rxmode and the txq_flags. The rxmode structure has
> changed and I commented out the inits in ns3 and then commented out the
> txq_flags init code as these are now the defaults.