[dpdk-dev] [PATCH v2 00/17] add TSO support

Venkatesan, Venky venky.venkatesan at intel.com
Fri May 23 16:43:10 CEST 2014


Olivier, 


>> It's because we haven't gotten to testing the patch yet, and figuring out all the problems. Putting it in and modifying MBUF needs a bit of time - one other option that I've looked at is to let the transmit offload parts (except for the VLAN) flow onto the second cache line. That doesn't seem to have a performance hit at this point - since it's going to be populated before calling transmit anyway, it's cache hot. Have we thought of simply doing that instead of these changes that have net negative side effects in terms of mbuf mods?

> I think that the performance gain on a real use case provided by this patch series can justify a really small impact (see my test reports) on demonstration-only applications: my testpmd iofwd test with the txqflags option disabling many mbuf features is not representative of a real world application.

[Venky] I did see your test reports. I also made the point that the tests we have are insufficient for testing the impact. If you look at data_ofs, it actually has an impact on two sides - the driver and the upper layer. We do not at this time have a test for the upper layer/accessor. Secondly, there is a whole class of apps (fast path route for example) that aren't in your purview that do not need txqflags. Calling it not representative of a real world application is incorrect. 

In addition, your testpmd baseline performance should be higher. At this point, it is about 40% off from the numbers we see on the baseline on the same CPU. If the baseline is incorrect, I cannot judge anything more on the performance. We need to get the baseline performance the same, and then compare impact.

> In my opinion, moving offload parts outside in another cache line would have an impact on performance. If not, why would you exclude vlan?
> But this is speculation. As Neil and Thomas suggested previously, we should rely on performance and functional tests.

[Venky] I exclude VLAN because it is something explicitly set by the Rx side of the driver. Having Rx access a second cache line will generate a performance impact (it can be mitigated by a prefetch, but that costs more instructions and cannot be deterministically controlled). The rest of the structure is on the transmit side, which is going to be cache hot - at least in LLC anyway. There are cases where this will not be in LLC - and we have a few of those. Those, however, we can mitigate.
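
To make the layout argument concrete, here is a minimal sketch of the split being described, assuming 64-byte cache lines. The struct and its field names (sketch_mbuf, l2_len, l3_len, l4_len, tso_segsz) are hypothetical and are not the real struct rte_mbuf; the only point is that the Rx-written VLAN field stays in the first cache line while the Tx offload fields start on the second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Hypothetical layout for illustration only, NOT the real struct rte_mbuf. */
struct sketch_mbuf {
    /* First cache line: fields touched on the Rx path. */
    void *buf_addr;
    uint32_t pkt_len;
    uint16_t data_len;
    uint16_t vlan_tci;   /* set by the Rx side of the driver */

    /* Second cache line: Tx offload fields, filled in by the
     * application right before it calls transmit. */
    _Alignas(SKETCH_CACHE_LINE) uint16_t l2_len;
    uint16_t l3_len;
    uint16_t l4_len;
    uint16_t tso_segsz;
};

/* Index of the cache line that contains a given byte offset. */
static inline size_t sketch_cache_line(size_t offset)
{
    return offset / SKETCH_CACHE_LINE;
}

/* Compile-time checks: VLAN in line 0, Tx offload starts at line 1. */
static_assert(offsetof(struct sketch_mbuf, vlan_tci) < SKETCH_CACHE_LINE,
              "vlan_tci must stay in the first cache line");
static_assert(offsetof(struct sketch_mbuf, l2_len) == SKETCH_CACHE_LINE,
              "Tx offload fields must start on the second cache line");
```

With a layout like this, the Rx path only writes the first line, and the Tx path reads a second line that the application has just written. Whether that second line is really still hot when the driver reads it is the part that needs measurement.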

> Today, there is no alternative that brings equivalent features and better performance (I mean there is no patch nor test reports). If the series is applied after your ack, it won't prevent anyone from bringing new enhancements or reworks on top of it.

[Venky] I don't think reworking core data structures (especially regressing core data structures) is a good thing. We have kept this relatively stable over 5 releases, sometimes at the expense of performance, and I don't want to start churning data structures now.

BR,
- Venky

