<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta name=Generator content="Microsoft Word 12 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><a name="_MailEndCompose"><span lang=DA style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></a></p><div style='border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt'><p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> Varghese, Vipin [mailto:Vipin.Varghese@amd.com] <br><b>Sent:</b> Tuesday, 12 December 2023 18.14<br><br></span><o:p></o:p></p><div><div><p class=MsoNormal><span style='font-family:"Calibri","sans-serif";color:black'>Sharing a few critical points below, based on my exposure to the dma-perf application.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-family:"Calibri","sans-serif";color:black'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:black'>&lt;Snipped&gt;<br><br>On Tue, Dec 12, 2023 at 04:16:20PM +0100, Morten Brørup wrote:<br>> +TO: Bruce, please stop me if I'm completely off track here.<br>><br>> > From: Ferruh Yigit [<a href="mailto:ferruh.yigit@amd.com" id=OWAf90557d8-150f-cb2d-7de8-3c6a7c2889ad>mailto:ferruh.yigit@amd.com</a>] Sent: Tuesday, 12<br>> > December 2023 15.38<br>> ><br>> > On 12/12/2023 11:40 AM, Morten Brørup wrote:<br>> > >> From: Vipin Varghese [<a href="mailto:vipin.varghese@amd.com" id=OWA98c00610-3cd4-e752-037a-17ca12dfc14c>mailto:vipin.varghese@amd.com</a>] Sent: Tuesday,<br>> > >> 12 December 2023 11.38<br>> > >><br>> > >> Replace pktmbuf pool with mempool, this allows increase in MOPS<br>> > >> especially in lower buffer size. 
Using Mempool, allows to reduce the<br>> > >> extra CPU cycles.<br>> > ><br>> > > I get the point of this change: It tests the performance of copying<br>> > raw memory objects using respectively rte_memcpy and DMA, without the<br>> > mbuf indirection overhead.<br>> > ><br>> > > However, I still consider the existing test relevant: The performance<br>> > of copying packets using respectively rte_memcpy and DMA.<br>> > ><br>> ><br>> > This is DMA performance test application and packets are not used,<br>> > using pktmbuf just introduces overhead to the main focus of the<br>> > application.<br>> ><br>> > I am not sure if pktmuf selected intentionally for this test<br>> > application, but I assume it is there because of historical reasons.<br>><br>> I think pktmbuf was selected intentionally, to provide more accurate<br>> results for application developers trying to determine when to use<br>> rte_memcpy and when to use DMA. Much like the "copy breakpoint" in Linux<br>> Ethernet drivers is used to determine which code path to take for each<br>> received packet.</span><span style='font-family:"Calibri","sans-serif";color:black'><o:p></o:p></span></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:black'>Yes Ferruh, this is the right understanding. In the DPDK examples we already have the </span><o:p></o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:black'>dma-forward application, which copies a pktmbuf payload over to a</span><o:p></o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:black'>new pktmbuf payload area. 
</span><o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:black'>By moving to mempool, we now focus directly on the source and destination buffers.</span><o:p></o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:black'>This allows us to create mempool objects with 2MB and 1GB src-dst areas, which lets</span><o:p></o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:black'>the test focus on the src to dst copy. With pktmbuf we were not able to achieve the same.</span><o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt'><br>><br>> Most applications will be working with pktmbufs, so these applications<br>> will also experience the pktmbuf overhead. Performance testing with the<br>> same overhead as the application will be better to help the application<br>> developer determine when to use rte_memcpy and when to use DMA when<br>> working with pktmbufs.</span><o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt'>Morten, thank you for the input, but as shared above, the DPDK example dma-fwd does </span><o:p></o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt'>justice to such a scenario. 
In line with test-compress-perf & test-crypto-perf, IMHO test-dma-perf</span><o:p></o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt'>should focus on getting the best values for the DMA engine and memcpy comparison.</span><o:p></o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt'><br>><br>> (Furthermore, for the pktmbuf tests, I wonder if copying performance<br>> could also depend on IOVA mode and RTE_IOVA_IN_MBUF.)<br>><br>> Nonetheless, there may also be use cases where raw mempool objects are<br>> being copied by rte_memcpy or DMA, so adding tests for these use cases<br>> are useful.<br>><br>><br>> @Bruce, you were also deeply involved in the DMA library, and probably<br>> have more up-to-date practical experience with it. Am I right that<br>> pktmbuf overhead in these tests provides more "real life use"-like<br>> results? Or am I completely off track with my thinking here, i.e. the<br>> pktmbuf overhead is only noise?<br>><br>I'm actually not that familiar with the dma-test application, so can't<br>comment on the specific overhead involved here. In the general case, if we<br>are just talking about the overhead of dereferencing the mbufs then I would<br>expect the overhead to be negligible. However, if we are looking to include<br>the cost of allocation and freeing of buffers, I'd try to avoid that as it<br>is a cost that would have to be paid for both SW copies and HW copies, so<br>should not count when calculating offload cost.</span><o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt'>Bruce, as per test-dma-perf there is no repeated pktmbuf-alloc or pktmbuf-free. 
</span><o:p></o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt'>Hence I disagree: the overhead discussed for pktmbuf here is not related to alloc and free.</span><o:p></o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt'>Instead, as per my investigation, the cost goes into fetching the cacheline and performing mtod on</span><o:p></o:p></p></div><div><p class=MsoNormal style='margin-bottom:12.0pt'><span style='font-size:11.0pt'>each iteration.<br><br>/Bruce</span><o:p></o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt'>I can rewrite the logic to make use of pktmbuf objects by sending the src and dst with pre-computed </span><o:p></o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt'>mtod to avoid the overhead. But this will not resolve the 2MB and 1GB huge page copy alloc failures.</span><o:p></o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt'>IMHO, along similar lines to the other perf applications, the dma-perf application should focus on actual device</span><o:p></o:p></p></div><div><p class=MsoNormal><span style='font-size:11.0pt'>performance over application performance.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>[MB:]<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>OK, Vipin has multiple good arguments for this patch. 
I am convinced, let’s proceed with it.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Acked-by: Morten Brørup &lt;mb@smartsharesystems.com&gt;<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p></div></div></div></div></body></html>