[dpdk-dev] Fwd: high latency detected in IP pipeline example

Victor Huertas vhuertas at gmail.com
Wed Feb 19 09:29:56 CET 2020
Previous message: [dpdk-dev] Fwd: high latency detected in IP pipeline example
Next message: [dpdk-dev] Fwd: Fwd: high latency detected in IP pipeline example
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
OK James,
Thanks for sharing your own experience.
What I would need right now is to know from maintainers if this latency
behaviour is something inherent in DPDK  in the particular case we are
talking about. Furthermore, I would also appreciate it if some maintainer
could tell us if there is some workaround or special configuration that
completely mitigate this latency. I guess that there is one mitigation
mechanism, which is the approach that the new ip_pipeline app example
exposes: if two or more pipelines are in the same core the "connection"
between them is not a software queue but a "direct table connection".

This proposed approach has a big impact on my application and I would like
to know if there is other mitigation approach taking into account the "old"
version of ip_pipeline example.

Thanks for your attention


El mar., 18 feb. 2020 a las 23:09, James Huang (<jamsphon at gmail.com>)
escribió:

> No. I didn't notice the RTT bouncing symptoms.
> In high throughput scenario, if multiple pipelines runs in a single cpu
> core, it does increase the latency.
>
>
> Regards,
> James Huang
>
>
> On Tue, Feb 18, 2020 at 1:50 AM Victor Huertas <vhuertas at gmail.com> wrote:
>
>> Dear James,
>>
>> I have done two different tests with the following configuration:
>> [PIPELINE 0 MASTER core =0]
>> [PIPELINE 1 core=1] --- SWQ1--->[PIPELINE 2 core=1] -----SWQ2---->
>> [PIPELINE 3 core=1]
>>
>> The first test (sending a single ping to cross all the pipelines to
>> measure RTT) has been done by setting the burst_write to 32 in SWQ1 and
>> SWQ2. NOTE: All the times we use rte_ring_enqueue_burst in the pipelines 1
>> and 2 we set the number of packets to write to 1.
>>
>> The result of this first test is as shown subsquently:
>> 64 bytes from 192.168.0.101: icmp_seq=343 ttl=63 time=59.8 ms
>> 64 bytes from 192.168.0.101: icmp_seq=344 ttl=63 time=59.4 ms
>> 64 bytes from 192.168.0.101: icmp_seq=345 ttl=63 time=59.2 ms
>> 64 bytes from 192.168.0.101: icmp_seq=346 ttl=63 time=59.0 ms
>> 64 bytes from 192.168.0.101: icmp_seq=347 ttl=63 time=59.0 ms
>> 64 bytes from 192.168.0.101: icmp_seq=348 ttl=63 time=59.2 ms
>> 64 bytes from 192.168.0.101: icmp_seq=349 ttl=63 time=59.3 ms
>> 64 bytes from 192.168.0.101: icmp_seq=350 ttl=63 time=59.1 ms
>> 64 bytes from 192.168.0.101: icmp_seq=351 ttl=63 time=58.9 ms
>> 64 bytes from 192.168.0.101: icmp_seq=352 ttl=63 time=58.5 ms
>> 64 bytes from 192.168.0.101: icmp_seq=353 ttl=63 time=58.4 ms
>> 64 bytes from 192.168.0.101: icmp_seq=354 ttl=63 time=58.0 ms
>> 64 bytes from 192.168.0.101: icmp_seq=355 ttl=63 time=58.4 ms
>> 64 bytes from 192.168.0.101: icmp_seq=356 ttl=63 time=57.7 ms
>> 64 bytes from 192.168.0.101: icmp_seq=357 ttl=63 time=56.9 ms
>> 64 bytes from 192.168.0.101: icmp_seq=358 ttl=63 time=57.2 ms
>> 64 bytes from 192.168.0.101: icmp_seq=359 ttl=63 time=57.5 ms
>> 64 bytes from 192.168.0.101: icmp_seq=360 ttl=63 time=57.3 ms
>>
>> As you can see, the RTT is quite high and the range of values is more or
>> less stable.
>>
>> The second test is the same as the first one but setting burst_write to 1
>> for all SWQs. The result is this one:
>>
>> 64 bytes from 192.168.0.101: icmp_seq=131 ttl=63 time=10.6 ms
>> 64 bytes from 192.168.0.101: icmp_seq=132 ttl=63 time=10.6 ms
>> 64 bytes from 192.168.0.101: icmp_seq=133 ttl=63 time=10.5 ms
>> 64 bytes from 192.168.0.101: icmp_seq=134 ttl=63 time=10.7 ms
>> 64 bytes from 192.168.0.101: icmp_seq=135 ttl=63 time=10.8 ms
>> 64 bytes from 192.168.0.101: icmp_seq=136 ttl=63 time=10.4 ms
>> 64 bytes from 192.168.0.101: icmp_seq=137 ttl=63 time=10.7 ms
>> 64 bytes from 192.168.0.101: icmp_seq=138 ttl=63 time=10.5 ms
>> 64 bytes from 192.168.0.101: icmp_seq=139 ttl=63 time=10.4 ms
>> 64 bytes from 192.168.0.101: icmp_seq=140 ttl=63 time=10.2 ms
>> 64 bytes from 192.168.0.101: icmp_seq=141 ttl=63 time=10.4 ms
>> 64 bytes from 192.168.0.101: icmp_seq=142 ttl=63 time=10.9 ms
>> 64 bytes from 192.168.0.101: icmp_seq=143 ttl=63 time=11.4 ms
>> 64 bytes from 192.168.0.101: icmp_seq=144 ttl=63 time=11.3 ms
>> 64 bytes from 192.168.0.101: icmp_seq=145 ttl=63 time=11.5 ms
>> 64 bytes from 192.168.0.101: icmp_seq=146 ttl=63 time=11.6 ms
>> 64 bytes from 192.168.0.101: icmp_seq=147 ttl=63 time=11.0 ms
>> 64 bytes from 192.168.0.101: icmp_seq=148 ttl=63 time=11.3 ms
>> 64 bytes from 192.168.0.101: icmp_seq=149 ttl=63 time=12.0 ms
>> 64 bytes from 192.168.0.101: icmp_seq=150 ttl=63 time=12.6 ms
>> 64 bytes from 192.168.0.101: icmp_seq=151 ttl=63 time=12.4 ms
>> 64 bytes from 192.168.0.101: icmp_seq=152 ttl=63 time=12.3 ms
>> 64 bytes from 192.168.0.101: icmp_seq=153 ttl=63 time=12.8 ms
>> 64 bytes from 192.168.0.101: icmp_seq=154 ttl=63 time=12.4 ms
>> 64 bytes from 192.168.0.101: icmp_seq=155 ttl=63 time=12.8 ms
>> 64 bytes from 192.168.0.101: icmp_seq=156 ttl=63 time=12.7 ms
>> 64 bytes from 192.168.0.101: icmp_seq=157 ttl=63 time=12.6 ms
>> 64 bytes from 192.168.0.101: icmp_seq=158 ttl=63 time=12.9 ms
>> 64 bytes from 192.168.0.101: icmp_seq=159 ttl=63 time=13.4 ms
>> 64 bytes from 192.168.0.101: icmp_seq=160 ttl=63 time=13.8 ms
>> 64 bytes from 192.168.0.101: icmp_seq=161 ttl=63 time=13.4 ms
>> 64 bytes from 192.168.0.101: icmp_seq=162 ttl=63 time=13.3 ms
>> 64 bytes from 192.168.0.101: icmp_seq=163 ttl=63 time=13.3 ms
>> 64 bytes from 192.168.0.101: icmp_seq=164 ttl=63 time=13.7 ms
>> 64 bytes from 192.168.0.101: icmp_seq=165 ttl=63 time=13.7 ms
>> 64 bytes from 192.168.0.101: icmp_seq=166 ttl=63 time=13.8 ms
>> 64 bytes from 192.168.0.101: icmp_seq=167 ttl=63 time=14.7 ms
>> 64 bytes from 192.168.0.101: icmp_seq=168 ttl=63 time=14.7 ms
>> 64 bytes from 192.168.0.101: icmp_seq=169 ttl=63 time=14.7 ms
>> 64 bytes from 192.168.0.101: icmp_seq=170 ttl=63 time=14.7 ms
>> 64 bytes from 192.168.0.101: icmp_seq=171 ttl=63 time=14.6 ms
>> 64 bytes from 192.168.0.101: icmp_seq=172 ttl=63 time=14.6 ms
>> 64 bytes from 192.168.0.101: icmp_seq=173 ttl=63 time=14.5 ms
>> 64 bytes from 192.168.0.101: icmp_seq=174 ttl=63 time=14.5 ms
>> 64 bytes from 192.168.0.101: icmp_seq=175 ttl=63 time=15.1 ms
>> 64 bytes from 192.168.0.101: icmp_seq=176 ttl=63 time=15.6 ms
>> 64 bytes from 192.168.0.101: icmp_seq=177 ttl=63 time=16.0 ms
>> 64 bytes from 192.168.0.101: icmp_seq=178 ttl=63 time=16.9 ms
>> 64 bytes from 192.168.0.101: icmp_seq=179 ttl=63 time=17.7 ms
>> 64 bytes from 192.168.0.101: icmp_seq=180 ttl=63 time=17.6 ms
>> 64 bytes from 192.168.0.101: icmp_seq=181 ttl=63 time=17.9 ms
>> 64 bytes from 192.168.0.101: icmp_seq=182 ttl=63 time=17.9 ms
>> 64 bytes from 192.168.0.101: icmp_seq=183 ttl=63 time=18.5 ms
>> 64 bytes from 192.168.0.101: icmp_seq=184 ttl=63 time=18.9 ms
>> 64 bytes from 192.168.0.101: icmp_seq=185 ttl=63 time=19.8 ms
>> 64 bytes from 192.168.0.101: icmp_seq=186 ttl=63 time=19.8 ms
>> 64 bytes from 192.168.0.101: icmp_seq=187 ttl=63 time=10.7 ms
>> 64 bytes from 192.168.0.101: icmp_seq=188 ttl=63 time=10.5 ms
>> 64 bytes from 192.168.0.101: icmp_seq=189 ttl=63 time=10.4 ms
>> 64 bytes from 192.168.0.101: icmp_seq=190 ttl=63 time=10.3 ms
>> 64 bytes from 192.168.0.101: icmp_seq=191 ttl=63 time=10.5 ms
>> 64 bytes from 192.168.0.101: icmp_seq=192 ttl=63 time=10.7 ms
>> As you mentioned, the delay has decreased a lot but it is still
>> considerably high (in a normal router this delay is less than 1 ms).
>> A second strange behaviour is seen in the evolution of the RTT detected.
>> It begins in 10 ms and goes increasing little by litttle to reach a peak of
>> 20 ms aprox and then it suddely comes back to 10 ms again to increase again
>> till 20 ms.
>>
>> Is this the behaviour you have in your case when the burst_write is set
>> to 1?
>>
>> Regards,
>>
>> El mar., 18 feb. 2020 a las 8:18, James Huang (<jamsphon at gmail.com>)
>> escribió:
>>
>>> No. We didn't see noticable throughput difference in our test.
>>>
>>> On Mon., Feb. 17, 2020, 11:04 p.m. Victor Huertas <vhuertas at gmail.com>
>>> wrote:
>>>
>>>> Thanks James for your quick answer.
>>>> I guess that this configuration modification implies that the packets
>>>> must be written one by one in the sw ring. Did you notice loose of
>>>> performance (in throughput) in your aplicación because of that?
>>>>
>>>> Regards
>>>>
>>>> El mar., 18 feb. 2020 0:10, James Huang <jamsphon at gmail.com> escribió:
>>>>
>>>>> Yes, I experienced similar issue in my application. In a short answer,
>>>>> set the swqs write burst value to 1 may reduce the latency significantly.
>>>>> The default write burst value is 32.
>>>>>
>>>>> On Mon., Feb. 17, 2020, 8:41 a.m. Victor Huertas <vhuertas at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am developing my own DPDK application basing it in the dpdk-stable
>>>>>> ip_pipeline example.
>>>>>> At this moment I am using the 17.11 LTS version of DPDK and I amb
>>>>>> observing
>>>>>> some extrange behaviour. Maybe it is an old issue that can be solved
>>>>>> quickly so I would appreciate it if some expert can shade a light on
>>>>>> this.
>>>>>>
>>>>>> The ip_pipeline example allows you to develop Pipelines that perform
>>>>>> specific packet processing functions (ROUTING, FLOW_CLASSIFYING,
>>>>>> etc...).
>>>>>> The thing is that I am extending some of this pipelines with my own.
>>>>>> However I want to take advantage of the built-in ip_pipeline
>>>>>> capability of
>>>>>> arbitrarily assigning the logical core where the pipeline (f_run()
>>>>>> function) must be executed so that i can adapt the packet processing
>>>>>> power
>>>>>> to the amount of the number of cores available.
>>>>>> Taking this into account I have observed something strange. I show
>>>>>> you this
>>>>>> simple example below.
>>>>>>
>>>>>> Case 1:
>>>>>> [PIPELINE 0 MASTER core =0]
>>>>>> [PIPELINE 1 core=1] --- SWQ1--->[PIPELINE 2 core=2] -----SWQ2---->
>>>>>> [PIPELINE 3 core=3]
>>>>>>
>>>>>> Case 2:
>>>>>> [PIPELINE 0 MASTER core =0]
>>>>>> [PIPELINE 1 core=1] --- SWQ1--->[PIPELINE 2 core=1] -----SWQ2---->
>>>>>> [PIPELINE 3 core=1]
>>>>>>
>>>>>> I send a ping between two hosts connected at both sides of the
>>>>>> pipeline
>>>>>> model which allows these pings to cross all the pipelines (from 1 to
>>>>>> 3).
>>>>>> What I observe in Case 1 (each pipeline has its own thread in
>>>>>> different
>>>>>> core) is that the reported RTT is less than 1 ms, whereas in Case 2
>>>>>> (all
>>>>>> pipelines except MASTER are run in the same thread) is 20 ms.
>>>>>> Furthermore,
>>>>>> in Case 2, if I increase a lot (hundreds of Mbps) the packet rate
>>>>>> this RTT
>>>>>> decreases to 3 or 4 ms.
>>>>>>
>>>>>> Has somebody observed this behaviour in the past? Can it be solved
>>>>>> somehow?
>>>>>>
>>>>>> Thanks a lot for your attention
>>>>>> --
>>>>>> Victor
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Victor
>>>>>>
>>>>>
>>
>> --
>> Victor
>>
>

-- 
Victor
Previous message: [dpdk-dev] Fwd: high latency detected in IP pipeline example
Next message: [dpdk-dev] Fwd: Fwd: high latency detected in IP pipeline example
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the dev mailing list