[dpdk-users] scheduler issue

Alex Kiselev alex at therouter.net
Sat Dec 12 18:19:56 CET 2020


On 2020-12-12 11:46, Alex Kiselev wrote:
> On 2020-12-12 11:22, Singh, Jasvinder wrote:
>>> On 12 Dec 2020, at 01:45, Alex Kiselev <alex at therouter.net> wrote:
>>> 
>>> On 2020-12-12 01:54, Alex Kiselev wrote:
>>>>> On 2020-12-12 01:45, Alex Kiselev wrote:
>>>>> On 2020-12-12 01:20, Singh, Jasvinder wrote:
>>>>>>> On 11 Dec 2020, at 23:37, Alex Kiselev <alex at therouter.net> 
>>>>>>> wrote:
>>>>>>> On 2020-12-11 23:55, Singh, Jasvinder wrote:
>>>>>>> On 11 Dec 2020, at 22:27, Alex Kiselev <alex at therouter.net> 
>>>>>>> wrote:
>>>>>>>> On 2020-12-11 23:06, Singh, Jasvinder wrote:
>>>>>>> On 11 Dec 2020, at 21:29, Alex Kiselev <alex at therouter.net> 
>>>>>>> wrote:
>>>>>>> On 2020-12-08 14:24, Singh, Jasvinder wrote:
>>>>>>> <snip>
>>>>>>>> [JS] Now, returning to the 1 Mbit/s pipes situation, try reducing the tc
>>>>>>>> period first at the subport and then at the pipe level, and see if that
>>>>>>>> helps in getting even traffic across the low bandwidth pipes.
>>>>>>> Reducing the subport tc period from 10 to 5 also solved the problem
>>>>>>> with the 1 Mbit/s pipes.
>>>>>>> So, my second problem has been solved, but the first one, where some of
>>>>>>> the low bandwidth pipes stop transmitting, still remains.
>>>>>>> I see. Try removing the "pkt_len <= pipe_tc_ov_credits" condition in the
>>>>>>> grinder_credits_check() code for the oversubscription case, and instead
>>>>>>> use: pkt_len <= pipe_tc_credits + pipe_tc_ov_credits;
>>>>>>> If I do what you suggest, I will get this code:
>>>>>>>
>>>>>>>     enough_credits = (pkt_len <= subport_tb_credits) &&
>>>>>>>             (pkt_len <= subport_tc_credits) &&
>>>>>>>             (pkt_len <= pipe_tb_credits) &&
>>>>>>>             (pkt_len <= pipe_tc_credits) &&
>>>>>>>             (pkt_len <= pipe_tc_credits + pipe_tc_ov_credits);
>>>>>>>
>>>>>>> And this doesn't make sense, since if the condition
>>>>>>> (pkt_len <= pipe_tc_credits) is true, then the condition
>>>>>>> (pkt_len <= pipe_tc_credits + pipe_tc_ov_credits) is also always true.
>>>>>>> [JS] My suggestion is to remove "pkt_len <= pipe_tc_credits" and
>>>>>>> "pkt_len <= pipe_tc_ov_credits", and use only
>>>>>>> "pkt_len <= pipe_tc_credits + pipe_tc_ov_credits",
>>>>>>> while keeping the tc_ov flag on.
>>>>>>> Your suggestion just turns off the TC_OV feature.
>>>>>>>> I don't see your point.
>>>>>>>> This new suggestion will also effectively turn off the TC_OV feature,
>>>>>>>> since the only effect of enabling TC_OV is the additional condition
>>>>>>>> pkt_len <= pipe_tc_ov_credits,
>>>>>>>> which doesn't allow a pipe to spend more resources than it should.
>>>>>>>> And in the case of subport congestion a pipe should spend less
>>>>>>>> than 100% of the pipe's maximum rate.
>>>>>>>> And you suggest allowing a pipe to spend 100% of its rate plus some
>>>>>>>> extra.
>>>>>>>> I guess the effect of this would be an even more unfair distribution
>>>>>>>> of the subport's bandwidth.
>>>>>>>> Btw, a pipe might stop transmitting even when there is no
>>>>>>>> congestion at the subport.
>>>>>>> Although I didn't try this solution, the idea here is: in a particular
>>>>>>> round, if pkt_len is less than pipe_tc_credits (which is a constant
>>>>>>> value each time) but greater than pipe_tc_ov_credits, then it might hit
>>>>>>> the situation where no packet will be scheduled from the pipe, even
>>>>>>> though there are fixed credits greater than the packet size available.
>>>>>> But that is a perfectly normal situation, and that's exactly the idea
>>>>>> behind TC_OV.
>>>>>> It means a pipe should wait for the next subport->tc_ov_period_id,
>>>>>> when pipe_tc_ov_credits will be reset to a new value.
>>>>>> But here it's not guaranteed that the new value of pipe_tc_ov_credits
>>>>>> will be sufficient for a low bandwidth pipe to send its packets, as
>>>>>> pipe_tc_ov_credits is freshly computed each time.
>>>>>>> pipe->tc_ov_credits = subport->tc_ov_wm * params->tc_ov_weight;
>>>>>>> which allows the pipe to continue transmitting.
>>>>>> No, that won't happen if the new tc_ov_credits value is again less than
>>>>>> pkt_len; it will hit a deadlock.
>>>>> The new tc_ov_credits can't be less than subport->tc_ov_wm_min,
>>>>> and tc_ov_wm_min is equal to port->mtu.
>>>>> All my scheduler ports are configured with mtu 1522. The etherdev ports
>>>>> also use the same mtu, therefore there should be no packets bigger than
>>>>> 1522.
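
For reference, the refill being discussed here looks roughly like this in
librte_sched's grinder_credits_update() (a sketch paraphrased from memory for
the 4-traffic-class versions; the tc_ov_credits assignment is the line quoted
above, the rest may differ in your DPDK tree):

    /* Sketch, not a verbatim copy of rte_sched.c: once per subport tc
     * period the pipe's oversubscription credits are re-armed from the
     * current watermark. */
    if (pipe->tc_ov_period_id != subport->tc_ov_period_id) {
            pipe->tc_ov_credits = subport->tc_ov_wm * params->tc_ov_weight;
            pipe->tc_ov_period_id = subport->tc_ov_period_id;
    }

    /* The watermark itself is recomputed every subport tc period and kept
     * within [tc_ov_wm_min, tc_ov_wm_max], with tc_ov_wm_min = port->mtu.
     * So, with a tc_ov_weight of 1 or more, a freshly refilled
     * pipe_tc_ov_credits is never smaller than one MTU-sized frame; only a
     * packet whose pkt_len exceeds the configured MTU can stay blocked
     * across refills. */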
>>>> Also, tc_ov_credits is set to tc_ov_wm_min only in the case of constant
>>>> congestion, and today I detected the problem when there was no
>>>> congestion.
>>>> So, it's highly unlikely that tc_ov_credits is always set to a value
>>>> less than pkt_size. The only scenario in which this might be the case is
>>>> when the scheduler port gets a corrupted mbuf with an incorrect pkt len,
>>>> which causes a queue deadlock.
>>> 
>>> Also, a defragmented ipv4 packet (multisegment mbuf) might have a pkt_len
>>> much bigger than the scheduler port's MTU, therefore you are right, there
>>> is absolutely no guarantee that a packet will not cause a queue deadlock.
>>> And this explanation sounds very plausible to me; I bet this is my case.
>>> 
>> 
>> But you mentioned earlier that your packet length is low, never exceeding
>> the threshold above. Maybe test with fixed 256/512 byte packets and see if
>> you face the same no-transmission situation.
> 
> No, I could only say that about the test lab which I was using
> to test the pipe fairness, and which uses a packet generator with a
> constant pkt size.
> 
> My main issue happens in a production network. And I mentioned
> that it is a network providing internet access to residential
> customers, therefore packet sizes are up to 1522 bytes. Also,
> fragmented packets are valid packets in such networks. My application
> performs ipv4 defragmentation and then sends packets to the scheduler, so
> the scheduler might receive multisegment packets of up to 1522 * 8 bytes.

I've tested your patch.

         /* Check pipe and subport credits */
         enough_credits = (pkt_len <= subport_tb_credits) &&
                 (pkt_len <= subport_tc_credits) &&
                 (pkt_len <= pipe_tb_credits) &&
                 (pkt_len <= pipe_tc_credits + pipe_tc_ov_credits);

and the effect is quite positive.
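
For comparison, the unpatched oversubscription check tests the pipe TC and
TC_OV credits separately (reconstructed from the conditions discussed above;
the exact layout may differ in your rte_sched.c):

         /* Original check: a packet larger than pipe_tc_ov_credits can never
          * pass, even when pipe_tc_credits alone would allow it. */
         enough_credits = (pkt_len <= subport_tb_credits) &&
                 (pkt_len <= subport_tc_credits) &&
                 (pkt_len <= pipe_tb_credits) &&
                 (pkt_len <= pipe_tc_credits) &&
                 (pkt_len <= pipe_tc_ov_credits);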

Also, I moved the packet fragmentation block in my app so that it runs
before the scheduler. This guarantees that all packets entering the
scheduler are no larger than 1522 bytes, so pipe_tc_ov_credits will always
be greater than pkt_size.
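
A defensive check in front of the enqueue call would also catch any oversized
mbuf that still slips through. A minimal sketch (SCHED_PORT_MTU and the drop
counter are made-up names; rte_pktmbuf_pkt_len(), rte_pktmbuf_free() and
rte_sched_port_enqueue() are the real DPDK calls):

    #include <rte_mbuf.h>
    #include <rte_sched.h>

    #define SCHED_PORT_MTU 1522  /* assumed: matches "mtu 1522" in the hqos port config */

    /* Drop anything the scheduler could never dequeue: a packet with
     * pkt_len > MTU may exceed pipe_tc_ov_credits forever and permanently
     * stall its queue. */
    static inline uint32_t
    drop_oversized(struct rte_mbuf **pkts, uint32_t n, uint64_t *drops)
    {
            uint32_t i, k = 0;

            for (i = 0; i < n; i++) {
                    if (rte_pktmbuf_pkt_len(pkts[i]) > SCHED_PORT_MTU) {
                            rte_pktmbuf_free(pkts[i]);
                            (*drops)++;
                    } else {
                            pkts[k++] = pkts[i];
                    }
            }
            return k;
    }

    /* usage:
     *   n = drop_oversized(pkts, n, &oversize_drops);
     *   rte_sched_port_enqueue(sched_port, pkts, n);
     */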

This build is gonna be tested in production in a couple of days.
I'll let you know about the results.

> 
>> 
>> 
>> 
>> 
>> 
>> 
>>> 
>>>>> Maybe I should increase the port's MTU to 1540?
>>>>>>> And it could not cause a permanent pipe stop, which is what I am
>>>>>>> facing.
>>>>>>>> In fairness, a pipe should send as many packets as pipe_tc_credits
>>>>>>>> allows, regardless of pipe_tc_ov_credits, which is extra on top of
>>>>>>>> pipe_tc_credits.
>>>>>>> I think it's quite the opposite. That's why after I reduced the
>>>>>>> subport tc_period I got much more fairness, since reducing the subport
>>>>>>> tc_period also reduces the tc_ov_wm_max value:
>>>>>>>
>>>>>>>     s->tc_ov_wm_max = rte_sched_time_ms_to_bytes(params->tc_period,
>>>>>>>             port->pipe_tc3_rate_max);
>>>>>>>
>>>>>>> As a result, a pipe transmits fewer bytes in one round, so pipe
>>>>>>> rotation inside a grinder happens much more often and a pipe can't
>>>>>>> monopolise resources.
>>>>>>> In other QoS implementations this is called a "quantum".
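
To put rough numbers on that (illustration only; inside rte_sched.c,
rte_sched_time_ms_to_bytes() is simply rate_in_bytes_per_sec * time_ms / 1000,
and the real pipe_tc3_rate_max is the maximum TC3 rate over all pipe profiles
on the port):

    /* Assuming a pipe profile whose TC3 rate is 2 Mbit/s = 250000 bytes/s: */
    tc_ov_wm_max = rte_sched_time_ms_to_bytes(10 /* ms */, 250000); /* 2500 bytes */
    tc_ov_wm_max = rte_sched_time_ms_to_bytes(5  /* ms */, 250000); /* 1250 bytes */

    /* Halving the subport tc period halves this per-round "quantum", so each
     * pipe sends fewer bytes before the grinder moves on to the next pipe. */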
>>>>>> Yes, so reducing the tc period means that all pipes (high and low
>>>>>> bandwidth) get lower tc_ov_credits values, which allows less
>>>>>> transmission from the higher bandwidth pipes and leaves bandwidth for
>>>>>> the low bandwidth pipes. So, here is the thing: either tune the tc
>>>>>> period to a value which prevents a high bandwidth pipe from hogging
>>>>>> most of the bandwidth, or make changes in the code, where
>>>>>> oversubscription adds extra credits on top of the guaranteed ones.
>>>>>> One question: don't your low bandwidth pipes have higher priority
>>>>>> traffic (tc0, tc1, tc2)? Packets from those TCs must be going out.
>>>>>> Isn't this the case?
>>>>> Well, it would be the case after I find out what's going on. Right now
>>>>> I am using a tos2tc map configured in such a way that all ipv4 packets
>>>>> with any TOS value go into TC3.
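
For clarity, that map is just a flat TOS -> traffic class lookup; a sketch
with made-up names (not a DPDK structure), every TOS value pointing at TC3:

    #define QOS_TC3 3

    static uint8_t tos2tc[256];

    static void
    tos2tc_init(void)
    {
            int i;

            for (i = 0; i < 256; i++)
                    tos2tc[i] = QOS_TC3;  /* all ipv4 traffic lands in TC3 for now */
    }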
>>>>>>>>>> rcv 0   rx rate 7324160 nb pkts 5722
>>>>>>>>>> rcv 1   rx rate 7281920 nb pkts 5689
>>>>>>>>>> rcv 2   rx rate 7226880 nb pkts 5646
>>>>>>>>>> rcv 3   rx rate 7124480 nb pkts 5566
>>>>>>>>>> rcv 4   rx rate 7324160 nb pkts 5722
>>>>>>>>>> rcv 5   rx rate 7271680 nb pkts 5681
>>>>>>>>>> rcv 6   rx rate 7188480 nb pkts 5616
>>>>>>>>>> rcv 7   rx rate 7150080 nb pkts 5586
>>>>>>>>>> rcv 8   rx rate 7328000 nb pkts 5725
>>>>>>>>>> rcv 9   rx rate 7249920 nb pkts 5664
>>>>>>>>>> rcv 10  rx rate 7188480 nb pkts 5616
>>>>>>>>>> rcv 11  rx rate 7179520 nb pkts 5609
>>>>>>>>>> rcv 12  rx rate 7324160 nb pkts 5722
>>>>>>>>>> rcv 13  rx rate 7208960 nb pkts 5632
>>>>>>>>>> rcv 14  rx rate 7152640 nb pkts 5588
>>>>>>>>>> rcv 15  rx rate 7127040 nb pkts 5568
>>>>>>>>>> rcv 16  rx rate 7303680 nb pkts 5706
>>>>>>>>>> ....
>>>>>>>>>> rcv 587 rx rate 2406400 nb pkts 1880
>>>>>>>>>> rcv 588 rx rate 2406400 nb pkts 1880
>>>>>>>>>> rcv 589 rx rate 2406400 nb pkts 1880
>>>>>>>>>> rcv 590 rx rate 2406400 nb pkts 1880
>>>>>>>>>> rcv 591 rx rate 2406400 nb pkts 1880
>>>>>>>>>> rcv 592 rx rate 2398720 nb pkts 1874
>>>>>>>>>> rcv 593 rx rate 2400000 nb pkts 1875
>>>>>>>>>> rcv 594 rx rate 2400000 nb pkts 1875
>>>>>>>>>> rcv 595 rx rate 2400000 nb pkts 1875
>>>>>>>>>> rcv 596 rx rate 2401280 nb pkts 1876
>>>>>>>>>> rcv 597 rx rate 2401280 nb pkts 1876
>>>>>>>>>> rcv 598 rx rate 2401280 nb pkts 1876
>>>>>>>>>> rcv 599 rx rate 2402560 nb pkts 1877
>>>>>>>>>> rx rate sum 3156416000
>>>>>>>>>>>> ... despite that there is _NO_ congestion at the subport or pipe.
>>>>>>>>>>>>> And the subport doesn't use about 42 Mbit/s of the available
>>>>>>>>>>>>> bandwidth!
>>>>>>>>>>>>> The only difference between those test configurations is the TC
>>>>>>>>>>>>> of the generated traffic.
>>>>>>>>>>>>> Test 1 uses TC 1 while test 2 uses TC 3 (which uses the TC_OV
>>>>>>>>>>>>> function).
>>>>>>>>>>>>> So, enabling TC_OV changes the results dramatically.
>>>>>>>>>>>>> ##
>>>>>>>>>>>>> ## test1
>>>>>>>>>>>>> ##
>>>>>>>>>>>>> hqos add profile  7 rate    2 M size 1000000 tc period 40
>>>>>>>>>>>>> # qos test port
>>>>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>>>>> hqos add port 1 subport 0 rate 300 M size 1000000 tc period 10
>>>>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 7
>>>>>>>>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>>>>
>>>>>>>>>>>>> port 1 subport rate 300 M, number of tx flows 300, generator tx rate 1M, TC 1
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> rcv 284 rx rate 995840  nb pkts 778
>>>>>>>>>>>>> rcv 285 rx rate 995840  nb pkts 778
>>>>>>>>>>>>> rcv 286 rx rate 995840  nb pkts 778
>>>>>>>>>>>>> rcv 287 rx rate 995840  nb pkts 778
>>>>>>>>>>>>> rcv 288 rx rate 995840  nb pkts 778
>>>>>>>>>>>>> rcv 289 rx rate 995840  nb pkts 778
>>>>>>>>>>>>> rcv 290 rx rate 995840  nb pkts 778
>>>>>>>>>>>>> rcv 291 rx rate 995840  nb pkts 778
>>>>>>>>>>>>> rcv 292 rx rate 995840  nb pkts 778
>>>>>>>>>>>>> rcv 293 rx rate 995840  nb pkts 778
>>>>>>>>>>>>> rcv 294 rx rate 995840  nb pkts 778
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> sum pipe's rx rate is 298 494 720
>>>>>>>>>>>>> OK. The subport rate is equally distributed to 300 pipes.
>>>>>>>>>>>>> ##
>>>>>>>>>>>>> ##  test 2
>>>>>>>>>>>>> ##
>>>>>>>>>>>>> hqos add profile  7 rate    2 M size 1000000 tc period 40
>>>>>>>>>>>>> # qos test port
>>>>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>>>>> hqos add port 1 subport 0 rate 300 M size 1000000 tc period 10
>>>>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 7
>>>>>>>>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>>>>
>>>>>>>>>>>>> port 1 subport rate 300 M, number of tx flows 300, generator tx rate 1M, TC 3
>>>>>>>>>>>>>
>>>>>>>>>>>>> h5 ~ # rcli sh qos rcv
>>>>>>>>>>>>> rcv 0   rx rate 875520  nb pkts 684
>>>>>>>>>>>>> rcv 1   rx rate 856320  nb pkts 669
>>>>>>>>>>>>> rcv 2   rx rate 849920  nb pkts 664
>>>>>>>>>>>>> rcv 3   rx rate 853760  nb pkts 667
>>>>>>>>>>>>> rcv 4   rx rate 867840  nb pkts 678
>>>>>>>>>>>>> rcv 5   rx rate 844800  nb pkts 660
>>>>>>>>>>>>> rcv 6   rx rate 852480  nb pkts 666
>>>>>>>>>>>>> rcv 7   rx rate 855040  nb pkts 668
>>>>>>>>>>>>> rcv 8   rx rate 865280  nb pkts 676
>>>>>>>>>>>>> rcv 9   rx rate 846080  nb pkts 661
>>>>>>>>>>>>> rcv 10  rx rate 858880  nb pkts 671
>>>>>>>>>>>>> rcv 11  rx rate 870400  nb pkts 680
>>>>>>>>>>>>> rcv 12  rx rate 864000  nb pkts 675
>>>>>>>>>>>>> rcv 13  rx rate 852480  nb pkts 666
>>>>>>>>>>>>> rcv 14  rx rate 855040  nb pkts 668
>>>>>>>>>>>>> rcv 15  rx rate 857600  nb pkts 670
>>>>>>>>>>>>> rcv 16  rx rate 864000  nb pkts 675
>>>>>>>>>>>>> rcv 17  rx rate 866560  nb pkts 677
>>>>>>>>>>>>> rcv 18  rx rate 865280  nb pkts 676
>>>>>>>>>>>>> rcv 19  rx rate 858880  nb pkts 671
>>>>>>>>>>>>> rcv 20  rx rate 856320  nb pkts 669
>>>>>>>>>>>>> rcv 21  rx rate 864000  nb pkts 675
>>>>>>>>>>>>> rcv 22  rx rate 869120  nb pkts 679
>>>>>>>>>>>>> rcv 23  rx rate 856320  nb pkts 669
>>>>>>>>>>>>> rcv 24  rx rate 862720  nb pkts 674
>>>>>>>>>>>>> rcv 25  rx rate 865280  nb pkts 676
>>>>>>>>>>>>> rcv 26  rx rate 867840  nb pkts 678
>>>>>>>>>>>>> rcv 27  rx rate 870400  nb pkts 680
>>>>>>>>>>>>> rcv 28  rx rate 860160  nb pkts 672
>>>>>>>>>>>>> rcv 29  rx rate 870400  nb pkts 680
>>>>>>>>>>>>> rcv 30  rx rate 869120  nb pkts 679
>>>>>>>>>>>>> rcv 31  rx rate 870400  nb pkts 680
>>>>>>>>>>>>> rcv 32  rx rate 858880  nb pkts 671
>>>>>>>>>>>>> rcv 33  rx rate 858880  nb pkts 671
>>>>>>>>>>>>> rcv 34  rx rate 852480  nb pkts 666
>>>>>>>>>>>>> rcv 35  rx rate 874240  nb pkts 683
>>>>>>>>>>>>> rcv 36  rx rate 855040  nb pkts 668
>>>>>>>>>>>>> rcv 37  rx rate 853760  nb pkts 667
>>>>>>>>>>>>> rcv 38  rx rate 869120  nb pkts 679
>>>>>>>>>>>>> rcv 39  rx rate 885760  nb pkts 692
>>>>>>>>>>>>> rcv 40  rx rate 861440  nb pkts 673
>>>>>>>>>>>>> rcv 41  rx rate 852480  nb pkts 666
>>>>>>>>>>>>> rcv 42  rx rate 871680  nb pkts 681
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> rcv 288 rx rate 766720  nb pkts 599
>>>>>>>>>>>>> rcv 289 rx rate 766720  nb pkts 599
>>>>>>>>>>>>> rcv 290 rx rate 766720  nb pkts 599
>>>>>>>>>>>>> rcv 291 rx rate 766720  nb pkts 599
>>>>>>>>>>>>> rcv 292 rx rate 762880  nb pkts 596
>>>>>>>>>>>>> rcv 293 rx rate 762880  nb pkts 596
>>>>>>>>>>>>> rcv 294 rx rate 762880  nb pkts 596
>>>>>>>>>>>>> rcv 295 rx rate 760320  nb pkts 594
>>>>>>>>>>>>> rcv 296 rx rate 604160  nb pkts 472
>>>>>>>>>>>>> rcv 297 rx rate 604160  nb pkts 472
>>>>>>>>>>>>> rcv 298 rx rate 604160  nb pkts 472
>>>>>>>>>>>>> rcv 299 rx rate 604160  nb pkts 472
>>>>>>>>>>>>> rx rate sum 258839040
>>>>>>>>>>>>> FAILED. The subport rate is distributed NOT equally between 300 pipes.
>>>>>>>>>>>>> Some subport bandwidth (about 42 Mbit/s) is not being used!

