[dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation

Phil Yang (Arm Technology China) Phil.Yang at arm.com
Thu Jun 6 12:17:01 CEST 2019


From: David Marchand <david.marchand at redhat.com>
Sent: Thursday, June 6, 2019 12:30 AM
To: Phil Yang (Arm Technology China) <Phil.Yang at arm.com>
Cc: dev <dev at dpdk.org>; thomas at monjalon.net; jerinj at marvell.com; hemant.agrawal at nxp.com; Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>; Gavin Hu (Arm Technology China) <Gavin.Hu at arm.com>; nd <nd at arm.com>
Subject: Re: [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation



On Wed, Jun 5, 2019 at 6:00 PM Phil Yang <phil.yang at arm.com> wrote:
This patch set adds the MCS lock library and its unit test.

The MCS lock (proposed by JOHN M. MELLOR-CRUMMEY and MICHAEL L. SCOTT) provides
scalability by spinning on a CPU/thread-local variable, which avoids expensive
cache bouncing. It provides fairness by maintaining a list of acquirers and
passing the lock to each CPU/thread in the order in which they requested it.

References:
1. http://web.mit.edu/6.173/www/currentsemester/readings/R06-scalable-synchronization-1991.pdf
2. https://lwn.net/Articles/590243/
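
For illustration, the core of the algorithm fits in a few dozen lines. Below is
a minimal generic sketch in C11 atomics; the names (mcs_node_t, mcs_lock,
mcs_unlock) are illustrative and are not the API proposed in this series.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
        _Atomic(struct mcs_node *) next;
        atomic_bool locked;
} mcs_node_t;

typedef _Atomic(mcs_node_t *) mcs_lock_t;

static void
mcs_lock(mcs_lock_t *lock, mcs_node_t *me)
{
        atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
        atomic_store_explicit(&me->locked, true, memory_order_relaxed);

        /* Swap ourselves in as the new tail of the waiter queue. */
        mcs_node_t *prev = atomic_exchange_explicit(lock, me,
                                                    memory_order_acq_rel);
        if (prev == NULL)
                return; /* Queue was empty: lock acquired. */

        /* Link in behind the old tail, then spin on our *own* node,
         * i.e. a CPU/thread-local cache line, not the shared lock word. */
        atomic_store_explicit(&prev->next, me, memory_order_release);
        while (atomic_load_explicit(&me->locked, memory_order_acquire))
                ;
}

static void
mcs_unlock(mcs_lock_t *lock, mcs_node_t *me)
{
        mcs_node_t *next = atomic_load_explicit(&me->next,
                                                memory_order_acquire);
        if (next == NULL) {
                /* No visible successor: try to swing the tail back to
                 * empty. If that fails, a successor is mid-enqueue;
                 * wait for it to link itself in. */
                mcs_node_t *expected = me;
                if (atomic_compare_exchange_strong_explicit(lock,
                                &expected, NULL,
                                memory_order_acq_rel, memory_order_acquire))
                        return;
                while ((next = atomic_load_explicit(&me->next,
                                memory_order_acquire)) == NULL)
                        ;
        }
        /* Hand the lock directly to the next waiter (FIFO fairness). */
        atomic_store_explicit(&next->locked, false, memory_order_release);
}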

Micro-benchmarking result:
------------------------------------------------------------------------------------------------
MCS lock                      | spinlock                       | ticket lock
------------------------------+--------------------------------+--------------------------------
Test with lock on 13 cores... |  Test with lock on 14 cores... |  Test with lock on 14 cores...
Core [15] Cost Time = 22426 us|  Core [14] Cost Time = 47974 us|  Core [14] cost time = 66761 us
Core [16] Cost Time = 22382 us|  Core [15] Cost Time = 46979 us|  Core [15] cost time = 66766 us
Core [17] Cost Time = 22294 us|  Core [16] Cost Time = 46044 us|  Core [16] cost time = 66761 us
Core [18] Cost Time = 22412 us|  Core [17] Cost Time = 28793 us|  Core [17] cost time = 66767 us
Core [19] Cost Time = 22407 us|  Core [18] Cost Time = 48349 us|  Core [18] cost time = 66758 us
Core [20] Cost Time = 22436 us|  Core [19] Cost Time = 19381 us|  Core [19] cost time = 66766 us
Core [21] Cost Time = 22414 us|  Core [20] Cost Time = 47914 us|  Core [20] cost time = 66763 us
Core [22] Cost Time = 22405 us|  Core [21] Cost Time = 48333 us|  Core [21] cost time = 66766 us
Core [23] Cost Time = 22435 us|  Core [22] Cost Time = 38900 us|  Core [22] cost time = 66749 us
Core [24] Cost Time = 22401 us|  Core [23] Cost Time = 45374 us|  Core [23] cost time = 66765 us
Core [25] Cost Time = 22408 us|  Core [24] Cost Time = 16121 us|  Core [24] cost time = 66762 us
Core [26] Cost Time = 22380 us|  Core [25] Cost Time = 42731 us|  Core [25] cost time = 66768 us
Core [27] Cost Time = 22395 us|  Core [26] Cost Time = 29439 us|  Core [26] cost time = 66768 us
                              |  Core [27] Cost Time = 38071 us|  Core [27] cost time = 66767 us
------------------------------+--------------------------------+--------------------------------
Total Cost Time = 291195 us   |  Total Cost Time = 544403 us   |  Total cost time = 934687 us
------------------------------------------------------------------------------------------------

Had a quick look, interesting.

Hi David,

Thanks for your comments.

Quick comments:
- your numbers are for 13 cores, while the others are for 14, what is the reason?
[Phil] The test case skips the master thread during the load test; the master thread only controls the trigger. So all the other threads acquire the lock and run the same workload at the same time.
Actually, there is no difference in per-core performance when the master thread is included in the load test.
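For reference, a rough shape of such a load test (hypothetical names, reusing the generic MCS sketch above rather than the actual unit test code from this series):

#include <inttypes.h>
#include <stdatomic.h>
#include <stdio.h>

#include <rte_cycles.h>
#include <rte_lcore.h>

#define ITERATIONS 1000000

static atomic_bool start_flag;  /* the master's trigger */

static int
lock_load_test(void *arg)
{
        mcs_lock_t *lock = arg;
        mcs_node_t me;
        uint64_t begin, cycles;
        int i;

        /* Workers park here until the master flips the trigger, so
         * all of them start contending at the same instant. */
        while (!atomic_load_explicit(&start_flag, memory_order_acquire))
                ;

        begin = rte_rdtsc();
        for (i = 0; i < ITERATIONS; i++) {
                mcs_lock(lock, &me);
                /* critical section: e.g. bump a shared counter */
                mcs_unlock(lock, &me);
        }
        cycles = rte_rdtsc() - begin;
        printf("Core [%u] Cost Time = %"PRIu64" us\n", rte_lcore_id(),
               cycles * 1000000 / rte_get_tsc_hz());
        return 0;
}

/* Master thread, e.g.:
 *      rte_eal_mp_remote_launch(lock_load_test, &lock, SKIP_MASTER);
 *      atomic_store_explicit(&start_flag, true, memory_order_release);
 *      rte_eal_mp_wait_lcore();
 */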

- do we need a per-architecture header? all I can see is generic code, we might as well put rte_mcslock.h directly in the common/include directory.
[Phil] I was just trying to leave room for architecture-specific optimizations.
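A thin per-architecture header could simply fall back to the generic implementation until an optimized variant is needed, along these lines (illustrative path and guard names):

/* e.g. .../common/include/arch/arm/rte_mcslock.h (illustrative) */
#ifndef _RTE_MCSLOCK_ARM_H_
#define _RTE_MCSLOCK_ARM_H_

#ifdef __cplusplus
extern "C" {
#endif

#include "generic/rte_mcslock.h"

#ifdef __cplusplus
}
#endif

#endif /* _RTE_MCSLOCK_ARM_H_ */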

- could we replace the current spinlock with this approach? is this more expensive than spinlock on lightly contended locks? is there a reason we want to keep all these approaches? we would now have 3 lock implementations.
[Phil] Under high lock contention, MCS performs much better than spinlock. However, the MCS lock is more complicated than spinlock and more expensive in the single-thread scenario. E.g.:
Test with lock on single core...
MCS lock :
Core [14] Cost Time = 327 us

Spinlock:
Core [14] Cost Time = 258 us

ticket lock:
Core [14] cost time = 195 us
I think in low-contention scenarios where you still need mutual exclusion, you can use the spinlock, as it is lighter. It all depends on the application.
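To illustrate why MCS is heavier in the uncontended case, compare the call sites (using the generic sketch above; the actual API in this patch set may differ):

#include <rte_spinlock.h>

static void
call_site_comparison(void)
{
        /* Spinlock: one shared word, nothing else to carry around. */
        static rte_spinlock_t sl = RTE_SPINLOCK_INITIALIZER;

        rte_spinlock_lock(&sl);
        rte_spinlock_unlock(&sl);

        /* MCS: the caller must supply a queue node per acquisition,
         * lock always starts with an atomic exchange, and unlock must
         * check for a successor; that extra work is pure overhead
         * when there is no contention. */
        static mcs_lock_t ml;
        mcs_node_t me;

        mcs_lock(&ml, &me);
        mcs_unlock(&ml, &me);
}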

- do we need to write the authors' names fully capitalized? it seems like you are shouting :-)
[Phil] :-)  I will modify it in the next version. Thanks.


--
David Marchand


Thanks,
Phil Yang

