Switch aarch64 pause to sb on armv9a+ #2390

Nicoshev · 2025-02-26T21:18:40Z

Summary:
SB (Speculation Barrier) is a newer barrier instruction, mandatory from Armv8.5-A.
It achieves the same result as issuing DSB+ISB, but without making the CPU flush its instruction pipeline.
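
For readers who want to see where such an instruction sits in a spin loop, here is a minimal sketch. It is not the actual patch. `cpu_pause`, `TinySpinLock` and the opt-in macro `EXAMPLE_HAS_SB` are hypothetical names, the `isb` fallback is an assumption, and the real change may detect the architecture level differently.

```cpp
#include <atomic>

// Hypothetical helper, not folly's implementation.
// EXAMPLE_HAS_SB is an assumed, build-provided macro: define it only when the
// target CPU and the assembler both support the SB instruction.
inline void cpu_pause() {
#if defined(__aarch64__) && defined(EXAMPLE_HAS_SB)
  asm volatile("sb" ::: "memory");   // speculation barrier
#elif defined(__aarch64__)
  asm volatile("isb" ::: "memory");  // assumed fallback for older cores
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_ia32_pause();            // x86 spin-wait hint
#else
  // No pause hint known for this target: do nothing.
#endif
}

// Minimal test-and-test-and-set spin lock that pauses while waiting.
class TinySpinLock {
 public:
  // Returns true if the lock was free and is now held by the caller.
  bool try_lock() {
    return !locked_.exchange(true, std::memory_order_acquire);
  }

  void lock() {
    while (!try_lock()) {
      // Spin on a plain load so waiters do not keep writing the cache line.
      while (locked_.load(std::memory_order_relaxed)) {
        cpu_pause();
      }
    }
  }

  void unlock() { locked_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> locked_{false};
};
```

The pause only runs on the contended path, which is why its cost shows up in benchmarks with many threads fighting over one small lock.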

We measured 20% to 30% higher throughput in the small locks benchmark at 16, 32 and 64 threads.

In the results below, 'Sum' is the throughput:

before:

------- folly::MicroSpinLock 16 threads
Sum: 130891978 Mean: 1817944 stddev: 147111
Lock time stats in us: mean 1 stddev 33 max 14937
------- folly::MicroSpinLock 32 threads
Sum: 54681548 Mean: 759465 stddev: 105588
Lock time stats in us: mean 5 stddev 78 max 35925
------- folly::MicroSpinLock 64 threads
Sum: 24013546 Mean: 333521 stddev: 90571
Lock time stats in us: mean 11 stddev 179 max 90498

after:

------- folly::MicroSpinLock 16 threads
Sum: 169135465 Mean: 2349103 stddev: 227369
Lock time stats in us: mean 1 stddev 25 max 8463
------- folly::MicroSpinLock 32 threads
Sum: 67853388 Mean: 942408 stddev: 108821
Lock time stats in us: mean 3 stddev 63 max 17020
------- folly::MicroSpinLock 64 threads
Sum: 28845120 Mean: 400626 stddev: 61624
Lock time stats in us: mean 9 stddev 149 max 30879
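
The relative gain can be checked directly from the 'Sum' values above. The snippet below is only an illustrative calculation. `throughput_gain_percent` and `print_gains` are hypothetical helper names, not part of the benchmark.

```cpp
#include <cstdint>
#include <cstdio>

// Percent change in throughput going from `before` to `after`.
inline double throughput_gain_percent(std::uint64_t before, std::uint64_t after) {
  return (static_cast<double>(after) / static_cast<double>(before) - 1.0) * 100.0;
}

// 'Sum' values copied from the before/after results in this description.
struct SumPair {
  int threads;
  std::uint64_t before;
  std::uint64_t after;
};

inline void print_gains() {
  const SumPair results[] = {
      {16, 130891978ULL, 169135465ULL},
      {32, 54681548ULL, 67853388ULL},
      {64, 24013546ULL, 28845120ULL},
  };
  for (const SumPair& r : results) {
    std::printf("%d threads: %.1f%% higher throughput\n", r.threads,
                throughput_gain_percent(r.before, r.after));
  }
}
```

Calling `print_gains()` gives roughly 29%, 24% and 20% for 16, 32 and 64 threads, which matches the 20% to 30% range quoted in the summary.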

Reviewed By: Gownta

Differential Revision: D70250662

facebook-github-bot · 2025-02-26T21:18:59Z

This pull request was exported from Phabricator. Differential Revision: D70250662

X-link: facebook/folly#2390

facebook-github-bot · 2025-02-27T11:12:37Z

This pull request has been merged in b22f4c5.

Pull Request resolved: #9591 X-link: facebook/folly#2390 fbshipit-source-id: 4148100a7368c6dc76c36e2ab88d4d2dabe88904

AGSaidi · 2025-02-28T20:22:14Z

I checked SB with the lockhammer benchmarks and saw a performance improvement of 11-18%.

embg · 2025-02-28T22:52:13Z

@AGSaidi is lockhammer OSS? If so, could you share your code change in lockhammer?

AGSaidi · 2025-02-28T23:41:08Z

Yep https://github.com/ARM-software/synchronization-benchmarks

facebook-github-bot added the CLA Signed label Feb 26, 2025

facebook-github-bot added the fb-exported label Feb 26, 2025

facebook-github-bot closed this in b22f4c5 Feb 27, 2025

facebook-github-bot added the Merged label Feb 27, 2025
