
benches: move sender to a spawned task in watch benchmark #6034

Merged: 1 commit merged into tokio-rs:master on Sep 28, 2023

Conversation

satakuma (Member)

Motivation

PR #5464 introduced a new bucket notifier to reduce contention in the watch channel. That change was motivated by the benchmark added in #5472. However, as currently written, this benchmark shows the new notifier performing worse than the older solution.
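
For context, the bucket notifier spreads waiting receivers across several Notify instances so they don't all contend on a single waiter list, and the sender wakes every bucket on each update. A minimal sketch of that idea (the bucket count and details here are assumed for illustration, not copied from the actual tokio code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use tokio::sync::Notify;

/// Waiters are spread across several buckets: receivers pick a bucket
/// round-robin, and the sender wakes every bucket.
struct BigNotify {
    next: AtomicUsize,
    buckets: [Notify; 8],
}

impl BigNotify {
    fn new() -> Self {
        Self {
            next: AtomicUsize::new(0),
            buckets: std::array::from_fn(|_| Notify::new()),
        }
    }

    /// Wait on one bucket; spreading waiters across buckets keeps each
    /// Notify's internal waiter list short, reducing lock contention.
    async fn notified(&self) {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.buckets.len();
        self.buckets[i].notified().await;
    }

    /// Wake all currently-waiting receivers in every bucket.
    fn notify_waiters(&self) {
        for bucket in &self.buckets {
            bucket.notify_waiters();
        }
    }
}
```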

Results with the new notifier:
contention_resubscribe time: [81.507 ms 82.098 ms 82.740 ms]
Results without it:
contention_resubscribe time: [38.254 ms 38.938 ms 39.618 ms]

This is most likely because the sender task does not run on a worker thread.

Solution

Change the existing watch channel benchmark by moving the benchmarked loop to a spawned task, which will be executed by one of the worker threads. With this change, the results are much more comparable (a simplified sketch of the change follows):
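
Roughly, the change looks like this (a simplified, self-contained sketch; the names and iteration counts are illustrative, not the exact benches/sync_watch.rs code):

```rust
use tokio::sync::watch;

fn main() {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(6)
        .enable_all()
        .build()
        .unwrap();

    rt.block_on(async {
        let (tx, mut rx) = watch::channel(0u64);

        // Stand-in for the benchmark's many resubscribing receivers.
        let receiver = tokio::spawn(async move {
            while rx.changed().await.is_ok() {}
        });

        // Before this PR, the send loop ran directly here, inside
        // `block_on`, i.e. on the thread driving the benchmark rather
        // than on a runtime worker thread. Moving it into a spawned
        // task hands it to a worker.
        let sender = tokio::spawn(async move {
            for i in 1..=1_000u64 {
                tx.send(i).unwrap();
                tokio::task::yield_now().await;
            }
            // `tx` is dropped here, so `changed()` returns Err and the
            // receiver task exits.
        });

        sender.await.unwrap();
        receiver.await.unwrap();
    });
}
```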

new notifier:

contention_resubscribe/10    time:   [930.74 µs 937.37 µs 942.98 µs]
contention_resubscribe/100   time:   [3.8498 ms 3.8565 ms 3.8634 ms]
contention_resubscribe/500   time:   [16.106 ms 16.111 ms 16.116 ms]
contention_resubscribe/1000  time:   [33.519 ms 33.881 ms 34.250 ms]

old notifier:

contention_resubscribe/10    time:   [901.93 µs 914.50 µs 925.82 µs]
contention_resubscribe/100   time:   [4.0990 ms 4.1044 ms 4.1106 ms]
contention_resubscribe/500   time:   [16.800 ms 16.810 ms 16.823 ms]
contention_resubscribe/1000  time:   [32.479 ms 32.497 ms 32.516 ms]

Note that since the addition of the new notifier, some optimizations (#5503) have been made to the underlying Notify primitive. Those also reduced lock contention, so the new bucket solution no longer shows as significant an improvement in this benchmark (or any improvement at all) as it might have at the time of its addition. I still think it's worth keeping around, as it improves performance with more worker threads. Here are the results using 24 worker threads (all of the above results were obtained using 6):

new notifier:

contention_resubscribe/10    time:   [881.98 µs 893.00 µs 902.81 µs]
contention_resubscribe/100   time:   [3.6447 ms 3.6536 ms 3.6621 ms]
contention_resubscribe/500   time:   [21.162 ms 21.317 ms 21.486 ms]
contention_resubscribe/1000  time:   [43.518 ms 43.927 ms 44.346 ms]

old notifier:

contention_resubscribe/10    time:   [930.67 µs 932.05 µs 933.49 µs]
contention_resubscribe/100   time:   [4.5549 ms 4.7225 ms 4.8852 ms]
contention_resubscribe/500   time:   [41.734 ms 41.789 ms 41.845 ms]
contention_resubscribe/1000  time:   [86.575 ms 86.634 ms 86.694 ms]

For benchmarking, I used a 12-core x86_64 machine running Linux 6.5.4.

@Darksonn added the labels A-tokio (Area: The main tokio crate), M-sync (Module: tokio/sync), and A-benches (Area: Benchmarks) on Sep 26, 2023
@Darksonn (Contributor) left a comment:


Interesting observation. Thanks. What do you think we should do about the less-contention change?

@Darksonn merged commit ca89c5b into tokio-rs:master on Sep 28, 2023 (73 checks passed)
@Darksonn (Contributor):

After all, there's also a disadvantage to having it, namely that it makes the allocation a lot larger.

@satakuma (Member, Author) commented on Oct 1, 2023:

I ran the original benchmark from the thread in #5403 (tijsvd/tokio-watch-benchmark) and the results don't seem to favor BigNotify either, although they vary quite a bit from run to run.

new notifier (BigNotify)

TOKIO WATCH
running single thread with spawned sender
start
avg=515.938µs  <490.603µs [ 492.752µs 495.533µs 545.552µs ] 582.422µs>
running multi thread with spawned sender
start
avg=485.695µs  <296.226µs [ 444.833µs 525.262µs 544.252µs ] 562.281µs>
CUSTOM WATCH
running multi thread with spawned sender
start
avg=236.832µs  <203.757µs [ 225.777µs 238.187µs 249.106µs ] 264.636µs>

old notifier (just Notify)

TOKIO WATCH
running single thread with spawned sender
start
avg=482.403µs  <461.733µs [ 462.903µs 464.243µs 509.432µs ] 513.813µs>
running multi thread with spawned sender
start
avg=383.811µs  <347.524µs [ 368.795µs 383.414µs 397.754µs ] 419.423µs>
CUSTOM WATCH
running multi thread with spawned sender
start
avg=218.534µs  <198.347µs [ 210.887µs 218.927µs 226.486µs ] 237.147µs>

The author of this benchmark mentioned that he used an 8-vcore machine, so I used 8 worker threads as well.

If I can't observe any benefit from BigNotify using the benchmarking method that led to its addition, and BigNotify makes the allocation larger, then it might be better to simply remove it. Alternatively, we could consider a heuristic that restricts BigNotify to runtimes with, for example, more than 12 worker threads, although I'm not sure that's possible. In any case, the custom implementation from the benchmark still runs much faster on my machine, so I think there is room for improvement when it comes to reducing contention.
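
Purely as an illustration of that heuristic, such a gate might look like the sketch below. It assumes the channel could inspect the current runtime at creation time, e.g. via RuntimeMetrics::num_workers (unstable at the time of writing); channels created outside a runtime, or moved between runtimes, are part of why this may not actually be feasible:

```rust
use tokio::runtime::Handle;
use tokio::sync::Notify;

enum WatchNotify {
    Small(Notify),
    Big(BigNotify), // the bucketed notifier sketched earlier
}

fn new_watch_notify() -> WatchNotify {
    // Outside a runtime (or if the worker count is unavailable), fall
    // back to the plain notifier.
    let workers = Handle::try_current()
        .map(|handle| handle.metrics().num_workers())
        .unwrap_or(1);
    if workers > 12 {
        WatchNotify::Big(BigNotify::new())
    } else {
        WatchNotify::Small(Notify::new())
    }
}
```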

@Darksonn (Contributor) commented on Oct 1, 2023:

If you think it doesn't make a difference, then go ahead and undo the BigNotify change.

@satakuma (Member, Author):

Sorry for the late response. I ran the benchmarks mentioned above on other machines with significantly weaker CPUs, and BigNotify was actually more than 2x faster there, so I am no longer inclined to remove it. I guess its performance depends heavily on the hardware. Unfortunately, I don't have time right now to investigate it further. Let's leave it as is for now.

@Darksonn (Contributor):

Sounds good to me.
