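`RootedRc::clone` currently benchmarks at roughly 3.6x slower than `Arc::clone` (29.1 us vs. 8.0 us for the `1000` benchmarks below). Well, that's unfortunate, since the whole point of `RootedRc` is to be faster than `Arc`.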
Prime suspects are:
- The hash set used to track which roots are locked by the current thread. If so, we could mitigate this with a faster hash function (stdlib's default, currently SipHash, is known to be relatively slow), or by only supporting one root to be unlocked at a time, so that it can just be an `Option` instead of a `HashSet`.
- Thread-local access itself. I vaguely remember seeing somewhere that stdlib's thread locals may be slower than native ones. If this is the issue, we should be able to find or make an alternative that, e.g., just uses libc's native thread locals (assuming there's no fundamental safety issue preventing that).
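A minimal sketch of the two hash-set mitigations, assuming roots can be identified by a small integer. `RootId`, `FnvHasher`, and all function names here are hypothetical, not the crate's actual API. Option A keeps a per-thread set but swaps std's default hasher for a cheap FNV-1a hasher. Option B allows only one locked root per thread, so the tracker shrinks to a `Cell<Option<RootId>>` declared with the `const { ... }` initializer, which the `thread_local!` docs say can avoid lazy initialization (possibly relevant to the second suspect too).

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashSet;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical root identifier; the real crate's type may differ.
type RootId = u32;

/// Minimal FNV-1a hasher: much cheaper than SipHash for small keys, but not
/// DoS-resistant. Acceptable only because we generate the keys ourselves.
struct FnvHasher(u64);

impl Default for FnvHasher {
    fn default() -> Self {
        FnvHasher(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for FnvHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

type FastSet = HashSet<RootId, BuildHasherDefault<FnvHasher>>;

thread_local! {
    // Option A: any number of roots locked per thread, cheaper hasher.
    static LOCKED_SET: RefCell<FastSet> = RefCell::new(FastSet::default());
    // Option B: at most one root locked per thread; `const` init lets std
    // skip the lazy-initialization check on each access.
    static LOCKED_ONE: Cell<Option<RootId>> = const { Cell::new(None) };
}

/// Option A: returns false if `id` was already locked by this thread.
fn lock_set(id: RootId) -> bool {
    LOCKED_SET.with(|s| s.borrow_mut().insert(id))
}
fn unlock_set(id: RootId) -> bool {
    LOCKED_SET.with(|s| s.borrow_mut().remove(&id))
}
fn is_locked_set(id: RootId) -> bool {
    LOCKED_SET.with(|s| s.borrow().contains(&id))
}

/// Option B: returns false if this thread already holds some root.
fn lock_one(id: RootId) -> bool {
    LOCKED_ONE.with(|c| {
        if c.get().is_some() {
            false
        } else {
            c.set(Some(id));
            true
        }
    })
}
fn unlock_one(id: RootId) -> bool {
    LOCKED_ONE.with(|c| {
        if c.get() == Some(id) {
            c.set(None);
            true
        } else {
            false
        }
    })
}
fn is_locked_one(id: RootId) -> bool {
    LOCKED_ONE.with(|c| c.get() == Some(id))
}
```

Whether either variant actually closes the gap with `Arc` would still need to be confirmed by profiling and re-running the benchmarks.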
The probable next step is to run `perf` to verify where the time is actually being spent.
Here are the current numbers:
```
$ cargo bench
...
RootedRc::clone 1000    time:   [28.890 us 29.105 us 29.332 us]
                        change: [-1.7329% -1.1690% -0.5940%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Arc::clone 1000         time:   [8.0026 us 8.0184 us 8.0347 us]
                        change: [-0.6287% -0.2630% +0.1302%] (p = 0.20 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
```
EDIT: deleted invalid Rc benchmark (operations under test were optimized away)
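With a non-atomic `Rc`, a clone immediately followed by a drop is easy for the optimizer to prove has no net effect, which is presumably how the operations were optimized away. A minimal sketch of one way to guard against that with `std::hint::black_box` (stable since Rust 1.66); `time_rc_clones` is a hypothetical name, and the std docs describe `black_box` as best-effort only, so the resulting numbers still deserve a sanity check.

```rust
use std::hint::black_box;
use std::rc::Rc;
use std::time::{Duration, Instant};

/// Times `iters` clone+drop pairs on an `Rc`, routing values through
/// `black_box` so the optimizer cannot treat them as dead code.
/// Also returns the final strong count, which should be back to 1.
fn time_rc_clones(iters: u32) -> (Duration, usize) {
    let rc = Rc::new(42u64);
    let start = Instant::now();
    for _ in 0..iters {
        // black_box on the input hides where `rc` came from; black_box on
        // the output forces the cloned `Rc` to actually be materialized.
        let c = black_box(Rc::clone(black_box(&rc)));
        drop(c); // decrements the strong count again
    }
    let elapsed = start.elapsed();
    (elapsed, Rc::strong_count(&rc))
}
```

Inside a Criterion bench, the same `black_box` calls would go in the closure passed to `Bencher::iter`.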