-
Notifications
You must be signed in to change notification settings - Fork 157
Use pointer hash rather than str hash when interning strings #969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use pointer hash rather than str hash when interning strings #969
Conversation
Guarantee that `ArenaStr`s are interned so that hashing can be pointer-based and interning can be more efficient.
The difference this makes is small, admittedly, but since we're already interning for efficiency, may as well make it a little more so. This mainly moves the bulk of |
CI issues seem to be unrelated to the change, as the same thing is happening on another PR. A bunch of
|
Those fingerprint errors are normal; the real problem is in piston-image somehow. My guess is a recent change to fingerprints/depinfo in rustc or cargo is causing this (it's also failing on master builds - https://perf.rust-lang.org/status.html) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally feel a little uncomfortable with the usage of unsafe in this crate especially considering the performance implications aren't really clear to me. However, that predates this PR so we shouldn't block on that. In the meantime, I'd much prefer if we could thoroughly document the usage of unsafe.
@@ -9,29 +9,21 @@ use std::ptr::NonNull; | |||
use std::sync::Arc; | |||
|
|||
pub trait InternString { | |||
unsafe fn to_interned(s: ArenaStr) -> Self; | |||
unsafe fn to_interned(s: InternedArenaStr) -> Self; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this wasn't documented before, but can we add documentation on the safety requirements for this unsafe
function?
= (ArcSwap::new(Arc::new(HashSet::new())), Mutex::new((HashSet::new(), Bump::new()))); | ||
} | ||
|
||
pub fn preloaded<T: InternString>(value: &str) -> Option<T> { | ||
let set = INTERNED.0.load(); | ||
unsafe { Some(T::to_interned(*set.get(value)?)) } | ||
InternedArenaStr::preloaded(value).map(|s| unsafe { T::to_interned(s) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above, it would be great to document the use of unsafe
and why this particular usage is safe. We should make sure we document all usages of unsafe thoroughly.
@Mark-Simulacrum @tgnottingham Should we file an issue in rust-lang/rust for the fingerprinting issue? |
Yeah, that seems good -- worth additional investigation. |
I just noticed that this change is incorrect due to I think it was also incorrect before this change, because I'll close this and address the above issue in another PR if necessary. I'll also look at documenting the |
Yeah I think I agree with @rylev that we should see about dropping the interning -- it was added back when we were still loading all of our data on each start up, and I'd guess is significantly much less necessary/effective today. I think some quick experiments to check whether it's necessary would be good. |
Opened #974 for reevaluating interning. |
Guarantee that
ArenaStr
s are interned so that hashing can bepointer-based and interning can be more efficient.
Rename
ArenaStr
toInternedArenaStr
.