Slightly speed up the `resize_inner` function + documentation of other functions (#451)
Conversation
```rust
/// created will be yielded by that iterator.
/// - The order in which the iterator yields indices of the buckets is unspecified
///   and may change in the future.
pub(crate) struct FullBucketsIndices {
```
What's the advantage of this over `RawIter`?
You can't use `RawIter`, since it is a struct generic over `T`. The changes are made in methods of the `RawTableInner` struct, which has no information about the type `T`. At the same time, using `BitMaskIter` inside `FullBucketsIndices` gives the same speedup for iterating over elements as it does for `RawIter`.
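The core trick can be sketched in isolation: build a bitmask of full slots per group of control bytes and yield set-bit positions via `trailing_zeros`, all without knowing the element type. This is a simplified model, not hashbrown's actual code; the real `Group`/`BitMaskIter` use SIMD or word-sized loads, while here I just assume `0xFF` means empty and a clear top bit means full.

```rust
// Simplified model of type-erased full-bucket iteration (assumption:
// hashbrown's real control-byte handling differs in detail).
// A control byte with the top bit clear marks a full bucket; 0xFF = EMPTY.
fn full_bucket_indices(ctrl: &[u8]) -> Vec<usize> {
    const GROUP: usize = 8;
    let mut out = Vec::new();
    for (g, chunk) in ctrl.chunks(GROUP).enumerate() {
        // Build a bitmask of full slots in this group.
        let mut mask: u32 = 0;
        for (i, &byte) in chunk.iter().enumerate() {
            if byte & 0x80 == 0 {
                mask |= 1 << i;
            }
        }
        // Yield one index per set bit, lowest bit first.
        while mask != 0 {
            let bit = mask.trailing_zeros() as usize;
            out.push(g * GROUP + bit);
            mask &= mask - 1; // clear the lowest set bit
        }
    }
    out
}

fn main() {
    // Full buckets store a 7-bit hash fragment (top bit clear).
    let ctrl = [0xFF, 0x12, 0xFF, 0x7F, 0xFF, 0xFF, 0xFF, 0xFF, 0x03];
    assert_eq!(full_bucket_indices(&ctrl), vec![1, 3, 8]);
    println!("{:?}", full_bucket_indices(&ctrl));
}
```

Note that nothing in this sketch mentions an element type: the iterator works purely on control bytes and indices, which is exactly what a method on `RawTableInner` can do.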
Could `RawIter` be built on top of this to avoid code duplication? Assuming that doesn't impact compilation times too much due to the additional layer of inlining needed.
Actually I think this would improve compile times since the code for iteration would only need to be instantiated once.
I was thinking that `FullBucketsIndices` could be changed to take a size field and thus maintain a pointer to the bucket. There could be some performance impact due to the use of indices that `RawIter` could be sensitive to.

I'd try these changes on top of this PR though.
I will try 😄. It will be necessary to slightly change the structure of `FullBucketsIndices`.
Oh, there's that `reflect_toggle_full` method of `RawIter`. It causes the `FullBucketsIndices` size to bloat. Okay, I'll go to bed; maybe tomorrow I'll come up with something adequate 🤔
```rust
    self.alloc.clone(),
    table_layout,
    capacity,
    fallibility,
)?;
new_table.growth_left -= self.items;
new_table.items = self.items;
```
Why are these moved?
They are not needed here because:
- This allows making the function safe.
- It adds more consistency across functions: you no longer have to remember that the number of elements was changed before the elements were actually added, which in my opinion is not a good idea and only confuses. For example, in the `clone_from_impl` method we first add the items, and only then change the `self.table.items` and `self.table.growth_left` fields.
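The ordering argument can be illustrated with a toy table (`MiniTable` is a hypothetical stand-in, not hashbrown's API): the data is mutated first, and the bookkeeping fields are updated last, so the counters never describe elements that are not yet in place.

```rust
// Hypothetical minimal table illustrating "mutate first, count last".
#[derive(Clone)]
struct MiniTable {
    slots: Vec<Option<u32>>,
    items: usize,
    growth_left: usize,
}

impl MiniTable {
    fn with_capacity(cap: usize) -> Self {
        MiniTable { slots: vec![None; cap], items: 0, growth_left: cap }
    }

    fn insert(&mut self, v: u32) {
        let slot = self.slots.iter_mut().find(|s| s.is_none()).expect("table full");
        *slot = Some(v);
        // Bookkeeping after the write: the counters are always consistent
        // with the actual contents of `slots`.
        self.items += 1;
        self.growth_left -= 1;
    }

    fn clone_from_table(&mut self, src: &MiniTable) {
        self.slots.clone_from(&src.slots);
        // Counters updated only after every element has been copied.
        self.items = src.items;
        self.growth_left = self.slots.len() - src.items;
    }
}

fn main() {
    let mut a = MiniTable::with_capacity(4);
    a.insert(1);
    a.insert(2);
    let mut b = MiniTable::with_capacity(4);
    b.clone_from_table(&a);
    assert_eq!((b.items, b.growth_left), (2, 2));
}
```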
I tested this PR out with rustc and it does seem to be a performance improvement there.
```diff
 #[allow(clippy::mut_mut)]
 #[inline]
-unsafe fn prepare_resize(
+fn prepare_resize(
```
It occurs to me that a lot of these methods that mutate data on the heap should probably take `&mut self`; otherwise there is an additional safety requirement that no other thread is concurrently accessing the data.
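The point can be shown with a hypothetical example (not hashbrown's code): a method that mutates through `&self` needs an extra documented safety requirement ruling out concurrent access, while `&mut self` makes the borrow checker enforce exclusivity, so the requirement disappears.

```rust
use std::cell::UnsafeCell;

// Hypothetical illustration of the `&self` vs `&mut self` safety argument.
struct Table {
    len: UnsafeCell<usize>,
}

impl Table {
    // SAFETY contract needed: the caller must guarantee that no other
    // thread (or alias) reads or writes `len` during this call.
    unsafe fn bump_shared(&self) {
        *self.len.get() += 1;
    }

    // No extra contract: `&mut self` already proves exclusive access.
    fn bump_exclusive(&mut self) {
        *self.len.get_mut() += 1;
    }
}

fn main() {
    let mut t = Table { len: UnsafeCell::new(0) };
    t.bump_exclusive();
    // Sound only because we know nothing else is accessing `t` right now.
    unsafe { t.bump_shared() };
    assert_eq!(*t.len.get_mut(), 2);
}
```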
I can try to go through all the functions (after this PR), although it probably won't be possible for all of them.

Specifically, this function can be left as it is (with `&self`), since it does not change the original table (neither before nor after this pull request).
@Amanieu I implemented it. @Zoxc Could you please test this PR with rustc again?

Upd: Squashed all changes into two commits.
Force-pushed from 5d4bbee to 1d83af3.
Testing just the …
Force-pushed from 1d83af3 to 3e7e253.
I think I found the cause of the regression. Can you please repeat?
Still a regression.
@Amanieu I give up 😄. As @Zoxc predicted, and then confirmed by benchmarking, the implementation of `RawIter` on top of `FullBucketsIndices` regresses performance, since each bucket now has to be computed from its index:

```rust
let data_index = group_first_index + index;
bucket.next_n(data_index)
```

That is, each time there is an additional addition operation.

To reduce the code itself, we can, of course, perform some sort of dispatching through traits and generics, like the example below, but this will likely slow down compilation and, in essence, will be equivalent to two different structures:

```rust
trait GroupBase {
    fn next_n(&self, offset: usize) -> Self;
}

impl GroupBase for usize {
    fn next_n(&self, offset: usize) -> Self {
        self + offset
    }
}

impl<T> GroupBase for Bucket<T> {
    fn next_n(&self, offset: usize) -> Self {
        // Resolves to the inherent `Bucket::next_n`, which takes
        // precedence over the trait method.
        unsafe { self.next_n(offset) }
    }
}

pub(crate) struct FullBucketsIndices<B: GroupBase> {
    current_group: BitMaskIter,
    group_base: B,
    // Pointer to the current group of control bytes.
    ctrl: NonNull<u8>,
    // Number of buckets in the given subset of a table.
    buckets: usize,
    // Number of elements in the given subset of a table.
    items: usize,
}
```
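The dispatch idea above can be checked in a self-contained form. Here `FakeBucket` is a hypothetical stand-in for hashbrown's `Bucket<T>` (which really holds a raw pointer), and `advance` plays the role of `next_n`; the generic function monomorphizes into two unrelated concrete versions, which is why the approach is "equivalent to two different structures":

```rust
// Self-contained sketch of base-type dispatch; names are illustrative,
// not hashbrown's actual API.
trait GroupBase: Sized {
    fn advance(&self, offset: usize) -> Self;
}

// Index-based iteration: the base is just a usize.
impl GroupBase for usize {
    fn advance(&self, offset: usize) -> Self {
        self + offset
    }
}

// Pointer-like iteration: a wrapper standing in for Bucket<T>.
#[derive(Debug, PartialEq)]
struct FakeBucket(usize);

impl GroupBase for FakeBucket {
    fn advance(&self, offset: usize) -> Self {
        FakeBucket(self.0 + offset)
    }
}

// Generic over the base type; the compiler instantiates one copy
// per concrete B that is actually used.
fn nth_from<B: GroupBase>(base: B, n: usize) -> B {
    base.advance(n)
}

fn main() {
    assert_eq!(nth_from(5usize, 3), 8);
    assert_eq!(nth_from(FakeBucket(5), 3), FakeBucket(8));
}
```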
That's fine, let's just switch back to the previous implementation.
Force-pushed from 3e7e253 to 7e45a78.
Returned the previous implementation. However, the tests are failing; it looks like an upstream problem: rust-lang/rust#115239
LGTM. Let's wait until CI is resolved before merging.
@bors r+
☀️ Test successful - checks-actions
Change `&` to `&mut` where applicable

This addresses #451 (comment). All remaining functions either return raw pointers or do nothing with the data on the heap.
This speeds up the `resize_inner` function a bit, since the data is now read from the heap not byte by byte but by groups. In addition, we may not need to iterate over all control bytes if we have already yielded all indices of full buckets. For example, on my computer:

Before (with `cargo bench`):

After (with `cargo bench`):

As for the documentation, I started with `resize_inner`, and since this function depends on others, I had to document them as well, and so on, so there is a lot of documentation in total. Also fixed a couple of inaccuracies with marking functions as `unsafe`.

Fix #453.
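The early-exit part of the description ("we may not iterate over all control bytes") can be sketched as follows. This is a simplified model under the same assumptions as before (`0xFF` = empty, clear top bit = full), not hashbrown's actual scanning code: once `items` full buckets have been yielded, the remaining control bytes are never touched.

```rust
// Sketch of early termination: stop scanning control bytes once all
// `items` full-bucket indices have been yielded. Returns the indices
// and how many control bytes were actually examined.
fn scan_full(ctrl: &[u8], items: usize) -> (Vec<usize>, usize) {
    let mut found = Vec::with_capacity(items);
    let mut bytes_scanned = 0;
    for (i, &byte) in ctrl.iter().enumerate() {
        if found.len() == items {
            break; // all full buckets already yielded
        }
        bytes_scanned += 1;
        if byte & 0x80 == 0 {
            found.push(i); // top bit clear: a full bucket
        }
    }
    (found, bytes_scanned)
}

fn main() {
    // Two full buckets near the front of a 16-slot table.
    let mut ctrl = [0xFFu8; 16];
    ctrl[0] = 0x01;
    ctrl[2] = 0x42;
    let (idx, scanned) = scan_full(&ctrl, 2);
    assert_eq!(idx, vec![0, 2]);
    assert_eq!(scanned, 3); // stopped right after the second full bucket
}
```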