Reduce deserialization allocations/copies #1197
Conversation
Calling data_mut().reserve(additional) used to result in _two_ allocs and memcpys: the first to unshare the underlying vector (Arc::make_mut clones, so the copy ends up with capacity == len), and the second when reserve then grows that freshly cloned vector. With this fix we manually "unshare", allocating with capacity = len + additional, therefore saving an extra alloc and memcpy.
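For illustration, here is a minimal sketch of the pattern, assuming a simplified account type whose data sits behind an Arc<Vec<u8>>; the Account struct and method names below are illustrative stand-ins, not the actual agave types.

```rust
use std::sync::Arc;

// Simplified stand-in for an account whose data is shared behind an Arc.
struct Account {
    data: Arc<Vec<u8>>,
}

impl Account {
    // Old behavior: when the Arc is shared, Arc::make_mut clones the Vec
    // (alloc #1, and the clone has capacity == len), then reserve() has to
    // grow that fresh clone again (alloc #2 plus a second memcpy).
    fn reserve_via_make_mut(&mut self, additional: usize) {
        Arc::make_mut(&mut self.data).reserve(additional);
    }

    // New behavior described above: when the Arc is shared, build the
    // replacement Vec with the final capacity up front, so the bytes are
    // copied exactly once and only one allocation happens.
    fn reserve_unshared(&mut self, additional: usize) {
        if let Some(data) = Arc::get_mut(&mut self.data) {
            // Already uniquely owned: a plain reserve is enough.
            data.reserve(additional);
            return;
        }
        let mut new_data = Vec::with_capacity(self.data.len() + additional);
        new_data.extend_from_slice(&self.data);
        self.data = Arc::new(new_data);
    }
}

fn main() {
    let shared = Arc::new(vec![1u8, 2, 3]);

    let mut a = Account { data: Arc::clone(&shared) };
    a.reserve_via_make_mut(1024); // two allocs + two copies while shared

    let mut b = Account { data: Arc::clone(&shared) };
    b.reserve_unshared(1024); // one alloc + one copy while shared
    assert!(b.data.capacity() >= 3 + 1024);
}
```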
We used to call make_data_mut() from set_data_from_slice() from the days when direct mapping couldn't deal with accounts getting shrunk. That changed in solana-labs#32649 (see the if callee_account.capacity() < min_capacity check in cpi.rs:update_caller_account()). With this change we don't call make_data_mut() anymore before set_data_from_slice(), saving the cost of a memcpy since set_data_from_slice() overwrites the whole account content anyway.
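Again as a rough sketch with the same simplified, illustrative account type (not the real agave API): the old shape pays for a copy of bytes that are about to be discarded, while the new shape copies only the incoming slice.

```rust
use std::sync::Arc;

// Simplified, illustrative stand-in for an account with shared data.
struct Account {
    data: Arc<Vec<u8>>,
}

impl Account {
    // Conceptually what make_data_mut() does: force unique ownership,
    // cloning (alloc + memcpy) the current contents if the Arc is shared.
    fn make_data_mut(&mut self) {
        Arc::make_mut(&mut self.data);
    }

    // Old shape: unshare first (copying the *old* bytes), then overwrite
    // everything with `data` anyway, so that first copy is wasted work.
    fn set_data_from_slice_old(&mut self, data: &[u8]) {
        self.make_data_mut();
        let inner = Arc::make_mut(&mut self.data); // already unique, no clone here
        inner.clear();
        inner.extend_from_slice(data);
    }

    // New shape: the whole content gets replaced, so skip make_data_mut().
    fn set_data_from_slice_new(&mut self, data: &[u8]) {
        if let Some(inner) = Arc::get_mut(&mut self.data) {
            // Uniquely owned: reuse the existing allocation.
            inner.clear();
            inner.extend_from_slice(data);
            return;
        }
        // Shared: a new buffer is needed either way, but there is no point
        // copying the old bytes just to overwrite them; copy the new slice
        // directly into a fresh Vec.
        self.data = Arc::new(data.to_vec());
    }
}

fn main() {
    let shared = Arc::new(vec![0u8; 1024]);

    let mut a = Account { data: Arc::clone(&shared) };
    a.set_data_from_slice_old(&[1, 2, 3, 4]); // copies 1024 old bytes, then overwrites them

    let mut b = Account { data: Arc::clone(&shared) };
    b.set_data_from_slice_new(&[1, 2, 3, 4]); // copies only the 4 new bytes
    assert_eq!(*b.data, vec![1u8, 2, 3, 4]);
}
```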
// Note that we intentionally don't call self.make_data_mut() here. make_data_mut() will
// allocate + memcpy the current data if self.account is shared. We don't need the memcpy
// here tho because account.set_data_from_slice(data) is going to replace the content
// anyway.
Very nice catch finding that self.make_data_mut() is redundant. Totally makes sense.
Suggested change:
// Note that we intentionally don't call self.make_data_mut() here, since we are replacing
// the contents transaction wide anyway with account.set_data_from_slice(data)
I'm +0 on the edit, so I think I'm going to keep my comment. I think it's worth being explicit about make_data_mut() doing a copy, since the method name itself doesn't immediately suggest that.
Much like #1192 (comment), I did similar benchmarking. I confirmed yet again that this favorably improves unified scheduler performance as well.
before (at 206a87a): (love bureaucratic work? lol. This result is almost the same as the result seen in
after (at f180b08):
blockstore-processor sees
Now the unified scheduler is basically twice as fast as the blockstore processor. :)
This removes an allocation and a copy in AccountSharedData::reserve(). Calling data_mut().reserve(additional) used to result in two allocs and memcpys: the first to unshare the underlying vector (Arc::make_mut clones, so the copy ends up with capacity == len), and the second when reserve then grows that freshly cloned vector. With this change we now manually "unshare", allocating with capacity = len + additional, therefore saving an extra alloc and memcpy.
Additionally we skip another extra copy in AccountSharedData::set_data_from_slice(). We used to call make_data_mut() from set_data_from_slice() from the days when direct mapping couldn't deal with accounts getting shrunk. That changed in solana-labs#32649 (see the if callee_account.capacity() < min_capacity check in cpi.rs:update_caller_account()). We now don't call make_data_mut() anymore before set_data_from_slice(), saving the cost of a memcpy since set_data_from_slice() overwrites the whole account content anyway.
These two changes improve deserialization perf.
Before:
After:
On my mnb node this reduces the cost of deserialization from 16% to 5%. It improves overall replay perf by 6%, although I expect the savings to be even larger on nodes with a lot of stake, which seem to struggle more with memory allocations.