-
Notifications
You must be signed in to change notification settings - Fork 677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/c32 optimizations #3075
Feat/c32 optimizations #3075
Conversation
let norm_str: String = input_str | ||
.to_uppercase() | ||
.replace("O", "0") | ||
.replace("L", "1") | ||
.replace("I", "1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reductions of the char scans and allocs here is one of the primary optimizations in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did the normalization code go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This normalization fn was no longer called once the encoding logic was changed to use a faster C32 alphabet lookup using the C32_CHARACTERS_MAP
table. If you take a look at the code there, you'll see where the lowercase and special chars are included. I'm happy to add it back (will be dead code though).
let iter_c32_digits_opts: Vec<Option<usize>> = c32_normalize(input_str) | ||
.chars() | ||
.rev() | ||
.map(|x| C32_CHARACTERS.find(x)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor of this find
from O(n) to O(1) in the hot path is one of the primary optimizations in this PR
stacks-common/src/address/c32.rs
Outdated
/// table[pair.0.to_ascii_lowercase() as usize] = i; | ||
/// } | ||
/// ``` | ||
const C32_CHARACTERS_MAP: [i8; 128] = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I experimented with trying both lazy_static and const fn
for this, neither seemed right, so I took the table result and inlined it. The rust code doc has compilation validation support (at least in the tools I'm using). So seems like this is the best approach. But I'm open to alternatives!
Codecov Report
@@ Coverage Diff @@
## develop #3075 +/- ##
===========================================
+ Coverage 83.72% 83.76% +0.03%
===========================================
Files 259 260 +1
Lines 200777 200932 +155
===========================================
+ Hits 168093 168302 +209
+ Misses 32684 32630 -54
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this is all consensus critical code you're touching. We use the bare addresses in the chainstate MARF, which affects the block state root hashes. At a minimum, I need to see unit tests demonstrating that this new code behaves exactly the same way as the old code.
The existing tests cover these paths pretty extensively, e.g. one for the normalization code: I'm happy to add more, but not really seeing any gaps in testing, would appreciate some pointers here. |
A couple things:
|
Sounds reasonable, keeping the old code around would also help with generating benchmarks. Btw, this code is in a library "stacks-common" which is used in projects outside the stacks-node, so I wouldn't be surprised if there was a perf gain that doesn't help with the stacks-node operations you've listed, but could be significant for other use cases. Seems like these would still be good things to upstream (assuming the consensus-critical change requirements you've outlined are met). |
@jcnelson how does this look? Added tests to generate addrs with new and old code, decode both with new and old code, verifying they all match. I ran it with 500,000 iterations of random input, and tests pass (takes a few minutes). I committed a version of the test that performs 5000 iterations. Then added a
|
stacks-common/src/address/c32.rs
Outdated
.map(|ref_x| *ref_x) | ||
.rev() | ||
.filter_map(|x| C32_CHARACTERS_MAP.get(*x as usize)) | ||
.filter_map(|v| u8::try_from(*v).ok()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code seems to depend on u8::try_from(*v)
returning None
if *v
is -1
. Can this code be more explicit about this? Such as:
.filter_map(|v| {
if *v < 0 {
None
}
else {
// infallible due to the above check
Some(*v as u8)
}
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
u8::try_from
is very explicit about conversions: negative inputs will always error, as will any other input outside the range of u8
. x as u8
is a much more convoluted conversion: 1000 as u8
returns 232
(it neither is a saturating conversion, nor a panic).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may actually be better to use an array of Option(u8)
rather than -1i8
for the "null" sentinel values (this is what the b58 table does). I'll give that a try when I address the other comments and see if there's any unexpected perf issues with that approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the table from [i8; 128]
to [Option<u8>; 128]
which gets rid of this integer conversion, and the benchmark shows a slight perf increase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, but please address my comments before merging. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, the only comments I think need to be addressed are the cfg(test)
decoration for the old code, and including the normalization rules in the docstring.
ab381fc
to
5fcdb07
Compare
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Various optimizations in the C32 address decoding paths. Use a decoding table (the inverse of the encoding table/alphabet) in the decoding hot path (previously an O(n) utf8 char scan was used). Reduce some allocs & copies.