-
-
Notifications
You must be signed in to change notification settings - Fork 104
fix: ubsan signed overflow violations #347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In these instances, the uint8_t inputs multiplied with the literal, which is a signed long, will make a signed long. For certain inputs, this will overflow, despite the return type being unsigned. Fix by making it always unsigned in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution. I'm leaving this unmerged for @lemire to review as well.
@@ -2632,7 +2632,9 @@ uint32_t find_range_index(uint32_t key) { | |||
} | |||
|
|||
bool ascii_has_upper_case(char* input, size_t length) { | |||
auto broadcast = [](uint8_t v) -> uint64_t { return 0x101010101010101 * v; }; | |||
auto broadcast = [](uint8_t v) -> uint64_t { | |||
return 0x101010101010101ull * v; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also update ada/idna
repository with the relevant changes as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i can do that once we've finished this one
@q66 Can you provide a minimum reproduction so we can add it to our testing CI and not regress in the future? |
you can reproduce this with something like:
compiled with
It will generally fail for any value that does not fit within basic 7-bit ASCII (i.e. 128 and up). I'm not sure what the best strategy for integration into test suite would be. |
(note that the functions that take a |
See #348 |
In these instances, the uint8_t inputs multiplied with the literal, which is a signed long, will make a signed long. For certain inputs, this will overflow, despite the return type being unsigned. Fix by making it always unsigned in the first place.
I found this because a user in my distribution (which builds all prod packages with a subset of ubsan by default) reported crashes newly introduced in newest nodejs, which apparently brings ada as a dependency. While the branch used by nodejs appears to be different, just putting this here anyway.