Skip to content

fix: ubsan signed overflow violations #347

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 21, 2023
Merged

Conversation

q66
Copy link
Contributor

@q66 q66 commented Apr 21, 2023

In these instances, the uint8_t inputs multiplied with the literal, which is a signed long, will make a signed long. For certain inputs, this will overflow, despite the return type being unsigned. Fix by making it always unsigned in the first place.

I found this because a user in my distribution (which builds all prod packages with a subset of ubsan by default) reported crashes newly introduced in newest nodejs, which apparently brings ada as a dependency. While the branch used by nodejs appears to be different, just putting this here anyway.

In these instances, the uint8_t inputs multiplied with the literal,
which is a signed long, will make a signed long. For certain
inputs, this will overflow, despite the return type being unsigned.
Fix by making it always unsigned in the first place.
@anonrig anonrig requested a review from lemire April 21, 2023 15:12
Copy link
Member

@anonrig anonrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution. I'm leaving this unmerged for @lemire to review as well.

@@ -2632,7 +2632,9 @@ uint32_t find_range_index(uint32_t key) {
}

bool ascii_has_upper_case(char* input, size_t length) {
auto broadcast = [](uint8_t v) -> uint64_t { return 0x101010101010101 * v; };
auto broadcast = [](uint8_t v) -> uint64_t {
return 0x101010101010101ull * v;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also update ada/idna repository with the relevant changes as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i can do that once we've finished this one

@anonrig
Copy link
Member

anonrig commented Apr 21, 2023

@q66 Can you provide a minimum reproduction so we can add it to our testing CI and not regress in the future?

@q66
Copy link
Contributor Author

q66 commented Apr 21, 2023

you can reproduce this with something like:

#include <cstdio>

int main() {
    auto broadcast = [](unsigned char v) -> unsigned long long {
        return 0x101010101010101 * v;
    };
    printf("%llu\n", broadcast(0xFF));
}

compiled with -fsanitize=undefined:

$ clang++ test.cc -o test -fsanitize=undefined
$ ./test
test.cc:5:34: runtime error: signed integer overflow: 72340172838076673 * 255 cannot be represented in type 'long'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior test.cc:5:34 in 
18446744073709551615

It will generally fail for any value that does not fit within basic 7-bit ASCII (i.e. 128 and up).

I'm not sure what the best strategy for integration into test suite would be.

@q66
Copy link
Contributor Author

q66 commented Apr 21, 2023

(note that the functions that take a uint64_t should be unaffected, as no promotion takes place; with uint8_t the multiplication with a signed literal will yield a signed value again, but that is not the case if you already have a uint64_t operand on one side of the expression)

@lemire
Copy link
Member

lemire commented Apr 21, 2023

See #348

@anonrig anonrig merged commit 38d6eae into ada-url:main Apr 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants