Skip to content

GH-46123: [C++] Undefined behavior in compare_internal.cc and light_array_internal.cc #46124

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 18, 2025

Conversation

jonkeane
Copy link
Member

@jonkeane jonkeane commented Apr 12, 2025

Rationale for this change

Removing undefined behaviors, adding a CI job for the M1-mac sanitizer now being run.

What changes are included in this PR?

The fixes + a sanitizer job that confirms they are fixed.

Are these changes tested?

Yup!

Are there any user-facing changes?

Fewer undefined behaviors.

@jonkeane jonkeane requested a review from thisisnic as a code owner April 12, 2025 13:36
Copy link

⚠️ GitHub issue #46123 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting committer review Awaiting committer review awaiting changes Awaiting changes awaiting change review Awaiting change review Component: R labels Apr 13, 2025
@apache apache deleted a comment from github-actions bot Apr 13, 2025
@apache apache deleted a comment from github-actions bot Apr 13, 2025
@apache apache deleted a comment from github-actions bot Apr 13, 2025
@apache apache deleted a comment from github-actions bot Apr 13, 2025
@apache apache deleted a comment from github-actions bot Apr 13, 2025
@apache apache deleted a comment from github-actions bot Apr 13, 2025
@apache apache deleted a comment from github-actions bot Apr 13, 2025
@apache apache deleted a comment from github-actions bot Apr 13, 2025
@apache apache deleted a comment from github-actions bot Apr 13, 2025
@apache apache deleted a comment from github-actions bot Apr 13, 2025
@apache apache deleted a comment from github-actions bot Apr 13, 2025
@apache apache deleted a comment from github-actions bot Apr 13, 2025
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Apr 15, 2025
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 17, 2025
@jonkeane
Copy link
Member Author

@github-actions crossbow submit test-r-m1-san

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Apr 17, 2025
Copy link

Revision: 07d7582

Submitted crossbow builds: ursacomputing/crossbow @ actions-798e0f8cd9

Task Status
test-r-m1-san GitHub Actions

Copy link
Contributor

@zanmato1984 zanmato1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Thank you @jonkeane for setting up the m1 sanitizer CI job. I think I can take good use of it to further figure out why the misalignment is originated and fixed.

@jonkeane
Copy link
Member Author

@github-actions crossbow submit -g r

Copy link

Revision: 07d7582

Submitted crossbow builds: ursacomputing/crossbow @ actions-bccc2b44c0

Task Status
r-binary-packages GitHub Actions
r-recheck-most GitHub Actions
test-r-arrow-backwards-compatibility GitHub Actions
test-r-clang-asan GitHub Actions
test-r-clang-ubsan GitHub Actions
test-r-depsource-bundled Azure
test-r-depsource-system GitHub Actions
test-r-dev-duckdb GitHub Actions
test-r-devdocs GitHub Actions
test-r-extra-packages GitHub Actions
test-r-gcc-11 GitHub Actions
test-r-gcc-12 GitHub Actions
test-r-install-local GitHub Actions
test-r-install-local-minsizerel GitHub Actions
test-r-linux-as-cran GitHub Actions
test-r-linux-rchk GitHub Actions
test-r-linux-sanitizer GitHub Actions
test-r-linux-valgrind GitHub Actions
test-r-m1-san GitHub Actions
test-r-macos-as-cran GitHub Actions
test-r-minimal-build Azure
test-r-offline-maximal GitHub Actions
test-r-offline-minimal Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-debian-gcc-release-custom-ccache Azure
test-r-rhub-ubuntu-release-latest Azure
test-r-rocker-r-ver-latest Azure
test-r-rstudio-r-base-4.1-opensuse155 Azure
test-r-rstudio-r-base-4.2-focal Azure
test-r-ubuntu-22.04 GitHub Actions
test-r-versions GitHub Actions

@zanmato1984
Copy link
Contributor

OK, I did some experiments that can further justify this fix. This experiment proves that merely changing the pointer type passed into std::memcpy will affect how the compiler generates UB detection stub code which reports arguably false alarm (otherwise we can't explain why changing the pointer type from uint64_t * to uint8_t * (and passing into std::memcpy as void *) eliminates the sanitizer error).

For function (64-bit pointer)

int64_t read_mc64(const uint64_t* src, size_t n) {
    uint64_t result = 0;
    std::memcpy(&result, src, n);
    return result;
}

The compiler generates an alignment checking stub:

        tst     x0, #0x7
        b.ne    .LBB2_4

If the check failed (unaligned to 8-byte), it invokes UBSAN reporting routine __ubsan_handle_type_mismatch_v1 regardless of how it is used (in this function, passed into std::memcpy as a void *, which IMO is NOT UB).

Whereas for function (8-bit pointer)

uint64_t read_mc8(const uint8_t* src, size_t n) {
    uint64_t result = 0;
    std::memcpy(&result, src, n);
    return result;
}

No alignment checking stub so everything is fine.

Hope this could help people to understand better.

Co-authored-by: Rossi Sun <zanmato1984@gmail.com>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 18, 2025
@jonkeane jonkeane merged commit c54b039 into apache:main Apr 18, 2025
2 of 3 checks passed
@jonkeane jonkeane removed the awaiting change review Awaiting change review label Apr 18, 2025
@github-actions github-actions bot added the awaiting changes Awaiting changes label Apr 18, 2025
@jonkeane
Copy link
Member Author

@assignUser any chance we could get this pulled into the 20 release branch? We still have enough runway with CRAN (5/10 is our deadline) that we might not need to do a special R-only release just for this if we do. Or at least, I think it's ok for us to wait and see if 20 makes it out before then.

@assignUser
Copy link
Member

@jonkeane No problem, we need to do another RC anyway 👍

Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit c54b039.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@jonkeane
Copy link
Member Author

There were no benchmark performance regressions. 🎉

I was a little worried this casting was going to hit us, but it looks like no!

@jonkeane jonkeane deleted the 46123_ubsan branch April 19, 2025 15:05
assignUser pushed a commit that referenced this pull request Apr 22, 2025
…t_array_internal.cc` (#46124)

### Rationale for this change

Removing undefined behaviors, adding a CI job for the M1-mac sanitizer now being run.

### What changes are included in this PR?

The fixes + a sanitizer job that confirms they are fixed.

### Are these changes tested?

Yup!

### Are there any user-facing changes?

Fewer undefined behaviors.
* GitHub Issue: #46123

Lead-authored-by: Jonathan Keane <jkeane@gmail.com>
Co-authored-by: Rossi Sun <zanmato1984@gmail.com>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants