Skip to content

Conversation

@mapleFU
Copy link
Member

@mapleFU mapleFU commented Nov 20, 2024

Rationale for this change

Add boundary check for NumericBuilder::AppendValues for std::vector

Originally, it will :

  1. AppendValues might has std::vector as arguments, std::vector::data might be used
  2. std::vector::data might returns nullptr if size == 0: https://en.cppreference.com/w/cpp/container/vector/data
  3. https://en.cppreference.com/w/cpp/string/byte/memcpy memcpy says, "If either dest or src is an invalid or null pointer, the behavior is undefined, even if count is zero."

What changes are included in this PR?

Add boundary check for NumericBuilder::AppendValues for std::vector

Are these changes tested?

Covered by existing

Are there any user-facing changes?

no

@github-actions
Copy link

⚠️ GitHub issue #44690 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Nov 20, 2024
@mapleFU mapleFU merged commit 501418e into apache:main Nov 20, 2024
39 checks passed
@mapleFU mapleFU removed the awaiting committer review Awaiting committer review label Nov 20, 2024
@mapleFU mapleFU deleted the fix-ub-for-append-vec branch November 20, 2024 15:24
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 501418e.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@felipecrv
Copy link
Contributor

I think we take a systematic approach to this and make wrappers to mem* functions that behave sanely on NULL input.

This article [1] discusses the issue in detail and points to approaches taken by projects like boringssl. [2] There is also a proposal from prolific LLVM contributors for fixing this at the C standard level. [3]

[1] https://davidben.net/2024/01/15/empty-slices.html
[2] https://boringssl.googlesource.com/boringssl/+/17cf2cb1d226b0ba2401304242df7ddd3b6f1ff2%5E%21/
[3] https://docs.google.com/document/d/1guH_HgibKrX7t9JfKGfWX2UCPyZOTLsnRfR6UleD1F8/edit?tab=t.0

@pitrou
Copy link
Member

pitrou commented Nov 22, 2024

Well, we do have the MakeNonNull helper which I had forgotten about.

template <typename T>
inline T* MakeNonNull(T* maybe_null = NULLPTR) {
if (ARROW_PREDICT_TRUE(maybe_null != NULLPTR)) {
return maybe_null;
}
return const_cast<T*>(reinterpret_cast<const T*>(&internal::kNonNullFiller));
}

@mapleFU
Copy link
Member Author

mapleFU commented Nov 22, 2024

@felipecrv this would be a better solution, maybe I can move forward!

@felipecrv
Copy link
Contributor

Well, we do have the MakeNonNull helper which I had forgotten about.

I think it's a weird way to go about fixing this. It's easier to accept that memcpy is a bad abstraction when it can't handle 0-length slices for any value of base pointer.

@pitrou
Copy link
Member

pitrou commented Nov 22, 2024

I agree that memcpy is a bad abstraction in that regard. But the main issue here is to remember to call the appropriate helper, regardless of whether it's MakeNonNull or a memcpy wrapper. :)

@felipecrv
Copy link
Contributor

Easy to forbid memcpy and suggest alternative with clang-tidy

bugprone-unsafe-functions.CustomFunctions="
  functionRegex1[, replacement1[, reason1]];
  functionRegex2[, replacement2[, reason2]];
  ...
"

https://clang.llvm.org/extra/clang-tidy/checks/bugprone/unsafe-functions.html

@pitrou
Copy link
Member

pitrou commented Nov 26, 2024

Hmm, perhaps that can be discussed on the dev ML?

@mapleFU
Copy link
Member Author

mapleFU commented Nov 26, 2024

Let me draft a mail later tonight

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants