GH-39577: [C++] Fix tail-word access cross buffer boundary in CompareBinaryColumnToRow #39606

Merged
merged 4 commits into from
Jan 16, 2024

zanmato1984
Collaborator

@zanmato1984 zanmato1984 commented Jan 15, 2024

Rationale for this change

The default buffer alignment (64 bytes) doesn't guarantee that tail-word access in KeyCompare::CompareBinaryColumnToRow is safe. Comment #39577 (comment) gives a concrete example.

What changes are included in this PR?

Make KeyCompare::CompareBinaryColumnToRow tail-word safe.

Are these changes tested?

Unit tests included.

Are there any user-facing changes?

No.

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

@zanmato1984 zanmato1984 changed the title from "[C++] Fix tail-word access cross buffer boundary in CompareBinaryColumnToRow" to "GH-39577: [C++] Fix tail-word access cross buffer boundary in CompareBinaryColumnToRow" Jan 15, 2024
@zanmato1984 zanmato1984 marked this pull request as ready for review January 15, 2024 17:18
@@ -19,3 +19,5 @@
# in a row-major order.

arrow_install_all_headers("arrow/compute/row")

add_arrow_compute_test(compare_test SOURCES compare_test.cc)
Member

Could you add this to internals_test instead?

Collaborator Author

Sure. Done.

- uint64_t key_left = util::SafeLoad(key_left_ptr + i);
- uint64_t key_right = key_right_ptr[i];
+ uint64_t key_left = 0;
+ memcpy(&key_left, key_left_ptr + i, length - num_loops_less_one * 8);
Member

length - num_loops_less_one * 8 is used several times in this closure, can you factor it out and give it a meaningful name (perhaps num_tail_bytes)?

But, actually, isn't length - num_loops_less_one * 8 simply equal to length?

Collaborator Author

> length - num_loops_less_one * 8 is used several times in this closure, can you factor it out and give it a meaningful name (perhaps num_tail_bytes)?

Yeah, that's reasonable. Done.

> But, actually, isn't length - num_loops_less_one * 8 simply equal to length?

Sorry, I don't quite follow how that would be the case. For instance, for a fixed-size binary type fsb(19), length will be 19 and length - num_loops_less_one * 8 (i.e. num_tail_bytes) will be 3.

Member

You're right, thank you.
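
The arithmetic in this exchange can be checked with a small sketch. It assumes that num_loops_less_one is computed as ceil(length / 8) - 1, which matches the fsb(19) example above but is not shown in the quoted diff. The struct and function names are hypothetical and not part of Arrow.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Arrow: splits a fixed-size value of
// `length` bytes (length >= 1) into full 8-byte words plus a tail word.
struct TailSplit {
  uint32_t num_loops_less_one;  // full words handled before the tail word
  uint32_t num_tail_bytes;      // bytes in the last word, always in [1, 8]
};

inline TailSplit SplitIntoWords(uint32_t length) {
  // Assumption: ceil(length / 8) - 1, consistent with the fsb(19) example.
  const uint32_t num_loops_less_one = (length + 7) / 8 - 1;
  const uint32_t num_tail_bytes = length - num_loops_less_one * 8;
  return {num_loops_less_one, num_tail_bytes};
}
```

For length 19 this gives two full words and a 3-byte tail, so the expression equals length only when length is 8 or less.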

uint64_t key_left = 0;
memcpy(&key_left, key_left_ptr + i, length - num_loops_less_one * 8);
uint64_t key_right = 0;
memcpy(&key_right, key_right_ptr + i, length - num_loops_less_one * 8);
result_or |= tail_mask & (key_left ^ key_right);
Member

Is tail_mask still useful here? We're extracting exactly the desired number of bytes.

Collaborator Author

Good catch! Removed tail_mask.
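
A minimal stand-alone sketch of the tail-safe comparison discussed in this thread. The function name and signature are hypothetical; the real closure in CompareBinaryColumnToRow works on column and row pointers rather than two plain byte ranges, and the formula for num_loops_less_one is an assumption. The sketch illustrates why tail_mask becomes redundant: both words start at zero and memcpy copies exactly num_tail_bytes, so the unused bytes are equal by construction.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-alone version of the tail-safe compare (not Arrow API).
// Returns true when the first `length` bytes (length >= 1) of both inputs
// are equal, without reading past `left + length` or `right + length`.
inline bool BytesEqualTailSafe(const uint8_t* left, const uint8_t* right,
                               uint32_t length) {
  // Assumption: same word/tail split as in the review discussion.
  const uint32_t num_loops_less_one = (length + 7) / 8 - 1;
  const uint32_t num_tail_bytes = length - num_loops_less_one * 8;

  uint64_t result_or = 0;
  uint32_t i = 0;
  for (; i < num_loops_less_one; ++i) {
    // Full words: an 8-byte memcpy is an alignment-safe load.
    uint64_t key_left;
    uint64_t key_right;
    std::memcpy(&key_left, left + i * 8, 8);
    std::memcpy(&key_right, right + i * 8, 8);
    result_or |= key_left ^ key_right;
  }

  // Tail word: copy only the bytes that exist; the rest stay zero.
  uint64_t key_left = 0;
  uint64_t key_right = 0;
  std::memcpy(&key_left, left + i * 8, num_tail_bytes);
  std::memcpy(&key_right, right + i * 8, num_tail_bytes);
  result_or |= key_left ^ key_right;  // no tail_mask needed

  return result_or == 0;
}
```

With exactly sized 19-byte inputs, the tail memcpy reads 3 bytes, whereas a full 8-byte load at offset 16 would touch 5 bytes beyond the end of the data.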

@github-actions github-actions bot added the awaiting committer review label and removed the awaiting review label Jan 16, 2024
@pitrou
Member

pitrou commented Jan 16, 2024

Thanks a lot for finding this out @zanmato1984 ! I posted some comments above.

@zanmato1984
Collaborator Author

> Thanks a lot for finding this out @zanmato1984 ! I posted some comments above.

Thank you for taking a look and for the very helpful comments, @pitrou! Change updated.

@pitrou
Member

pitrou commented Jan 16, 2024

@github-actions crossbow submit -g cpp

Revision: f8cce17

Submitted crossbow builds: ursacomputing/crossbow @ actions-71248b8771

Task Status
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind Azure
test-cuda-cpp GitHub Actions
test-debian-11-cpp-amd64 GitHub Actions
test-debian-11-cpp-i386 GitHub Actions
test-fedora-38-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions

@pitrou pitrou merged commit cd3321b into apache:main Jan 16, 2024
37 checks passed
@pitrou pitrou removed the awaiting committer review label Jan 16, 2024
@pitrou
Member

pitrou commented Jan 16, 2024

@raulcd I think this is a good candidate for 15.0.0.

@pitrou pitrou modified the milestone: 15.0.0 Jan 16, 2024
@raulcd
Member

raulcd commented Jan 16, 2024

@pitrou is this worth a new Release Candidate? I created RC1 a couple of hours ago and have already generated all binaries (just waiting for one job to finish) for it successfully, see: #39641

@pitrou
Member

pitrou commented Jan 16, 2024

@raulcd Not in itself. But if you have to craft an RC2 for other reasons, this would be nice to include.

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit cd3321b.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.

idailylife pushed a commit to idailylife/arrow that referenced this pull request Jan 18, 2024
…CompareBinaryColumnToRow` (apache#39606)

### Rationale for this change

Default buffer alignment (64b) doesn't guarantee the safety of tail-word access in  `KeyCompare::CompareBinaryColumnToRow`. Comment apache#39577 (comment) is a concrete example.

### What changes are included in this PR?

Make `KeyCompare::CompareBinaryColumnToRow` tail-word safe.

### Are these changes tested?

UT included.

### Are there any user-facing changes?

No.

* Closes: apache#39577

Authored-by: zanmato1984 <zanmato1984@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
clayburn pushed a commit to clayburn/arrow that referenced this pull request Jan 23, 2024
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
raulcd pushed a commit that referenced this pull request Feb 20, 2024
zanmato1984 added a commit to zanmato1984/arrow that referenced this pull request Feb 28, 2024
thisisnic pushed a commit to thisisnic/arrow that referenced this pull request Mar 8, 2024
Successfully merging this pull request may close these issues.

[C++][Acero] ASAN reports heap buffer overflow in arrow::compute::KeyCompare::CompareBinaryColumnToRow