GH-39577: [C++] Fix tail-word access cross buffer boundary in CompareBinaryColumnToRow
#39606
@@ -19,3 +19,5 @@
 # in a row-major order.

 arrow_install_all_headers("arrow/compute/row")
+
+add_arrow_compute_test(compare_test SOURCES compare_test.cc)
Could you add this to `internals_test` instead?
Sure. Done.
-uint64_t key_left = util::SafeLoad(key_left_ptr + i);
-uint64_t key_right = key_right_ptr[i];
+uint64_t key_left = 0;
+memcpy(&key_left, key_left_ptr + i, length - num_loops_less_one * 8);
`length - num_loops_less_one * 8` is used several times in this closure, can you factor it out and give it a meaningful name (perhaps `num_tail_bytes`)?
But, actually, isn't `length - num_loops_less_one * 8` simply equal to `length`?
> `length - num_loops_less_one * 8` is used several times in this closure, can you factor it out and give it a meaningful name (perhaps `num_tail_bytes`)?

Yeah, that's reasonable. Done.

> But, actually, isn't `length - num_loops_less_one * 8` simply equal to `length`?

Sorry, I don't quite understand how that would be the case. For instance, for an `fsb(19)` type, `length` will be `19` and `length - num_loops_less_one * 8` (aka `num_tail_bytes`) will be `3`.
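The arithmetic in this exchange can be checked with a small sketch. It assumes that `num_loops_less_one` is `ceil(length / 8) - 1`, which is consistent with the `fsb(19)` numbers quoted above but is not shown in this thread. `TailSplit` and `SplitLength` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Arrow: splits a key length into the
// number of full 8-byte words handled by the loop and the byte count of
// the final (tail) word.
struct TailSplit {
  int32_t num_loops_less_one;  // full 8-byte words before the tail word
  int32_t num_tail_bytes;      // bytes belonging to the tail word, in [1, 8]
};

inline TailSplit SplitLength(uint32_t length) {
  // Assumption: length > 0 and num_loops_less_one == ceil(length / 8) - 1.
  const int32_t num_loops_less_one = static_cast<int32_t>((length + 7) / 8) - 1;
  const int32_t num_tail_bytes =
      static_cast<int32_t>(length) - num_loops_less_one * 8;
  return {num_loops_less_one, num_tail_bytes};
}
```

Under that assumption, `SplitLength(19)` gives two full words and three tail bytes, so `num_tail_bytes` equals `length` only when `length` is 8 or less.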
You're right, thank you.
uint64_t key_left = 0;
memcpy(&key_left, key_left_ptr + i, length - num_loops_less_one * 8);
uint64_t key_right = 0;
memcpy(&key_right, key_right_ptr + i, length - num_loops_less_one * 8);
result_or |= tail_mask & (key_left ^ key_right);
Is `tail_mask` still useful here? We're extracting exactly the desired number of bytes.
Good catch! Removed `tail_mask`.
Thanks a lot for finding this out @zanmato1984! I posted some comments above.
Thank you for looking and for the very helpful comments @pitrou! Change updated.
@github-actions crossbow submit -g cpp
Revision: f8cce17 Submitted crossbow builds: ursacomputing/crossbow @ actions-71248b8771
@raulcd I think this is a good candidate for 15.0.0.
@raulcd Not in itself. But if you have to craft a RC2 for other reasons, this would be nice to include.
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit cd3321b. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.
…CompareBinaryColumnToRow` (apache#39606)

### Rationale for this change

Default buffer alignment (64b) doesn't guarantee the safety of tail-word access in `KeyCompare::CompareBinaryColumnToRow`. Comment apache#39577 (comment) is a concrete example.

### What changes are included in this PR?

Make `KeyCompare::CompareBinaryColumnToRow` tail-word safe.

### Are these changes tested?

UT included.

### Are there any user-facing changes?

No.

* Closes: apache#39577

Authored-by: zanmato1984 <zanmato1984@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
### Rationale for this change

Default buffer alignment (64b) doesn't guarantee the safety of tail-word access in `KeyCompare::CompareBinaryColumnToRow`. Comment #39577 (comment) is a concrete example.

### What changes are included in this PR?

Make `KeyCompare::CompareBinaryColumnToRow` tail-word safe.

### Are these changes tested?

UT included.

### Are there any user-facing changes?

No.
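To make the rationale concrete: a full 8-byte load of the last word of a key touches 8 bytes no matter how many of them belong to the key. The sketch below computes how far such a load would reach past the key's end. `TailOverreadBytes` is a hypothetical name, and the formula assumes the loop structure discussed in the review (`ceil(length / 8)` words per key).

```cpp
#include <cstdint>

// Hypothetical helper, not part of Arrow: number of bytes past the end of a
// `length`-byte key that a full 8-byte load of the key's last word would read.
inline uint32_t TailOverreadBytes(uint32_t length) {
  // Assumption: length > 0 and the key is processed in ceil(length / 8) words.
  const uint32_t num_words = (length + 7) / 8;
  return num_words * 8 - length;
}
```

For a 19-byte key this is 5 bytes. If that key is the last value in a buffer with nothing allocated after it, those 5 bytes lie outside the buffer, which is the situation the `memcpy` of exactly `num_tail_bytes` bytes avoids.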
arrow::compute::KeyCompare::CompareBinaryColumnToRow
#39577