Conversation

Contributor

@zanmato1984 zanmato1984 commented Jul 23, 2024

Rationale for this change

The row table uses uint32_t for the row offset within the row data buffer, effectively capping the total row data at 4GB. This is quite restrictive, and the impact is described in more detail in #43495. This PR proposes widening the row offset from 32-bit to 64-bit to address this limitation.
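
To make the change concrete, here is a minimal sketch of the affected layout (the names are invented for illustration; this is not the actual RowTableImpl):

```cpp
#include <cstdint>
#include <vector>

// offsets_[i] is where row i starts inside row_data_. With uint32_t offsets, the
// encoded row data cannot grow past 4GB; widening the element type to uint64_t
// removes that cap at the cost of 4 extra bytes per row.
struct RowTableSketch {
  std::vector<uint8_t> row_data_;   // concatenated encoded rows
  std::vector<uint64_t> offsets_;   // effectively uint32_t elements before this PR
};
```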

Benefits

Currently, the row table has three major limitations:

  1. The overall data size cannot exceed 4GB.
  2. The size of a single row cannot exceed 4GB.
  3. The number of rows cannot exceed 2^32.

This enhancement eliminates the first limitation. The second and third limitations are much less likely to be hit in practice. Thus, this change enables a significant range of use cases that are currently unsupported.

Overhead

Of course, this will introduce some overhead:

  1. An extra 4 bytes of memory consumption per row, due to widening the offset from 32-bit to 64-bit (a worked example follows this list).
  2. A wider offset type requires a few more SIMD instructions in each 8-row processing iteration.
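
For scale, with 100 million rows the wider offsets add roughly 400 MB of extra offset storage (100,000,000 rows × 4 bytes).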

In my opinion, this overhead is justified by the benefits listed above.

What changes are included in this PR?

The row offset of the row table is changed from 32-bit to 64-bit. Related code in row comparison/encoding and the swiss join has been updated accordingly.

Are these changes tested?

Tests are included.

Are there any user-facing changes?

Users could potentially see higher memory consumption when using Acero's hash join and hash aggregation. On the other hand, certain use cases that used to fail are now able to complete.

@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

Contributor Author

Just found that the functions in this file are not used anywhere. I updated them anyway - maybe I'll clean them up or make them useful in the future.

Member

Hmm, do you mean the AVX2 speedups are actually not wired anywhere?

Contributor Author

Right, they are not, as far as I can tell from searching the code.

@github-actions github-actions bot added the awaiting committer review label and removed the awaiting review label on Jul 24, 2024
@zanmato1984 zanmato1984 changed the title from "EXPERIMENT: [C++] Widen the row offset of row table to 64-bit" to "EXPERIMENT: [C++][Compute] Widen the row offset of row table to 64-bit" on Jul 24, 2024
@zanmato1984
Contributor Author

@ursabot please benchmark

@ursabot

ursabot commented Jul 29, 2024

Benchmark runs are scheduled for commit a789e04e964dacf02e76df3cfd48a59024843fce. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@conbench-apache-arrow

Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit a789e04e964dacf02e76df3cfd48a59024843fce.

There were 5 benchmark results indicating a performance regression:

The full Conbench report has more details.

@zanmato1984 zanmato1984 changed the title from "EXPERIMENT: [C++][Compute] Widen the row offset of row table to 64-bit" to "GH-43495: [C++][Compute] Widen the row offset of row table to 64-bit" on Jul 31, 2024
@zanmato1984 zanmato1984 changed the title from "GH-43495: [C++][Compute] Widen the row offset of row table to 64-bit" to "GH-43495: [C++][Compute] Widen the row offset of the row table to 64-bit" on Jul 31, 2024
@zanmato1984 zanmato1984 marked this pull request as ready for review July 31, 2024 09:43
@zanmato1984 zanmato1984 requested a review from westonpace as a code owner July 31, 2024 09:43
@zanmato1984
Contributor Author

The ASAN failure is caused by #43414. The others seem unrelated.

Contributor Author

Note that this assertion would fire for the 32-bit offset, because there are unreported offset overflow issues in the fixed-length code path, as described in #43495 and quoted below:

Even for the fixed-length code path, which doesn’t deal with the offset buffer at all and thus is supposed to be less problematic, there are obvious offset overflow issues like [1] and [2] (these issues are currently unreported but observed in my local experiments).
[1]

    uint32_t offset_right = irow_right * fixed_length + offset_within_row;

[2]

    __m256i offset_right =
        _mm256_mullo_epi32(irow_right, _mm256_set1_epi32(fixed_length));
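
For reference, the scalar overflow in [1] can be avoided by widening before the multiplication. A minimal sketch (names taken from the snippet above; not necessarily the exact fix applied in the code base):

```cpp
#include <cstdint>

// Widen the row id to 64-bit before multiplying, so that
// irow_right * fixed_length cannot wrap around at 4GB.
inline uint64_t FixedLengthRowOffset(uint32_t irow_right, uint32_t fixed_length,
                                     uint32_t offset_within_row) {
  return static_cast<uint64_t>(irow_right) * fixed_length + offset_within_row;
}
```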

@github-actions

⚠️ GitHub issue #43495 has been automatically assigned in GitHub to PR creator.

@felipecrv felipecrv self-requested a review July 31, 2024 17:10
@zanmato1984 zanmato1984 force-pushed the row-table-large-offset branch from 0e3a361 to 2da4b07 on August 3, 2024 08:49

row_ids_out.data(), columns, row_table,
/*are_cols_in_encoding_order=*/true,
/*out_match_bitvector_maybe_null=*/NULLPTR);
ASSERT_EQ(num_rows_no_match, 0);
Member

@pitrou pitrou Aug 15, 2024

So, this means that the rows and columns compared equal, right? I'm trying to understand the API...

Contributor Author

Right. It returns the rows that are not equal to the columns.

/*out_num_rows=*/NULLPTR, /*out_sel_left_maybe_same=*/NULLPTR, columns, row_table,
/*are_cols_in_encoding_order=*/true, match_bitvector.data());
ASSERT_EQ(CountSetBits(match_bitvector.data(), 0, num_rows_to_compare),
num_rows_to_compare);
Member

Hmm, the docstring says that CompareColumnsToRows returns "a single 16-bit selection vector of rows that failed comparison". If rows are equal to columns, then surely all bits should be 0, not 1? Am I misunderstanding something?

Contributor Author

The doc string is misleading. It actually returns the result in either of two ways: a selection vector of the non-matching row ids (as the doc string says), or a match bit vector (1 for match and 0 otherwise, which the doc string does not mention), depending on whether the row-id output parameter is null. This code checks the result the second way.

I can update the doc string to make it more clear.
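
For anyone else puzzled by the same thing, here is a toy model of the two output modes (a hedged sketch with invented names and a much simpler signature than the real CompareColumnsToRows):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// If out_row_ids is non-null, write the ids of the rows that did NOT match and
// count them in *out_num_rows; otherwise set bit i of out_match_bitvector for
// every row i that matches. The caller zero-initializes the outputs.
void CompareToyRows(const std::vector<int>& rows, const std::vector<int>& expected,
                    uint32_t* out_num_rows, uint16_t* out_row_ids,
                    uint8_t* out_match_bitvector) {
  for (size_t i = 0; i < rows.size(); ++i) {
    const bool match = rows[i] == expected[i];
    if (out_row_ids != nullptr) {
      if (!match) out_row_ids[(*out_num_rows)++] = static_cast<uint16_t>(i);
    } else if (match) {
      out_match_bitvector[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
}
```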

Member

What a weird API. Yes, updating the doc string would be nice :-)

Contributor Author

Updated.

@pitrou
Member

pitrou commented Aug 15, 2024

CI failures look unrelated, FTR.

@zanmato1984
Contributor Author

@github-actions crossbow submit -g cpp

@github-actions

Revision: ded3e67

Submitted crossbow builds: ursacomputing/crossbow @ actions-c44c9b7d7c

Task Status
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions

Contributor

@felipecrv felipecrv left a comment

LGTM.

@github-actions github-actions bot added the awaiting changes label and removed the awaiting committer review label on Aug 16, 2024
@vkhodygo

Fab, a lot of people will be happy with this one.

A few questions here:

  • there was a lengthy discussion (and a document) about larger-than-memory datasets (GH-31769: [C++][Acero] Add spilling for hash join #13669; [C++] Support hash-join on larger than memory datasets #31769) - will there be any progress in this direction? Now that the limit is gone, I expect an influx of reports about code crashing solely because of the dataset size.

  • An extra 4 bytes of memory consumption for each row due to the offset size difference from 32-bit to 64-bit.
    A wider offset type requires a few more SIMD instructions in each 8-row processing iteration.

    Do you think it's possible to add some heuristics and/or an explicit key to keep the old behaviour for reasonably small datasets?

@zanmato1984
Contributor Author

Thanks for the feedback!

There is unfortunately no update or plan about that, AFAIK. If by "crashed code solely because of the data size" you mean the operating system's OOM kill, then yes :)

  • An extra 4 bytes of memory consumption for each row due to the offset size difference from 32-bit to 64-bit.
    A wider offset type requires a few more SIMD instructions in each 8-row processing iteration.

    Do you think it's possible to add some heuristics and/or an explicit key to keep the old behaviour for reasonably small datasets?

Great question. I've thought about that too and had a short discussion with other developers in the community meeting. My reasons for not doing it are: 1) code complexity: a lot of code assumes a concrete offset type, especially in the SIMD-specialized functions, and generalizing it is not trivial; 2) runtime overhead: since the data size is usually unknown before the computation starts, the best we could do is use the 32-bit offset until a certain limit (4GB) is reached as the data accumulates, and then switch to the 64-bit offset - at that point a large memory copy from the 32-bit offset buffer to the 64-bit offset buffer would happen. Hence the decision.
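
To illustrate the runtime-overhead point, the widening step that such a hybrid scheme would need might look like this (a sketch under the assumption that offsets live in plain vectors; the real row table stores them in Arrow buffers):

```cpp
#include <cstdint>
#include <vector>

// Once the accumulated row data approaches the 4GB limit, every 32-bit offset
// written so far would have to be copied into a newly allocated 64-bit buffer.
std::vector<uint64_t> WidenOffsets(const std::vector<uint32_t>& offsets32) {
  return std::vector<uint64_t>(offsets32.begin(), offsets32.end());  // O(n) copy
}
```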

@conbench-apache-arrow

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 5e68513.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 40 possible false positives for unstable benchmarks that are known to sometimes produce them.

@kou
Member

kou commented Sep 13, 2024

@github-actions crossbow submit test-r-rstudio-r-base-4.1-opensuse155

@github-actions

Revision: ded3e67

Submitted crossbow builds: ursacomputing/crossbow @ actions-947c28ffed

Task Status
test-r-rstudio-r-base-4.1-opensuse155 Azure

zanmato1984 added a commit that referenced this pull request Dec 25, 2024
### Rationale for this change

A piece of comment was wrongly introduced in #43389.

### What changes are included in this PR?

### Are these changes tested?

### Are there any user-facing changes?

Authored-by: Rossi Sun <zanmato1984@gmail.com>
Signed-off-by: Rossi Sun <zanmato1984@gmail.com>
zanmato1984 added a commit that referenced this pull request Jan 27, 2025
…alculation for fixed length and null masks (#45336)

### Rationale for this change

#45334 

### What changes are included in this PR?

1. An all-mighty test case that can effectively reveal all the bugs mentioned in the issue;
2. Rather than directly fixing the bugs (actually, simply casting to 64-bit somewhere in the multiplication would do), I refined the buffer accessors of the row table in order to eliminate more potential similar issues (which I believe do exist; a sketch of this pattern follows the commit message):
    1. `null_masks()` -> `null_masks(row_id)`, which does overflow-safe indexing inside;
    2. `is_null(row_id, col_pos)`, which does overflow-safe indexing and directly gets the bit of the column;
    3. `data(1)` -> `fixed_length_rows(row_id)`, which first asserts that the row table is fixed-length, then does overflow-safe indexing inside;
    4. `data(2)` -> `var_length_rows()`, which only asserts that the row table is var-length. It is supposed to be paired with `offsets()` (which is already 64-bit as of #43389);
    5. The `data(0/1/2)` members are made private.
3. The AVX2 specializations are fixed individually by using 64-bit multiplication and indexing.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

None.

* GitHub Issue: #45334

Authored-by: Rossi Sun <zanmato1984@gmail.com>
Signed-off-by: Rossi Sun <zanmato1984@gmail.com>
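
As an illustration of the overflow-safe accessor pattern described in the commit message above, here is a minimal sketch (the function names mirror the commit message, but the signatures and layout details are assumptions, not the actual Arrow code):

```cpp
#include <cstdint>

// Widen the row id to 64-bit before multiplying, so that indexing into the
// fixed-length row data cannot overflow 32 bits.
inline const uint8_t* fixed_length_rows(const uint8_t* rows_base, uint32_t row_id,
                                        uint32_t row_length) {
  return rows_base + static_cast<uint64_t>(row_id) * row_length;
}

// Overflow-safe lookup of the null bit for (row_id, col_pos).
inline bool is_null(const uint8_t* null_masks, uint32_t row_id, uint32_t col_pos,
                    uint32_t null_mask_bytes_per_row) {
  const uint64_t bit_id =
      static_cast<uint64_t>(row_id) * null_mask_bytes_per_row * 8 + col_pos;
  return (null_masks[bit_id / 8] >> (bit_id % 8)) & 1;
}
```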