[C++][Acero] Potential truncation when merging row arrays #45254

@zanmato1984

Description

Describe the bug, including details regarding any error messages, version, and platform.

In #43389 I widened the offset within the row table to 64-bit and changed the references of row offset, but I seem to have missed two places:

target->rows_.mutable_offsets()[num_rows] = static_cast<uint32_t>(num_bytes);

and
target->rows_.mutable_offsets()[num_rows] = static_cast<uint32_t>(num_bytes);

(num_bytes is the accumulated size of all the source row tables, which can exceed 4GB, so the static_cast here truncates it.)
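To make the truncation concrete, here is a minimal standalone sketch, not the Acero code itself. The helper names are hypothetical, and it assumes the offset slot is a 64-bit signed integer after the widening in #43389. It shows that casting a byte count just over 4GB to uint32_t keeps only the low 32 bits:

```cpp
#include <cstdint>

// Hypothetical stand-in for the buggy store: the 64-bit byte count is narrowed
// to 32 bits and only then widened back into a 64-bit offset slot.
// Assumption: the offset slot type is int64_t.
inline int64_t StoreOffsetTruncated(int64_t num_bytes) {
  // Conversion to an unsigned type is modulo 2^32, so the high bits are lost.
  return static_cast<uint32_t>(num_bytes);
}

// Hypothetical stand-in for the intended store: the value is kept as 64-bit.
inline int64_t StoreOffsetFull(int64_t num_bytes) {
  return num_bytes;
}
```

With num_bytes equal to 4GB plus 16 bytes, the truncated variant would store 16, so the final offset would point near the start of the merged row data instead of its end.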

Unfortunately, our existing test

TEST(HashJoin, LARGE_MEMORY_TEST(BuildSideOver4GBVarLength)) {

didn't catch this, because exposing the bug requires a matching row to be located in the region beyond 4GB, and that depends on the hash algorithm, which is opaque. We can confirm that the truncation actually happens by adding DCHECK_LE(num_bytes, uint32_max).
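The condition such a check would test can be sketched outside of Arrow as follows. FitsInUint32 is a hypothetical helper, not an Arrow API, and it again assumes num_bytes is a 64-bit signed integer. It returns false exactly when the cast to uint32_t would change the value:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true when num_bytes survives a round trip through
// uint32_t unchanged, i.e. when the static_cast would not truncate it.
inline bool FitsInUint32(int64_t num_bytes) {
  return num_bytes >= 0 &&
         num_bytes <= static_cast<int64_t>(std::numeric_limits<uint32_t>::max());
}
```

A debug assertion on this condition at the two call sites would be expected to fail during the over-4GB test even when the join output happens to be correct, which is what makes it useful for confirming the bug.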

Component(s)

C++
