Skip to content

Commit

Permalink
Fix bug in initializing global aggregations with initialized flag (fa…
Browse files Browse the repository at this point in the history
…cebookincubator#9396)

Summary:
Pull Request resolved: facebookincubator#9396

facebookincubator#9067 introduced an initialized flag that we include alongside the null flag
for aggregations in the rows in RowContainer.

Global aggregations do not use a RowContainer to create rows, rather a single row is constructed manually inside
GroupingSets.  When that PR was landed this logic was not properly updated, we did not allocate enough space in the row
for the additional bits which could lead to null/initialized flags overwriting the aggregate values when there were a lot of
them.

This change fixes it so that we allocate the correct amount of space for flags for global aggregations as an immediate fix.

Longer term it would be better to centralize the logic for creating rows in RowContainer (I'd like to get us back to a working
state before attempting such a refactor).

Reviewed By: Yuhta

Differential Revision: D55823098

fbshipit-source-id: 9a15f257669273c64dc0361bf735c4472e45957d
  • Loading branch information
Kevin Wilfong authored and Joe-Abraham committed Jun 7, 2024
1 parent 82a28f6 commit da15ba5
Show file tree
Hide file tree
Showing 2 changed files with 30 additions and 2 deletions.
8 changes: 6 additions & 2 deletions velox/exec/GroupingSet.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -413,14 +413,18 @@ void GroupingSet::initializeGlobalAggregation() {
lookup_->reset(1);

// Row layout is:
// - null flags - one bit per aggregate,
// - alternating null flag, intialized flag - one bit per flag, one pair per
// aggregation,
// - uint32_t row size,
// - fixed-width accumulators - one per aggregate
//
// Here we always make space for a row size since we only have one row and no
// RowContainer. The whole row is allocated to guarantee that alignment
// requirements of all aggregate functions are satisfied.
int32_t rowSizeOffset = bits::nbytes(aggregates_.size());

// Allocate space for the null and initialized flags.
int32_t rowSizeOffset =
bits::nbytes(aggregates_.size() * RowContainer::kNumAccumulatorFlags);
int32_t offset = rowSizeOffset + sizeof(int32_t);
int32_t accumulatorFlagsOffset = 0;
int32_t alignment = 1;
Expand Down
24 changes: 24 additions & 0 deletions velox/exec/tests/AggregationTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -589,6 +589,30 @@ TEST_F(AggregationTest, global) {
"SELECT sum(15), sum(c1), sum(c2), sum(c4), sum(c5), "
"min(15), min(c1), min(c2), min(c3), min(c4), min(c5), "
"max(15), max(c1), max(c2), max(c3), max(c4), max(c5), sum(1) FROM tmp");
}

TEST_F(AggregationTest, manyGlobalAggregations) {
// Test a query with a large number of global aggregations.
// Global aggregations have a separate code path that does not use a
// HashTable, but rather a single row outside of a RowContainer. Having many
// aggregations can expose issues with that single row that may not occur with
// only a few aggregations.
auto rowType =
velox::test::VectorMaker::rowType(std::vector<TypePtr>(100, SMALLINT()));
auto vectors = makeVectors(rowType, 10, 100);
createDuckDbTable(vectors);

std::vector<std::string> aggregates;
for (int i = 0; i < rowType->size(); i++) {
aggregates.push_back(fmt::format("sum({})", rowType->nameOf(i)));
}

auto op = PlanBuilder()
.values(vectors)
.singleAggregation({}, aggregates)
.planNode();

assertQuery(op, "SELECT " + folly::join(", ", aggregates) + " FROM tmp");

EXPECT_EQ(NonPODInt64::constructed, NonPODInt64::destructed);
}
Expand Down

0 comments on commit da15ba5

Please sign in to comment.