@kosiew (Contributor) commented Nov 18, 2025

Which issue does this PR close?

Rationale for this change

This change adds Substrait round-trip support for GROUP BY CUBE (GroupingSet::Cube), so logical plans that contain a CUBE can be converted to Substrait and back. Previously, the Substrait producer returned a hard NotImplemented error for GroupingSet::Cube, which prevented several SQLLogicTest cases from running in round-trip mode.

Supporting CUBE makes the producer consistent with its existing ROLLUP and GROUPING SETS handling, lets logical plans that use CUBE be serialized correctly, and allows the tests that rely on CUBE semantics to run.

Before

❯ cargo test --test sqllogictests -- --substrait-round-trip grouping.slt:52
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.60s
     Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-917e139464eeea33)
Completed 1 test files in 0 seconds
External error: 1 errors in file /Users/kosiew/GitHub/datafusion/datafusion/sqllogictest/test_files/grouping.slt

1. query failed: DataFusion error: This feature is not implemented: GroupingSet CUBE is not yet supported

After

❯ cargo test --test sqllogictests -- --substrait-round-trip grouping.slt:52
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.65s
     Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-917e139464eeea33)
Completed 1 test files in 0 seconds

What changes are included in this PR?

  • Introduces a shared internal helper powerset_indices for efficient subset generation.
  • Refactors powerset to return DataFusion error types instead of a string-based error.
  • Adds a new powerset_cloned function that returns subsets of owned (cloned) values, as needed by the Substrait producer.
  • Implements full Substrait producer support for GroupingSet::Cube using powerset_cloned.
  • Updates aggregate Substrait translation to correctly assemble grouping sets derived from cube expansions.
  • Adds a new Substrait round‑trip test case for GROUP BY CUBE.
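The split between a shared index-based helper and its borrowing and owning wrappers can be sketched in plain Rust as follows. This is a minimal, std-only sketch under stated assumptions: only the names powerset_indices, powerset and powerset_cloned come from the list above; their signatures, the PowersetError type and the MAX_LEN cap are hypothetical (the PR itself uses DataFusion's error types). The usage in main shows the expansion the producer relies on: CUBE(a, b) is equivalent to GROUPING SETS ((), (a), (b), (a, b)).

```rust
use std::fmt;

/// Hypothetical error type; the PR itself uses DataFusion's error types.
#[derive(Debug)]
struct PowersetError {
    len: usize,
    max: usize,
}

impl fmt::Display for PowersetError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "powerset of {} items exceeds the limit of {}", self.len, self.max)
    }
}

/// Assumed cap on input size: n items produce 2^n subsets.
const MAX_LEN: usize = 16;

/// Shared helper (hypothetical signature): every subset of `0..len`, one bit mask per subset.
fn powerset_indices(len: usize) -> Result<Vec<Vec<usize>>, PowersetError> {
    if len > MAX_LEN {
        return Err(PowersetError { len, max: MAX_LEN });
    }
    Ok((0..(1usize << len))
        .map(|mask| {
            // Bit i of `mask` selects item i.
            (0..len)
                .filter(|&i| mask & (1usize << i) != 0)
                .collect::<Vec<usize>>()
        })
        .collect())
}

/// Borrowing variant: each subset holds references into `items`.
fn powerset<T>(items: &[T]) -> Result<Vec<Vec<&T>>, PowersetError> {
    Ok(powerset_indices(items.len())?
        .into_iter()
        .map(|idx| idx.into_iter().map(|i| &items[i]).collect::<Vec<&T>>())
        .collect())
}

/// Owning variant: each subset holds clones, for callers that need owned values.
fn powerset_cloned<T: Clone>(items: &[T]) -> Result<Vec<Vec<T>>, PowersetError> {
    Ok(powerset_indices(items.len())?
        .into_iter()
        .map(|idx| idx.into_iter().map(|i| items[i].clone()).collect::<Vec<T>>())
        .collect())
}

fn main() -> Result<(), PowersetError> {
    // CUBE(a, b) expands to GROUPING SETS ((), (a), (b), (a, b)).
    let cube = vec!["a".to_string(), "b".to_string()];
    let sets = powerset_cloned(&cube)?;
    for set in &sets {
        println!("{:?}", set);
    }
    assert_eq!(sets.len(), 4);
    assert_eq!(sets[3], vec!["a".to_string(), "b".to_string()]);

    let borrowed = powerset(&cube)?;
    assert_eq!(borrowed[1], vec![&cube[0]]);
    Ok(())
}
```

Computing the index subsets once and mapping them to either references or clones keeps the subset logic in a single place. The cloned variant trades extra copies for owned values that can be moved into newly built grouping-set expressions.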

Are these changes tested?

Yes. A new Substrait round-trip test (aggregate_grouping_cube) checks the logical plan after it has been converted to Substrait and back. The existing grouping and aggregate tests, which cover the other grouping-set variants, continue to pass.

Are there any user-facing changes?

There are no user-facing API changes. Plans containing GROUP BY CUBE can now be converted to Substrait instead of failing with a NotImplemented error, so queries that previously failed in Substrait round-trip mode may now succeed.

LLM-generated code disclosure

This PR includes LLM‑generated code and comments. All LLM‑generated content has been manually reviewed and tested.

Remove NotImplemented error for GroupingSet::Cube to
allow Substrait conversion. Add the generate_powerset
function for efficient power set generation using
bit-masking. Include validation to manage memory
usage and refactor error handling for consistency
across GroupingSet variants. Add tests for CUBE
queries and validate with SQLLogicTest.
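The commit message above mentions validation to manage memory usage. Because a CUBE over n expressions expands to 2^n grouping sets, one way to guard against this is to compute the subset count with overflow checks and reject it before allocating anything. The sketch below is illustrative only: checked_powerset_size, PowersetSizeError and the MAX_GROUPING_SETS limit are hypothetical names and values, not the PR's actual code or limit.

```rust
/// Hypothetical cap on how many grouping sets a CUBE may expand to.
const MAX_GROUPING_SETS: usize = 1 << 16;

/// Hypothetical error type describing why a CUBE expansion was rejected.
#[derive(Debug, PartialEq)]
enum PowersetSizeError {
    /// 2^len does not fit in a usize.
    Overflow { len: usize },
    /// 2^len fits in a usize but exceeds the configured cap.
    TooLarge { len: usize, size: usize, max: usize },
}

/// Returns 2^len, the number of grouping sets produced by CUBE over `len`
/// expressions, or an error if that count overflows or exceeds the cap.
/// Callers can run this check before allocating any subsets.
fn checked_powerset_size(len: usize) -> Result<usize, PowersetSizeError> {
    // `checked_shl` returns None when the shift is >= usize::BITS.
    let size = u32::try_from(len)
        .ok()
        .and_then(|shift| 1usize.checked_shl(shift))
        .ok_or(PowersetSizeError::Overflow { len })?;
    if size > MAX_GROUPING_SETS {
        return Err(PowersetSizeError::TooLarge { len, size, max: MAX_GROUPING_SETS });
    }
    Ok(size)
}

fn main() {
    assert_eq!(checked_powerset_size(3), Ok(8));
    assert_eq!(checked_powerset_size(16), Ok(1 << 16));
    assert!(matches!(
        checked_powerset_size(17),
        Err(PowersetSizeError::TooLarge { .. })
    ));
    assert_eq!(
        checked_powerset_size(200),
        Err(PowersetSizeError::Overflow { len: 200 })
    );
    println!("all size checks passed");
}
```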
@github-actions github-actions bot added logical-expr Logical plan and expressions substrait Changes to the substrait crate labels Nov 18, 2025
@kosiew kosiew marked this pull request as ready for review November 18, 2025 12:18

@martin-g (Member) left a comment

LGTM!

@kosiew (Contributor Author) commented Nov 19, 2025

@martin-g
Thanks for your review and feedback!

@kosiew (Contributor Author) commented Nov 20, 2025

@gabotechs
Can you review this?

Development

Successfully merging this pull request may close these issues.

[substrait] [sqllogictest] GroupingSet CUBE is not yet supported
