Support Substrait Round-Trip of EmptyRelation Including produce_one_row Semantics
#18842
Transform EmptyRelation plans to produce a null-typed row as Substrait virtual tables. This change allows Substrait consumption to recognize zero-column virtual tables with one row as placeholder EmptyRelation plans, restoring produce_one_row semantics on roundtrip. Added a test for literal queries without a FROM clause to ensure correct behavior.
Add #[allow(deprecated)] for the values field in Producer initializations. Keep using the deprecated values field because consumer support for the new expressions field is incomplete. Improve Consumer compatibility by updating the produce_one_row logic to accept both field patterns, so that it keeps working once producers switch to the expressions field.
Add detailed function-level documentation for produce_one_row, covering its purpose, usage scenarios, default values context, and rationale for deprecated fields. Improve pattern detection logic in read_rel.rs with clear explanations. Expand test coverage in roundtrip_logical_plan.rs with four new tests ensuring correct serialization/deserialization for EmptyRelation and related scenarios. Verify improvements to sqllogictest functionality.
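The consumer-side detection described in these commits can be pictured with a toy model. This is only a sketch under stated assumptions: `VirtualTable`, `Literal`, `Plan`, and `consume_virtual_table` are hypothetical stand-ins, not the real Substrait protobuf types or DataFusion's consumer code. It follows the "zero-column virtual table with one row" description, and it assumes that a table with no rows maps to an EmptyRelation with produce_one_row=false.

```rust
// Hypothetical, simplified stand-ins; the real Substrait and DataFusion types differ.
#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Null,
    I64(i64),
}

/// Rows may arrive in the deprecated `values` field or the newer `expressions` field.
#[derive(Debug, Clone, PartialEq, Default)]
struct VirtualTable {
    values: Vec<Vec<Literal>>,      // deprecated field
    expressions: Vec<Vec<Literal>>, // newer field
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    EmptyRelation { produce_one_row: bool },
    Values(Vec<Vec<Literal>>),
}

/// Map a virtual table to a plan, accepting rows from either field.
fn consume_virtual_table(table: &VirtualTable) -> Plan {
    // Prefer the newer field when it is populated, otherwise fall back.
    let rows = if table.expressions.is_empty() {
        &table.values
    } else {
        &table.expressions
    };
    match rows.as_slice() {
        // Assumption: no rows at all means an empty relation without a row.
        [] => Plan::EmptyRelation { produce_one_row: false },
        // Exactly one row with zero columns is the placeholder ("DUAL"-like) row.
        [row] if row.is_empty() => Plan::EmptyRelation { produce_one_row: true },
        // Anything else is an ordinary literal table.
        _ => Plan::Values(rows.clone()),
    }
}
```

In this model, a table whose only row is `vec![]` decodes to `EmptyRelation { produce_one_row: true }` regardless of which field carries it, while a row such as `vec![Literal::I64(1)]` stays an ordinary values plan.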
…tualTable literals
// A VirtualTable with exactly one row containing only empty/default fields represents
// an EmptyRelation with produce_one_row=true. This pattern is used for queries without
In Oracle, the one-row dummy table is called the "dual table"; judging by the "In other database systems" section, this seems to be a well-known name for the concept.
I am not suggesting we adopt the name here, but a reference to it in the comments could help people with a database background who are not necessarily familiar with DataFusion.
Added a reference to the dual table concept - figured it might help folks coming FROM DUAL backgrounds 😉 Thanks for the nudge!
…ion and its similarity to SQL "DUAL" table
asolimando left a comment
LGTM, nice improvement. Glad to see better support for EmptyRelation, since the optimizer can expand it to bigger and bigger subplans and reduce useless work at execution time.
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Which issue does this PR close?
Rationale for this change
This change resolves a long-standing gap in DataFusion's Substrait round-trip implementation where EmptyRelation with produce_one_row=true could not be encoded or decoded. This limitation caused ~1800 sqllogictest cases to fail, particularly those involving queries without a FROM clause (e.g., SELECT 1).
By adding full support for recognizing and producing the Substrait VirtualTable pattern representing a "phantom row," DataFusion can now faithfully round-trip logical plans that use empty relations to provide scalar evaluation contexts. This unblocks broader Substrait compatibility and improves consistency across logical plan conversions.
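To see why the phantom row matters for a query like SELECT 1, here is a minimal sketch in a toy model. The `EmptyRelation` struct and `project_literals` function below are illustrative assumptions, not DataFusion's actual types: a projection is modeled as evaluating its literals once per input row, so the number of input rows decides whether the query returns anything.

```rust
// Toy model, not DataFusion's real types: a relation is just a list of rows,
// and a projection evaluates its literal expressions once per input row.
#[derive(Debug, Clone, PartialEq)]
struct EmptyRelation {
    produce_one_row: bool,
}

impl EmptyRelation {
    /// Rows emitted by the relation; every row has zero columns.
    fn rows(&self) -> Vec<Vec<i64>> {
        if self.produce_one_row {
            vec![vec![]] // one phantom row with no columns
        } else {
            vec![] // no rows at all
        }
    }
}

/// Model of `SELECT <literals>`: one output row per input row.
fn project_literals(input: &EmptyRelation, literals: &[i64]) -> Vec<Vec<i64>> {
    input.rows().iter().map(|_| literals.to_vec()).collect()
}
```

With `produce_one_row: true`, `project_literals(&rel, &[1])` yields a single row containing `1`; with `produce_one_row: false` it yields no rows. Losing the flag on a round trip would therefore change the query result, not just the plan shape.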
What changes are included in this PR?
Implement detection of the Substrait VirtualTable patterns representing produce_one_row and map them to LogicalPlan::EmptyRelation.
Add from_empty_relation encoding logic that emits a properly structured VirtualTable, including default literal values when produce_one_row=true.
Refactor literal row conversion into a helper (convert_literal_rows) for clarity and reuse.
Improve field-count validation for expression-based VirtualTables.
Add comprehensive round-trip test coverage:
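As a rough illustration of the encoding and field-count items in the list above (and not the PR's actual code), the sketch below emits one row with a placeholder literal per schema field when produce_one_row is true, and checks that every row matches the schema width. All names here (`ColumnType`, `Literal`, `encode_empty_relation`, `validate_field_count`) are hypothetical, and using a typed null as the default literal is an assumption; the real `from_empty_relation` may choose different defaults.

```rust
// Hypothetical, simplified types; the real producer works on Substrait messages.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Null(ColumnType), // typed null used as the placeholder value (assumption)
    Int64(i64),
}

/// Placeholder literal for a column of the given type.
fn default_literal(ty: ColumnType) -> Literal {
    Literal::Null(ty)
}

/// Encode an empty relation as virtual-table rows:
/// one placeholder row if `produce_one_row`, otherwise no rows.
fn encode_empty_relation(schema: &[ColumnType], produce_one_row: bool) -> Vec<Vec<Literal>> {
    if produce_one_row {
        vec![schema.iter().copied().map(default_literal).collect()]
    } else {
        Vec::new()
    }
}

/// Reject rows whose field count differs from the schema width.
fn validate_field_count(schema: &[ColumnType], rows: &[Vec<Literal>]) -> Result<(), String> {
    for (i, row) in rows.iter().enumerate() {
        if row.len() != schema.len() {
            return Err(format!(
                "row {} has {} fields, expected {}",
                i,
                row.len(),
                schema.len()
            ));
        }
    }
    Ok(())
}
```

For an empty schema, `encode_empty_relation(&[], true)` produces a single zero-column row, which is the shape the consumer side looks for when restoring produce_one_row.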
Are these changes tested?
Yes. New integration tests exercise all permutations of EmptyRelation encoding and decoding, including edge cases related to schema handling and subqueries. These tests ensure round‑trip correctness and prevent regressions.
Are there any user-facing changes?
No user‑facing API changes. This PR improves Substrait interoperability internally.
LLM-generated code disclosure
This PR includes LLM‑generated code and comments. All generated content has been manually reviewed and tested.