@kosiew
Contributor

@kosiew kosiew commented Nov 20, 2025

Which issue does this PR close?

Rationale for this change

This change resolves a long‑standing gap in DataFusion’s Substrait round‑trip implementation where EmptyRelation with produce_one_row=true could not be encoded or decoded. This limitation caused ~1800 sqllogictest cases to fail, particularly those involving queries without a FROM clause (e.g., SELECT 1).

By adding full support for recognizing and producing the Substrait VirtualTable pattern representing a “phantom row,” DataFusion can now faithfully round‑trip logical plans that use empty relations to provide scalar evaluation contexts. This unblocks broader Substrait compatibility and improves consistency across logical plan conversions.

What changes are included in this PR?

  • Implement detection of the Substrait VirtualTable patterns representing produce_one_row and map them to LogicalPlan::EmptyRelation.

  • Add from_empty_relation encoding logic that emits a properly structured VirtualTable, including default literal values when produce_one_row=true.

  • Refactor literal row conversion into a helper (convert_literal_rows) for clarity and reuse.

  • Improve field‑count validation for expression‑based VirtualTables.

  • Add comprehensive round‑trip test coverage:

    • SELECT without FROM
    • Mixed‑type EmptyRelation with phantom row
    • EmptyRelation with zero rows
    • Subqueries involving EmptyRelation
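The encode/decode pairing described in the list above can be sketched with a toy model. This is only an illustration under stated assumptions: `DataType`, `Literal`, `VirtualTable`, `Plan`, `encode`, `decode`, and `is_phantom_row` are hypothetical stand-ins written for this example, not the actual DataFusion or Substrait types, and the choice of per-type default values is likewise assumed.

```rust
// Toy stand-ins for the real plan and Substrait types (all names hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum DataType { Int64, Utf8, Boolean }

#[derive(Debug, Clone, PartialEq)]
enum Literal { Int64(i64), Utf8(String), Boolean(bool) }

#[derive(Debug, Clone, PartialEq)]
struct VirtualTable { schema: Vec<DataType>, rows: Vec<Vec<Literal>> }

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    EmptyRelation { produce_one_row: bool, schema: Vec<DataType> },
    Values { schema: Vec<DataType>, rows: Vec<Vec<Literal>> },
}

// Assumed placeholder value for each type in the phantom row.
fn default_literal(ty: &DataType) -> Literal {
    match ty {
        DataType::Int64 => Literal::Int64(0),
        DataType::Utf8 => Literal::Utf8(String::new()),
        DataType::Boolean => Literal::Boolean(false),
    }
}

// Producer side: an EmptyRelation becomes a VirtualTable with zero rows or one default row.
fn encode(plan: &Plan) -> VirtualTable {
    match plan {
        Plan::EmptyRelation { produce_one_row, schema } => {
            let rows = if *produce_one_row {
                vec![schema.iter().map(default_literal).collect::<Vec<Literal>>()]
            } else {
                Vec::new()
            };
            VirtualTable { schema: schema.clone(), rows }
        }
        Plan::Values { schema, rows } => {
            VirtualTable { schema: schema.clone(), rows: rows.clone() }
        }
    }
}

// Exactly one row, matching the schema width, holding only default values.
fn is_phantom_row(table: &VirtualTable) -> bool {
    table.rows.len() == 1
        && table.rows[0].len() == table.schema.len()
        && table.rows[0]
            .iter()
            .zip(&table.schema)
            .all(|(lit, ty)| *lit == default_literal(ty))
}

// Consumer side: map the recognized patterns back to EmptyRelation.
fn decode(table: &VirtualTable) -> Plan {
    if table.rows.is_empty() {
        Plan::EmptyRelation { produce_one_row: false, schema: table.schema.clone() }
    } else if is_phantom_row(table) {
        Plan::EmptyRelation { produce_one_row: true, schema: table.schema.clone() }
    } else {
        Plan::Values { schema: table.schema.clone(), rows: table.rows.clone() }
    }
}

fn main() {
    // Roughly what `SELECT 1` needs underneath: a zero-column relation with one row.
    let plan = Plan::EmptyRelation { produce_one_row: true, schema: vec![] };
    let table = encode(&plan);
    assert_eq!(decode(&table), plan);
    println!("round-trip ok: {:?}", table);
}
```

In this toy model a genuine one-row table made up entirely of default values is indistinguishable from the phantom row; how the PR itself treats that case is not shown here.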

Are these changes tested?

Yes. New integration tests exercise all permutations of EmptyRelation encoding and decoding, including edge cases related to schema handling and subqueries. These tests ensure round‑trip correctness and prevent regressions.

Are there any user-facing changes?

No user‑facing API changes. This PR improves Substrait interoperability internally.

LLM-generated code disclosure

This PR includes LLM‑generated code and comments. All generated content has been manually reviewed and tested.

Transform EmptyRelation plans into Substrait virtual tables
that produce a null-typed row. This change allows the Substrait
consumer to recognize zero-column virtual tables with
one row as placeholder EmptyRelation plans, restoring
produce_one_row semantics on round-trip. Add a test
for literal queries without a FROM clause to ensure
correct behavior.

Add #[allow(deprecated)] for the values field in Producer
initializations. Keep using the deprecated values field because
consumer support for the new expressions field is incomplete.

Enhance Consumer compatibility by updating the produce_one_row
logic to accept both field patterns, including the expressions
field for future-proofing.

Add detailed function-level documentation for produce_one_row,
covering its purpose, usage scenarios, the context for default values,
and the rationale for using deprecated fields. Improve the pattern
detection logic in read_rel.rs with clear explanations.
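
As a rough illustration of the two commits above (a producer that keeps writing a deprecated field, and a consumer that accepts either field), here is a minimal sketch. `VirtualTableMsg` and its `i64` cells are hypothetical simplifications, not the generated Substrait message type, and the "one empty row" check is an assumed stand-in for the real pattern detection.

```rust
// Hypothetical, simplified message with a deprecated field and its replacement.
struct VirtualTableMsg {
    #[deprecated(note = "superseded by `expressions`")]
    values: Vec<Vec<i64>>,
    expressions: Vec<Vec<i64>>,
}

// Producer: still fills the deprecated field, so the lint is silenced locally.
#[allow(deprecated)]
fn produce_phantom_row_legacy() -> VirtualTableMsg {
    VirtualTableMsg { values: vec![vec![]], expressions: Vec::new() }
}

// A producer that has already moved to the newer field.
#[allow(deprecated)]
fn produce_phantom_row_modern() -> VirtualTableMsg {
    VirtualTableMsg { values: Vec::new(), expressions: vec![vec![]] }
}

// Consumer: accept the phantom-row pattern from either field.
#[allow(deprecated)]
fn is_phantom_row(msg: &VirtualTableMsg) -> bool {
    let one_empty_row = |rows: &Vec<Vec<i64>>| rows.len() == 1 && rows[0].is_empty();
    (one_empty_row(&msg.values) && msg.expressions.is_empty())
        || (one_empty_row(&msg.expressions) && msg.values.is_empty())
}

fn main() {
    assert!(is_phantom_row(&produce_phantom_row_legacy()));
    assert!(is_phantom_row(&produce_phantom_row_modern()));
    println!("both field patterns recognized");
}
```

Scoping `#[allow(deprecated)]` to individual functions keeps the lint active elsewhere, so any new use of the deprecated field still produces a warning.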

Expand test coverage in roundtrip_logical_plan.rs with
four new tests ensuring correct serialization/deserialization
for EmptyRelation and related scenarios. Verify improvements
to sqllogictest functionality.
@github-actions github-actions bot added the substrait Changes to the substrait crate label Nov 20, 2025
Comment on lines +125 to +126
// A VirtualTable with exactly one row containing only empty/default fields represents
// an EmptyRelation with produce_one_row=true. This pattern is used for queries without
Member

In Oracle, the "one row (dummy) table" is called the "dual table"; judging by the "In other database systems" section, it seems to be quite a well-known name for that concept.

I am not necessarily suggesting adopting this name here, but a reference to it in the comments could help people who have a DB background but are not familiar with DataFusion.

Contributor Author

Added a reference to the dual table concept - figured it might help folks coming FROM DUAL backgrounds 😉 Thanks for the nudge!

Member

@asolimando asolimando left a comment

LGTM, nice improvement. Glad to see better support for EmptyRelation, as the optimizer can expand it to bigger and bigger subplans and reduce useless work at execution time.

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

[substrait] [sqllogictest] Unsupported producing row from empty relation

3 participants