Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive CTEs: Stage 2 - add support for sql -> logical plan generation #8839

Merged
merged 11 commits into from
Jan 19, 2024

Conversation

matthewgapp
Copy link
Contributor

@matthewgapp matthewgapp commented Jan 12, 2024

This PR implements sql -> logical plan generation support for Recursive CTEs. Builds on top of #8828.

The logical plan assumes that execution is performed iteratively, using a worktable to write to and read from the intermediate results during the recursive CTE's execution.

Todos

  • Fill out docs and cleanup. Add more tests.

Which issue does this PR close?

Part of closing #462.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate labels Jan 12, 2024
@matthewgapp matthewgapp force-pushed the matt/feat/recursive-ctes/parser branch 2 times, most recently from 98f77ff to 531d541 Compare January 12, 2024 06:08
@matthewgapp matthewgapp force-pushed the matt/feat/recursive-ctes/parser branch from 531d541 to 12d401a Compare January 14, 2024 00:13
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 14, 2024
@matthewgapp matthewgapp force-pushed the matt/feat/recursive-ctes/parser branch 2 times, most recently from a3b6463 to 22661af Compare January 15, 2024 21:57
@matthewgapp matthewgapp marked this pull request as ready for review January 15, 2024 22:06
@jackwener jackwener self-requested a review January 16, 2024 03:36
testing Outdated Show resolved Hide resolved
update docs from script

update slt test for doc change
* impl cte as work table

* move SharedState to continuance

* impl WorkTableState

wip: readying pr to implement only logical plan

fix sql integration test

wip: add sql test for logical plan

wip: format test assertion
@github-actions github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Jan 18, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @matthewgapp -- I think this PR is basically ready to go. I took the liberty of merging up from main to resolve a conflict and some more comments in 675a6df

I also have a few suggestions about error messages and naming but I don't think any of them are required on this PR (we could potentially do them (or not) as a follow on PR)

format!("{plan:?}"),
"Projection: numbers.n\
\n SubqueryAlias: numbers\
\n RecursiveQuery: is_distinct=false\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

datafusion/sql/src/planner.rs Show resolved Hide resolved
datafusion/sql/src/query.rs Outdated Show resolved Hide resolved
datafusion/expr/src/logical_plan/builder.rs Show resolved Hide resolved
pub recursive_term: Arc<LogicalPlan>,
/// Should the output of the recursive term be deduplicated (`UNION`) or
/// not (`UNION ALL`).
pub is_distinct: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if using the term all would be easier here as it more closely matches the CTE Syntax (UNON vs UNION ALL)

Something like

Suggested change
pub is_distinct: bool,
pub all: bool,

Copy link
Contributor Author

@matthewgapp matthewgapp Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about dedupe? Regardless, I would prefer if we made this naming change in a follow-on PR once we merge this and the execution PR.

@matthewgapp
Copy link
Contributor Author

Thank you @matthewgapp -- I think this PR is basically ready to go. I took the liberty of merging up from main to resolve a conflict and some more comments in 675a6df

I also have a few suggestions about error messages and naming but I don't think any of them are required on this PR (we could potentially do them (or not) as a follow on PR)

Thanks for the review and for layering in those tweaks. I'll layer in changes based on your comments real quick and then should be good 2 go. 🚢

@alamb
Copy link
Contributor

alamb commented Jan 18, 2024

FYI @jonahgao

@alamb alamb merged commit a786921 into apache:main Jan 19, 2024
22 checks passed
@alamb
Copy link
Contributor

alamb commented Jan 19, 2024

Onwards!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants