-
Notifications
You must be signed in to change notification settings - Fork 4
fix: Add partial support for recursive queries instrumentation #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b6be176
to
e8120ee
Compare
e8120ee
to
2ca3fea
Compare
2ca3fea
to
5d0b88e
Compare
This change introduces partial support for tracing and metrics instrumentation of recursive queries in DataFusion. Recursive queries now spawn a new span group for each recursive call, as the recursive implementation recreates the recursive nodes for each loop iteration. This is enabled by supporting the `with_new_inner` method on `InstrumentedExec`, which allows the optimizer to rebuild instrumented plans as needed. However, due to current limitations in DataFusion, nodes of type `WorkTableExec` are not instrumented, as wrapping them would break recursive query execution. This limitation will be revisited once upstream changes allow `WorkTableExec` to be safely instrumented. See also: apache/datafusion#16469 and related discussions.
5d0b88e
to
98bc696
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR partially addresses the issue of recursive query instrumentation by avoiding the wrapping of WorkTableExec nodes with InstrumentedExec, thereby preventing the “Unexpected empty work table” error. Key changes include:
- Adding a with_new_inner method to InstrumentedExec to rebuild instrumented plans.
- Updating the instrumentation rule to skip WorkTableExec nodes.
- Introducing new integration tests and snapshot updates for recursive query scenarios.
Reviewed Changes
Copilot reviewed 28 out of 30 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
tests/snapshots/*.snap | Added new snapshots for various execution plans to capture recursive query behavior. |
tests/integration_tests.rs | Introduced new async tests to verify recursive query execution and instrumentation. |
integration-utils/queries/recursive.sql | Added a recursive SQL query to exercise recursive query instrumentation. |
datafusion-tracing/src/lib.rs | Updated documentation to explain limitations with WorkTableExec instrumentation. |
datafusion-tracing/src/instrumented.rs | Added the with_new_inner method and new delegate methods to rewrap inner plans. |
datafusion-tracing/src/instrument_rule.rs | Modified the PhysicalOptimizerRule to skip wrapping WorkTableExec nodes. |
README.md | Updated documentation regarding the known recursive query limitations. |
Which issue does this PR close?
Rationale for this change
Recursive queries previously failed with
Unexpected empty work table
when instrumentation was enabled, due to the wayInstrumentedExec
wrapped all nodes, includingWorkTableExec
, which is not safe to instrument. This PR introduces a partial fix that allows recursive queries to be instrumented, while explicitly avoiding instrumentation ofWorkTableExec
nodes. This enables tracing and metrics for recursive queries, even though full support is pending upstream changes.What changes are included in this PR?
with_new_inner
method onInstrumentedExec
, allowing the optimizer to rebuild instrumented plans as needed for recursive queries.WorkTableExec
nodes, preventing the "Unexpected empty work table" error.Are these changes tested?
Yes, new and updated integration tests and snapshots are included to verify that recursive queries are now instrumented correctly and do not fail with the previous error. The tests cover both the presence of instrumentation and the correct execution of recursive queries.
Are there any user-facing changes?
Yes:
WorkTableExec
are not instrumented, so some internal operations in recursive queries will not be traced until upstream support is available.