Skip to content

fix: Add partial support for recursive queries instrumentation #6

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 20, 2025

Conversation

geoffreyclaude
Copy link
Collaborator

Which issue does this PR close?

  • Partially closes #5

Rationale for this change

Recursive queries previously failed with Unexpected empty work table when instrumentation was enabled, due to the way InstrumentedExec wrapped all nodes, including WorkTableExec, which is not safe to instrument. This PR introduces a partial fix that allows recursive queries to be instrumented, while explicitly avoiding instrumentation of WorkTableExec nodes. This enables tracing and metrics for recursive queries, even though full support is pending upstream changes.

What changes are included in this PR?

  • Adds support for the with_new_inner method on InstrumentedExec, allowing the optimizer to rebuild instrumented plans as needed for recursive queries.
  • Updates the instrumentation rule to skip wrapping WorkTableExec nodes, preventing the "Unexpected empty work table" error.
  • Recursive queries now spawn a new span group for each recursive call, as the recursive implementation recreates the recursive nodes for each loop iteration (see discussion).
  • Adds/updates tests and snapshots to cover recursive query instrumentation.
  • Documents the current limitation and references the relevant upstream DataFusion issue (apache/datafusion#16469).

Are these changes tested?

Yes, new and updated integration tests and snapshots are included to verify that recursive queries are now instrumented correctly and do not fail with the previous error. The tests cover both the presence of instrumentation and the correct execution of recursive queries.

Are there any user-facing changes?

Yes:

  • Recursive queries are now partially instrumented for tracing and metrics.
  • There is a known limitation: nodes of type WorkTableExec are not instrumented, so some internal operations in recursive queries will not be traced until upstream support is available.
  • No breaking changes to public APIs.

Copilot

This comment was marked as outdated.

@datafusion-contrib datafusion-contrib deleted a comment from Copilot AI Jun 20, 2025
@geoffreyclaude geoffreyclaude requested a review from Copilot June 20, 2025 09:35
Copilot

This comment was marked as outdated.

Copilot

This comment was marked as outdated.

Copilot

This comment was marked as outdated.

This change introduces partial support for tracing and metrics instrumentation of recursive queries in DataFusion. Recursive queries now spawn a new span group for each recursive call, as the recursive implementation recreates the recursive nodes for each loop iteration. This is enabled by supporting the `with_new_inner` method on `InstrumentedExec`, which allows the optimizer to rebuild instrumented plans as needed.

However, due to current limitations in DataFusion, nodes of type `WorkTableExec` are not instrumented, as wrapping them would break recursive query execution. This limitation will be revisited once upstream changes allow `WorkTableExec` to be safely instrumented.

See also: apache/datafusion#16469 and related discussions.
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR partially addresses the issue of recursive query instrumentation by avoiding the wrapping of WorkTableExec nodes with InstrumentedExec, thereby preventing the “Unexpected empty work table” error. Key changes include:

  • Adding a with_new_inner method to InstrumentedExec to rebuild instrumented plans.
  • Updating the instrumentation rule to skip WorkTableExec nodes.
  • Introducing new integration tests and snapshot updates for recursive query scenarios.

Reviewed Changes

Copilot reviewed 28 out of 30 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tests/snapshots/*.snap Added new snapshots for various execution plans to capture recursive query behavior.
tests/integration_tests.rs Introduced new async tests to verify recursive query execution and instrumentation.
integration-utils/queries/recursive.sql Added a recursive SQL query to exercise recursive query instrumentation.
datafusion-tracing/src/lib.rs Updated documentation to explain limitations with WorkTableExec instrumentation.
datafusion-tracing/src/instrumented.rs Added the with_new_inner method and new delegate methods to rewrap inner plans.
datafusion-tracing/src/instrument_rule.rs Modified the PhysicalOptimizerRule to skip wrapping WorkTableExec nodes.
README.md Updated documentation regarding the known recursive query limitations.

@geoffreyclaude geoffreyclaude merged commit db235fd into main Jun 20, 2025
4 checks passed
@geoffreyclaude geoffreyclaude deleted the fix/recursive_query branch June 20, 2025 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Recursive queries fail with Unexpected empty work table. with instrumentation
1 participant