Skip to content

Conversation

@adriangb
Copy link

cc @AdamGS

Comment on lines +209 to +213
if is_dynamic_physical_expr(&e) {
e.snapshot().ok().flatten()
} else {
Some(Arc::clone(e))
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's possible or worth it but you could do this more lazily, or wrap the DataFusion dynamic filter in a Vortex dynamic filter and call DynamicFilterPhysicalExpr::current() or PhysicalExpr::snapshot() on each RecordBatch (that would be more overhead but also will kick in sooner on scans that touch very large files).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I roughly tried to do that in some old PR, but I don't remember why I stopped pushing it. This seems like a good starting point, and we can experiment with it in the future to see which one is better.

.ok_or_else(|| {
vortex_err!("Plan should have 2 DataSourceExec, the second is the probe side")
})?;
assert!(data_source_line.contains("DynamicFilterPhysicalExpr [ a@0 >= 1 AND a@0 <= 3 ]"));
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what other assertions we could make. I don't think Vortex gives any metrics on rows pruned, etc. And the Vortex expression isn't saved anywhere / is not in the plan.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put in a print statement and can see that an appropriate Vortex predicate is being created, but can't verify that it's doing what's expected

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I have a better idea here right now, I need to figure out how to improve the overall testability

@codecov
Copy link

codecov bot commented Oct 15, 2025

Codecov Report

❌ Patch coverage is 95.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.03%. Comparing base (839b7dd) to head (a9e0abb).
⚠️ Report is 91 commits behind head on develop.

Files with missing lines Patch % Lines
vortex-datafusion/src/persistent/mod.rs 92.77% 6 Missing ⚠️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@joseph-isaacs joseph-isaacs added the performance Release label indicating an improvement to performance label Oct 17, 2025
@joseph-isaacs
Copy link
Contributor

joseph-isaacs commented Oct 17, 2025

We cannot currently run pre commit benchmark on fork so I created this (#4986)

@AdamGS
Copy link
Contributor

AdamGS commented Oct 17, 2025

@adriangb haven't forgotten this PR, just really busy week

@AdamGS
Copy link
Contributor

AdamGS commented Oct 17, 2025

I can belive some of the perf impact, but I suspect AWS tonight is just extremely noisy, gotta love the 40% on clickbecnh q0.

@adriangb
Copy link
Author

I can belive some of the perf impact, but I suspect AWS tonight is just extremely noisy, gotta love the 40% on clickbecnh q0.

Is there a benchmark or something I'm not seeing? If it's Polar Signal I don't seem to have access.

@AdamGS
Copy link
Contributor

AdamGS commented Oct 18, 2025

They ran on this PR, seems like there's an issue to trigger benchmarks on this branch (@joseph-isaacs got something I can look at? Would love to be able to do that)

@adriangb
Copy link
Author

@AdamGS what's your read on the state of this PR / what can I do to help? It's not clear to me if there are issues with benchmarks, if @joseph-isaacs is planning on taking it over in another PR, if I should rewrite the commits with the licensing info, etc.

@AdamGS
Copy link
Contributor

AdamGS commented Oct 21, 2025

You beat me to comment by a minute!
I've re-triggered the benchmarks over at #4986, if you can it'll be great if you can fix the lint failure (just run with nightly) and the DCO thing and we can merge this IMO.

@AdamGS AdamGS added the feature Release label indicating a new feature or request label Oct 21, 2025
@AdamGS
Copy link
Contributor

AdamGS commented Oct 21, 2025

Something is going on with the label check, I'll figure that one out.

Copy link
Contributor

@AdamGS AdamGS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome 🥳

@AdamGS
Copy link
Contributor

AdamGS commented Oct 21, 2025

Copy link
Contributor

@joseph-isaacs joseph-isaacs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some big regressions on tpch which we need to address before merging

@AdamGS
Copy link
Contributor

AdamGS commented Oct 22, 2025

Completely missed that, I'll take a deeper look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature Release label indicating a new feature or request performance Release label indicating an improvement to performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants