-
Notifications
You must be signed in to change notification settings - Fork 85
support DynamicFilterPhysicalExpr #4961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
support DynamicFilterPhysicalExpr #4961
Conversation
| if is_dynamic_physical_expr(&e) { | ||
| e.snapshot().ok().flatten() | ||
| } else { | ||
| Some(Arc::clone(e)) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it's possible or worth it but you could do this more lazily, or wrap the DataFusion dynamic filter in a Vortex dynamic filter and call DynamicFilterPhysicalExpr::current() or PhysicalExpr::snapshot() on each RecordBatch (that would be more overhead but also will kick in sooner on scans that touch very large files).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I roughly tried to do that in some old PR, but I don't remember why I stopped pushing it. This seems like a good starting point, and we can experiment with it in the future to see which one is better.
| .ok_or_else(|| { | ||
| vortex_err!("Plan should have 2 DataSourceExec, the second is the probe side") | ||
| })?; | ||
| assert!(data_source_line.contains("DynamicFilterPhysicalExpr [ a@0 >= 1 AND a@0 <= 3 ]")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what other assertions we could make. I don't think Vortex gives any metrics on rows pruned, etc. And the Vortex expression isn't saved anywhere / is not in the plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put in a print statement and can see that an appropriate Vortex predicate is being created, but can't verify that it's doing what's expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I have a better idea here right now, I need to figure out how to improve the overall testability
Codecov Report❌ Patch coverage is
☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
We cannot currently run pre commit benchmark on fork so I created this (#4986) |
|
@adriangb haven't forgotten this PR, just really busy week |
|
I can belive some of the perf impact, but I suspect AWS tonight is just extremely noisy, gotta love the 40% on clickbecnh q0. |
Is there a benchmark or something I'm not seeing? If it's Polar Signal I don't seem to have access. |
|
They ran on this PR, seems like there's an issue to trigger benchmarks on this branch (@joseph-isaacs got something I can look at? Would love to be able to do that) |
|
@AdamGS what's your read on the state of this PR / what can I do to help? It's not clear to me if there are issues with benchmarks, if @joseph-isaacs is planning on taking it over in another PR, if I should rewrite the commits with the licensing info, etc. |
|
You beat me to comment by a minute! |
|
Something is going on with the label check, I'll figure that one out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome 🥳
|
Last benchmarks run - https://github.com/vortex-data/vortex/actions/runs/18684856431 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some big regressions on tpch which we need to address before merging
|
Completely missed that, I'll take a deeper look. |
cc @AdamGS