-
Notifications
You must be signed in to change notification settings - Fork 1.8k
fix: deserialization error for FilterExec (predicates with inlist)
#17224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I check the code, the FliterExec predicate's column index is based on FilterExec's input schema, so when we deserialization the FilterExec, the predicate should also based on input schema. datafusion/datafusion/physical-plan/src/filter.rs Lines 90 to 116 in b84ddfd
and i also check the disscussion in #16744 and #16665, not sure why previous issue happen. maybe some code have problem before, but someone fix it recently |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @haohuaijin -- this is my favorite kind of bug fix (fix bug by deleting code!)
cc @XiangpengHao and @NGA-TRAN
| .as_ref() | ||
| .map(|expr| { | ||
| parse_physical_expr(expr, ctx, predicate_schema.as_ref(), extension_codec) | ||
| parse_physical_expr(expr, ctx, input.schema().as_ref(), extension_codec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the input schema always matches my understanding of how the FilterExec work 👍
| } | ||
|
|
||
| #[tokio::test] | ||
| async fn test_tpch_part_in_list_query_with_real_parquet_data() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified this test covers the code as it fails without the fix
---- cases::roundtrip_physical_plan::test_tpch_part_in_list_query_with_real_parquet_data stdout ----
Error: Internal("PhysicalExpr Column references column 'p_size' at index 1 (zero-based) but input schema only has 1 columns: [\"p_size\"]")
FilterExec (predicates with inlist)
|
My understanding of this fix is that it could affect any predicate, not just |
|
Thank you @alamb |
|
Thanks again @haohuaijin |

Which issue does this PR close?
Rationale for this change
fix #17225
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?