Skip to content

Conversation

@haohuaijin
Copy link
Contributor

@haohuaijin haohuaijin commented Aug 18, 2025

Which issue does this PR close?

Rationale for this change

fix #17225

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the proto Related to proto crate label Aug 18, 2025
@haohuaijin
Copy link
Contributor Author

haohuaijin commented Aug 18, 2025

I check the code, the FliterExec predicate's column index is based on FilterExec's input schema, so when we deserialization the FilterExec, the predicate should also based on input schema.

pub fn try_new(
predicate: Arc<dyn PhysicalExpr>,
input: Arc<dyn ExecutionPlan>,
) -> Result<Self> {
match predicate.data_type(input.schema().as_ref())? {
DataType::Boolean => {
let default_selectivity = FILTER_EXEC_DEFAULT_SELECTIVITY;
let cache = Self::compute_properties(
&input,
&predicate,
default_selectivity,
None,
)?;
Ok(Self {
predicate,
input: Arc::clone(&input),
metrics: ExecutionPlanMetricsSet::new(),
default_selectivity,
cache,
projection: None,
})
}
other => {
plan_err!("Filter predicate must return BOOLEAN values, got {other:?}")
}
}
}

and i also check the disscussion in #16744 and #16665, not sure why previous issue happen. maybe some code have problem before, but someone fix it recently

cc @alamb @XiangpengHao

@haohuaijin haohuaijin changed the title reproduce for deserialization inlist fix: deserialization error for inlist Aug 18, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @haohuaijin -- this is my favorite kind of bug fix (fix bug by deleting code!)

cc @XiangpengHao and @NGA-TRAN

.as_ref()
.map(|expr| {
parse_physical_expr(expr, ctx, predicate_schema.as_ref(), extension_codec)
parse_physical_expr(expr, ctx, input.schema().as_ref(), extension_codec)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the input schema always matches my understanding of how the FilterExec work 👍

}

#[tokio::test]
async fn test_tpch_part_in_list_query_with_real_parquet_data() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified this test covers the code as it fails without the fix



---- cases::roundtrip_physical_plan::test_tpch_part_in_list_query_with_real_parquet_data stdout ----
Error: Internal("PhysicalExpr Column references column 'p_size' at index 1 (zero-based) but input schema only has 1 columns: [\"p_size\"]")

@alamb alamb changed the title fix: deserialization error for inlist fix: deserialization error for FilterExec (predicates with inlist) Aug 18, 2025
@alamb
Copy link
Contributor

alamb commented Aug 18, 2025

My understanding of this fix is that it could affect any predicate, not just IN lists, so I updated the PR description to match

@haohuaijin
Copy link
Contributor Author

haohuaijin commented Aug 18, 2025

Thank you @alamb

@alamb
Copy link
Contributor

alamb commented Aug 19, 2025

Screenshot 2025-08-19 at 12 49 49 PM I merged up from main to resolve a conflict

@alamb alamb merged commit 638dc46 into apache:main Aug 19, 2025
27 checks passed
@alamb
Copy link
Contributor

alamb commented Aug 19, 2025

Thanks again @haohuaijin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

proto Related to proto crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

regression: inlist deserialization error

2 participants