lyne7-sc

Why are the changes needed?

The current lineage analysis misses tables that are only referenced in filter condition subqueries (e.g., in WHERE EXISTS or WHERE IN clauses). This PR adds an option to include these upstream dependencies.

Fixes #7206
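
To illustrate the gap described above, here is a minimal sketch using a toy plan model. All types, the `collectFilterTables` flag, and the table names `orders` and `customers` are hypothetical stand-ins for illustration; this is not Spark's `LogicalPlan` API or the code in this patch.

```scala
object FilterTableSketch {
  // Toy plan model (hypothetical, not Spark's LogicalPlan).
  sealed trait Plan
  final case class Relation(table: String) extends Plan
  final case class Project(child: Plan) extends Plan
  // `subqueries` stands for plans nested in the filter condition,
  // e.g. the inner query of WHERE EXISTS (...) or WHERE x IN (...).
  final case class Filter(subqueries: Seq[Plan], child: Plan) extends Plan

  // Collect input tables; subquery tables are included only when the flag is on.
  def inputTables(plan: Plan, collectFilterTables: Boolean): Set[String] = plan match {
    case Relation(t) => Set(t)
    case Project(c)  => inputTables(c, collectFilterTables)
    case Filter(subs, c) =>
      val fromChild = inputTables(c, collectFilterTables)
      if (collectFilterTables) {
        fromChild ++ subs.flatMap(s => inputTables(s, collectFilterTables))
      } else {
        fromChild
      }
  }

  // Roughly: SELECT * FROM orders o
  //          WHERE EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id)
  val query: Plan =
    Project(Filter(Seq(Project(Relation("customers"))), Relation("orders")))
}
```

With the flag off, `FilterTableSketch.inputTables(FilterTableSketch.query, collectFilterTables = false)` yields only `orders`; with it on, `customers` is reported as well.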

How was this patch tested?

Unit tests added.

Was this patch authored or co-authored using generative AI tooling?

No


case p: Filter =>
  if (SparkContextHelper.getConf(
      LineageConf.COLLECT_FILTER_CONDITION_TABLES_ENABLED)) {
Contributor

@yabola yabola Sep 19, 2025

I am not sure we need this conf: it isn't a major behavior change, and these tables are always needed. We ran into this issue before and fixed it in a similar way, but forgot to report it. Everything else LGTM.

In addition, this change does not affect column lineage, only the input tables, because we set parentColumnsLineage to ListMap[Attribute, AttributeSet]() and do not call mergeColumnsLineage.
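
The point about column lineage can be sketched with a toy model. This assumes a simplified representation in which columns are plain strings rather than Spark `Attribute`s; `Extracted`, `extractSubquery`, and `withFilterSubquery` are hypothetical names, not functions from the patch.

```scala
import scala.collection.immutable.ListMap

object ColumnLineageSketch {
  // Hypothetical simplification: output column -> source columns, both as strings.
  type ColumnLineage = ListMap[String, Set[String]]

  final case class Extracted(tables: Set[String], columns: ColumnLineage)

  // A filter subquery is extracted starting from an empty parent lineage,
  // so it reports its tables but contributes no column mappings.
  def extractSubquery(tables: Set[String]): Extracted =
    Extracted(tables, ListMap.empty[String, Set[String]])

  // Only the table sets are unioned; column lineage is deliberately not merged.
  def withFilterSubquery(outer: Extracted, sub: Extracted): Extracted =
    outer.copy(tables = outer.tables ++ sub.tables)
}
```

In this model, combining an outer query over `orders` with a subquery over `customers` grows the table set while the outer column lineage stays exactly as it was.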

Author

I'm still not certain whether filter tables should be treated as upstream tables in all scenarios. Since the existing tests did not account for the existence of these tables, I added this conf to avoid breaking them.

codecov-commenter commented Sep 19, 2025

Codecov Report

❌ Patch coverage is 0% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (fc04a3a) to head (f27c083).
⚠️ Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...in/lineage/helper/SparkSQLLineageParseHelper.scala | 0.00% | 14 Missing ⚠️ |
| .../org/apache/spark/kyuubi/lineage/LineageConf.scala | 0.00% | 5 Missing ⚠️ |
Additional details and impacted files
@@          Coverage Diff           @@
##           master   #7207   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         696     696           
  Lines       43540   43559   +19     
  Branches     5891    5894    +3     
======================================
- Misses      43540   43559   +19     

Successfully merging this pull request may close these issues.

[Bug][Lineage] Collect tables referenced in filter conditions for lineage analysis