-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-29947][SQL][followup] ResolveRelations should return relations with fresh attribute IDs #28717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
also cc @HyukjinKwon @brkyvz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks resonable changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Test build #123472 has finished for PR 28717 at commit
|
retest this please |
Test build #123485 has finished for PR 28717 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan Can you please add unit test? A triple self-join should be able to reproduce this issue
@@ -1024,7 +1024,12 @@ class Analyzer( | |||
DataSourceV2Relation.create(table, Some(catalog), Some(ident))) | |||
} | |||
val key = catalog.name +: ident.namespace :+ ident.name | |||
Option(AnalysisContext.get.relationCache.getOrElseUpdate(key, loaded.orNull)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just:
Option(AnalysisContext.get.relationCache.getOrElseUpdate(key, loaded.orNull)).map { rel =>
rel.transform { case multi: MultiInstanceRelation => multi.newInstance() }
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry just saw your comment after running the merge script.
We only need to refresh the attributes if we get the relation from the cache. Otherwise, it's already a fresh relation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is correct, but IMO it's a little cleaner. Creating new attributes should be super cheap, and not worth having a orElses and foreachs.
thanks, merging to master/3.0! |
… with fresh attribute IDs ### What changes were proposed in this pull request? This is a followup of #26589, which caches the table relations to speed up the table lookup. However, it brings some side effects: the rule `ResolveRelations` may return exactly the same relations, while before it always returns relations with fresh attribute IDs. This PR is to eliminate this side effect. ### Why are the changes needed? There is no bug report yet, but this side effect may impact things like self-join. It's better to restore the 2.4 behavior and always return refresh relations. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A Closes #28717 from cloud-fan/fix. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit dc0709f) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@brkyvz I tried triple self-join but can't find a buggy case:
Please let me know if you see some broken join queries. |
what if you do something like:
|
This also works. Please let me know if I miss something, thanks! |
What changes were proposed in this pull request?
This is a followup of #26589, which caches the table relations to speed up the table lookup. However, it brings some side effects: the rule
ResolveRelations
may return exactly the same relations, while before it always returns relations with fresh attribute IDs.This PR is to eliminate this side effect.
Why are the changes needed?
There is no bug report yet, but this side effect may impact things like self-join. It's better to restore the 2.4 behavior and always return refresh relations.
Does this PR introduce any user-facing change?
no
How was this patch tested?
N/A