Skip to content

Resolve UpCast expressions introduced in Delta DMLs #1998

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

johanl-db
Copy link
Collaborator

@johanl-db johanl-db commented Aug 22, 2023

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

#1938 changed the casting behavior in MERGE and UPDATE to follow the value of the storeAssignmentPolicy config instead of the ansiEnabled one, making the behavior consistent with INSERT.

This change breaks MERGE and UPDATE operations that contain a cast when storeAssignmentPolicy is set to STRICT, throwing an internal error during analysis.

The cause is the UpCast expression added by PreprocessTableMerge and PreprocessTableUpdate when processing assignments. UpCast is meant to be replaced by a regular cast after performing checks by the ResolveUpCast rule that runs during the resolution phase before PreprocessTableMerge and PreprocessTableUpdate introduce the expression, leaving the cast unresolved.

The fix is to run the ResolveUpCast rule once more after PreprocessTableMerge and PreprocessTableUpdate have run.

How was this patch tested?

Missing tests covering cast behavior for the different values ofstoreAssignmentPolicy for UPDATE and MERGE are added, covering:

  • Invalid implicit cast (string -> boolean), valid implicit cast (string -> int), upcast (int -> long)
  • UPDATE, MERGE
  • storeAssignmentPolicy = LEGACY, ANSI, STRICT

@johanl-db johanl-db requested a review from scottsand-db August 29, 2023 16:46
Copy link
Collaborator

@scottsand-db scottsand-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

@johanl-db johanl-db removed the request for review from allisonport-db August 30, 2023 07:00
@vkorukanti vkorukanti closed this in 1450475 Sep 6, 2023
@@ -107,6 +107,11 @@ class DeltaSparkSessionExtension extends (SparkSessionExtensions => Unit) {
extensions.injectPostHocResolutionRule { session =>
new PreprocessTableDelete(session.sessionState.conf)
}
// Resolve new UpCast expressions that might have been introduced by [[PreprocessTableUpdate]]
// and [[PreprocessTableMerge]].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johanl-db, in DeltaMergeBuilder we explicitly call PostHocResolveUpCast. The inject Rule only runs automatically for UPDATES and not for MERGE?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MERGE using the Scala Delta table API doesn't go through the analyzer but instead resolves the merge command manually. UPDATE (SQL + Scala/Java/Python) and MERGE (SQL) do run through the regular path and injected rules will be applied.

I'm not very familiar with why we need to resolve the MERGE command manually when using the Scala API, this may be something we can improve in the future

vkorukanti pushed a commit to vkorukanti/delta that referenced this pull request Oct 26, 2023
Follow-up from delta-io#1998 to run the rule `PostHocResolveUpCast` multiple times until all `UpCast` expressions are resolved. With the previous change, it only ran once, failing to resolve nested UpCast expressions.

Added a test covering a problematic case with multiple `UpCast` expressions.

GitOrigin-RevId: de56850c6013ad2b7c4a134917778069ee2f5915
xupefei pushed a commit to xupefei/delta that referenced this pull request Oct 31, 2023
Follow-up from delta-io#1998 to run the rule `PostHocResolveUpCast` multiple times until all `UpCast` expressions are resolved. With the previous change, it only ran once, failing to resolve nested UpCast expressions.

Added a test covering a problematic case with multiple `UpCast` expressions.

GitOrigin-RevId: de56850c6013ad2b7c4a134917778069ee2f5915
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants