Skip to content

[SPARK-47511][SQL] Canonicalize With expressions by re-assigning IDs #45649

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

kelvinjian-db
Copy link
Contributor

What changes were proposed in this pull request?

This PR changes the canonicalization of With expressions to re-assign IDs starting from 0.

Why are the changes needed?

This ensures that the canonicalization of With expressions is consistent.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@kelvinjian-db
Copy link
Contributor Author

@cloud-fan @bersprockets can you help review this PR?

}
}

case class CommonExpressionId(id: Long = CommonExpressionId.newId, canonicalized: Boolean = false) {
Copy link
Contributor

@cloud-fan cloud-fan Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In QueryPlan we have this

  /**
   * A private mutable variable to indicate whether this plan is the result of canonicalization.
   * This is used solely for making sure we wouldn't execute a canonicalized plan.
   * See [[canonicalized]] on how this is set.
   */
  @transient private var _isCanonicalizedPlan: Boolean = false

  protected def isCanonicalizedPlan: Boolean = _isCanonicalizedPlan

Shall we do the same in With?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what replacement are you suggesting? the canonicalized parameter in CommonExpressionId is used to distinguish between an ID that was assigned initially by curId.getAndIncrement() or newly assigned by canonicalization. it is more of a property of the ID itself than the With/CommonExpressionDef/CommonExpressionRef operators? also, if we haven't called .canonicalized on the outermost With, it is possible to have a With expression that contains some canonicalized IDs but some non-canonicalized IDs

…essions/With.scala

Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in b314672 Mar 29, 2024
sweisdb pushed a commit to sweisdb/spark that referenced this pull request Apr 1, 2024
### What changes were proposed in this pull request?

This PR changes the canonicalization of `With` expressions to re-assign IDs starting from 0.

### Why are the changes needed?

This ensures that the canonicalization of `With` expressions is consistent.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#45649 from kelvinjian-db/SPARK-47511-with-canonicalization.

Authored-by: Kelvin Jiang <kelvin.jiang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
dongjoon-hyun pushed a commit that referenced this pull request Apr 5, 2024
…ITH_EXPR to be more general

### What changes were proposed in this pull request?

This is a follow-up of
- #45649

`With` is not only used by `NullIf`, but also `Between`. This PR renames the config `REPLACE_NULLIF_USING_WITH_EXPR` to be more general, and use it to control `Between` as well.

### Why are the changes needed?

have a conf to control all the usages of `With`.

### Does this PR introduce _any_ user-facing change?

no, not released yet

### How was this patch tested?

existing tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #45871 from cloud-fan/with-conf.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun added a commit that referenced this pull request Feb 12, 2025
### What changes were proposed in this pull request?

This PR aims to add `configName` Scalastyle rule to prevent invalid config names.

### Why are the changes needed?

To prevent repetitive mistake pattern
- #45649
- #48149
- #49121

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Currently, this PR will fail at Scalastyle test because the `master` branch is broken. We can merge this after the following PR.
- #49897

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49900 from dongjoon-hyun/SPARK-51173.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Feb 12, 2025
### What changes were proposed in this pull request?

This PR aims to add `configName` Scalastyle rule to prevent invalid config names.

### Why are the changes needed?

To prevent repetitive mistake pattern
- #45649
- #48149
- #49121

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Currently, this PR will fail at Scalastyle test because the `master` branch is broken. We can merge this after the following PR.
- #49897

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49900 from dongjoon-hyun/SPARK-51173.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 274dc5e)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Feb 23, 2025
This PR aims to add `configName` Scalastyle rule to prevent invalid config names.

To prevent repetitive mistake pattern
- #45649
- #48149
- #49121

No.

Currently, this PR will fail at Scalastyle test because the `master` branch is broken. We can merge this after the following PR.
- #49897

No.

Closes #49900 from dongjoon-hyun/SPARK-51173.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 274dc5e)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants