[SPARK-48498][SQL][FOLLOWUP] do padding for char-char comparison #47412

Closed
wants to merge 2 commits into from

cloud-fan
Contributor

What changes were proposed in this pull request?

This is a followup of #46832 to handle a missing case: char-char comparison. We should pad both sides if READ_SIDE_CHAR_PADDING is not enabled.
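
A minimal sketch of why padding both sides matters, written in plain Scala with no Spark dependency. The names `rpad` and `charEquals` are hypothetical helpers for illustration, not Spark APIs, and padding to the wider of the two declared lengths is an assumption of this sketch rather than something stated in the PR description.

```scala
// Illustration only: hypothetical helpers, not Spark's implementation.
object CharPaddingSketch {
  // Right-pad `s` with spaces up to `target` characters.
  // Inputs that are already long enough are returned unchanged.
  def rpad(s: String, target: Int): String =
    if (s.length >= target) s else s + (" " * (target - s.length))

  // Compare two CHAR values after padding both sides.
  // Assumption: both sides are padded to the wider declared length.
  def charEquals(left: String, leftLen: Int, right: String, rightLen: Int): Boolean = {
    val target = math.max(leftLen, rightLen)
    rpad(left, target) == rpad(right, target)
  }
}
```

With read-side padding disabled, a CHAR(3) value might be read back as `"a"` while a CHAR(5) value is read back as `"a    "`. A raw string comparison treats them as different, whereas `CharPaddingSketch.charEquals("a", 3, "a    ", 5)` pads both sides first and treats them as equal, in either argument order.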

Why are the changes needed?

Bug fix for the case where read-side char padding is disabled.

Does this PR introduce any user-facing change?

No, because this is a followup and the original PR has not been released yet.

How was this patch tested?

new tests

Was this patch authored or co-authored using generative AI tooling?

no

@cloud-fan
Contributor Author

cc @yaooqinn

@github-actions github-actions bot added the SQL label Jul 19, 2024
sql(s"CREATE TABLE t2 (c1 CHAR(3), c2 CHAR(5)) USING $format LOCATION '$dir'")
// Comparing CHAR column with CHAR column compares the padded values.
checkAnswer(
sql("SELECT c1 = c2 FROM t2"),
Member

Can we test both c1 = c2 and c2 = c1?

Seq(Row(true), Row(true), Row(true), Row(true))
)
checkAnswer(
sql("SELECT c1 IN (c2) FROM t2"),
Member

ditto

  (rawType, typeWithTargetCharLength) match {
-   case (CharType(len), CharType(target)) if target > len =>
+   case (CharType(len), CharType(target)) if alwaysPad || target > len =>
Member

Do we need padding if len = target?

Contributor Author

We don't need to; that is the existing logic. My change is to always pad, since the CHAR value may not be padded to its declared length.
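
The guard discussed above can be sketched in isolation. This is a simplified stand-in, assuming small local `CharType` and `StringType` definitions instead of Spark's own classes, and a hypothetical `padTarget` function that returns the length to pad to instead of building a `StringRPad` expression.

```scala
// Illustration only: local stand-ins for Spark's data types.
object CharPadRuleSketch {
  sealed trait DataType
  final case class CharType(length: Int) extends DataType
  case object StringType extends DataType

  // Returns Some(length to pad to) when padding should be applied, else None.
  // Without `alwaysPad`, padding happens only when the target is wider than the
  // declared length. With `alwaysPad`, it also happens when the lengths are equal
  // or the target is narrower, because the stored value may be shorter than declared.
  def padTarget(rawType: DataType, targetType: DataType, alwaysPad: Boolean): Option[Int] =
    (rawType, targetType) match {
      case (CharType(len), CharType(target)) if alwaysPad || target > len => Some(target)
      case _ => None
    }
}
```

For example, `padTarget(CharType(5), CharType(5), alwaysPad = false)` yields `None`, while the same call with `alwaysPad = true` yields `Some(5)`, which covers a CHAR(5) value that was stored without trailing spaces.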

  (rawType, typeWithTargetCharLength) match {
-   case (CharType(len), CharType(target)) if target > len =>
+   case (CharType(len), CharType(target)) if alwaysPad || target > len =>
      Some(StringRPad(expr, Literal(target)))
Member

Should this be the bigger number here?

@cloud-fan
Contributor Author

cloud-fan commented Jul 19, 2024

Thanks for the review, merging to master/3.5!

@cloud-fan cloud-fan closed this in 3f6e2d6 Jul 19, 2024
cloud-fan added a commit that referenced this pull request Jul 19, 2024
### What changes were proposed in this pull request?

This is a followup of #46832 to handle a missing case: char-char comparison. We should pad both sides if `READ_SIDE_CHAR_PADDING` is not enabled.

### Why are the changes needed?

bug fix if people disable read side char padding

### Does this PR introduce _any_ user-facing change?

No because it's a followup and the original PR is not released yet

### How was this patch tested?

new tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #47412 from cloud-fan/char.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024