[SPARK-48498][SQL] Always do char padding in predicates #46832
Changes from all commits
@@ -4603,6 +4603,14 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val LEGACY_NO_CHAR_PADDING_IN_PREDICATE = buildConf("spark.sql.legacy.noCharPaddingInPredicate")
+    .internal()
+    .doc("When true, Spark will not apply char type padding for CHAR type columns in string " +
+      s"comparison predicates, when '${READ_SIDE_CHAR_PADDING.key}' is false.")
+    .version("4.0.0")
+    .booleanConf
+    .createWithDefault(false)
+
   val CLI_PRINT_HEADER =
     buildConf("spark.sql.cli.print.header")
       .doc("When set to true, spark-sql CLI prints the names of the columns in query output.")

Review thread on the `.version("4.0.0")` line:

- QQ: Is it needed by 3.5 or earlier?
- Ideally yes, but it's only a problem if people turn off READ_SIDE_CHAR_PADDING, so not a bug by default, and I don't have a strong preference on backporting.
- We met SPARK-48562 in Spark 3.5, which is a bug for
- I see. Feel free to create a backport PR and I can help merge it.
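To make the intended interaction of the two flags above concrete, here is a minimal, hypothetical usage sketch. Only the two config keys (`spark.sql.readSideCharPadding` and `spark.sql.legacy.noCharPaddingInPredicate`) come from this PR and its discussion; the app name, local master, and session-level toggling are illustrative assumptions, not part of the change.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: how a user might combine the two settings discussed in this PR.
object CharPaddingConfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("char-padding-conf-sketch") // illustrative name
      .master("local[*]")                  // illustrative; any master works
      .getOrCreate()

    // Keep stored CHAR values as written instead of padding them on read.
    spark.conf.set("spark.sql.readSideCharPadding", "false")

    // With this PR, CHAR columns are still padded inside string comparison predicates.
    // Setting the new legacy flag to true would restore the old, un-padded predicate behavior.
    spark.conf.set("spark.sql.legacy.noCharPaddingInPredicate", "true")

    spark.stop()
  }
}
```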
@@ -942,6 +942,34 @@ class FileSourceCharVarcharTestSuite extends CharVarcharTestSuite with SharedSpa
       }
     }
   }
+
+  test("SPARK-48498: always do char padding in predicates") {
+    import testImplicits._
+    withSQLConf(SQLConf.READ_SIDE_CHAR_PADDING.key -> "false") {
+      withTempPath { dir =>
+        withTable("t") {
+          Seq(
+            "12" -> "12",
+            "12" -> "12 ",
+            "12 " -> "12",
+            "12 " -> "12 "
+          ).toDF("c1", "c2").write.format(format).save(dir.toString)
+          sql(s"CREATE TABLE t (c1 CHAR(3), c2 STRING) USING $format LOCATION '$dir'")
+          // Comparing CHAR column with STRING column directly compares the stored value.
+          checkAnswer(
+            sql("SELECT c1 = c2 FROM t"),
+            Seq(Row(true), Row(false), Row(false), Row(true))
+          )
+          // No matter the CHAR type value is padded or not in the storage, we should always pad it
+          // before comparison with STRING literals.
+          checkAnswer(
+            sql("SELECT c1 = '12', c1 = '12 ', c1 = '12  ' FROM t WHERE c2 = '12'"),
+            Seq(Row(true, true, true), Row(true, true, true))
+          )
+        }
+      }
+    }
+  }
 }
 
 class DSV2CharVarcharTestSuite extends CharVarcharTestSuite

Review thread on the `withSQLConf(SQLConf.READ_SIDE_CHAR_PADDING.key -> "false")` line:

- If this key is not set to true, maybe we can not get data when a CHAR column is compared with a literal, because the literal would be padded but the CHAR column would not. Any idea?
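To make the expected answers in the test above concrete, here is a plain-Scala sketch (no Spark APIs) of the two comparison modes it exercises: raw comparison of the stored values for `c1 = c2`, and pad-to-CHAR-length comparison for `c1 = '12'`. The helper name `padTo` and the standalone object are made up for illustration; they are not Spark's implementation.

```scala
object CharVsStringComparisonSketch {
  // Pad with trailing spaces up to the declared CHAR(n) length (illustrative helper).
  def padTo(value: String, length: Int): String =
    value + " " * math.max(0, length - value.length)

  def main(args: Array[String]): Unit = {
    // The four (c1, c2) rows written by the test, with read-side padding disabled.
    val rows = Seq("12" -> "12", "12" -> "12 ", "12 " -> "12", "12 " -> "12 ")

    // c1 = c2: CHAR vs STRING column compares the stored values directly.
    println(rows.map { case (c1, c2) => c1 == c2 })
    // List(true, false, false, true)

    // c1 = '12': both the CHAR side and the literal are padded to CHAR(3) first,
    // so the two rows surviving WHERE c2 = '12' both evaluate to true.
    val charLength = 3
    println(rows.filter(_._2 == "12").map { case (c1, _) =>
      padTo(c1, charLength) == padTo("12", charLength)
    })
    // List(true, true)
  }
}
```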
Review thread on the new config name:

- Because the legacy behavior is always padding, how about LEGACY_CHAR_PADDING_IN_PREDICATE, and change the default value to true?
- The legacy behavior is no padding.
- "However, there is a bug in Spark that we always pad the string literal when comparing CHAR type and STRING literals, which assumes the CHAR type columns are always padded, either on the write side or read side." I'm a bit confused.
- Let me make it clear. It was talking about when people disable `spark.sql.readSideCharPadding`.
- Got it. Thank you.
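To illustrate the asymmetry described in the thread above, here is a plain-Scala sketch of the idea: padding only the literal side assumes the stored CHAR value is also padded, which breaks when read-side padding is off and the writer did not pad. The helper `pad` and the object name are illustrative assumptions, not Spark code.

```scala
object PredicatePaddingAsymmetrySketch {
  // Illustrative helper: pad with trailing spaces up to the declared CHAR(n) length.
  def pad(s: String, n: Int): String = s + " " * math.max(0, n - s.length)

  def main(args: Array[String]): Unit = {
    val declaredLength = 3
    val storedValue = "12" // not padded: read-side padding disabled and the writer did not pad

    // One-sided padding (the behavior discussed in the thread): only the literal is padded.
    val paddedLiteral = pad("12", declaredLength) // "12 "
    println(storedValue == paddedLiteral)         // false -> the row would be silently filtered out

    // Two-sided padding (what the PR title describes): pad both sides before comparing.
    println(pad(storedValue, declaredLength) == paddedLiteral) // true
  }
}
```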