
Commit 8edb64c

ueshin authored and cloud-fan committed
[SPARK-26060][SQL] Track SparkConf entries and make SET command reject such entries.
## What changes were proposed in this pull request?

Currently the `SET` command works without any warnings even if the specified key is for `SparkConf` entries, and it has no effect because the command does not update `SparkConf`; this behavior might confuse users. We should track `SparkConf` entries and make the command reject such entries.

## How was this patch tested?

Added a test and existing tests.

Closes #23031 from ueshin/issues/SPARK-26060/set_command.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
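For illustration (not part of the commit), a minimal sketch of the behavior this change introduces, assuming a `spark-shell` session (`spark`) on a build that includes it; `spark.task.cpus` is the `CPUS_PER_TASK` entry exercised by the new tests:

```scala
import org.apache.spark.sql.AnalysisException

// `spark.task.cpus` is a SparkConf entry, not a SQL conf, so after this
// change SET fails loudly instead of silently doing nothing.
try {
  spark.sql("SET spark.task.cpus = 4")
} catch {
  case e: AnalysisException =>
    println(e.getMessage) // Cannot modify the value of a Spark config: spark.task.cpus
}

// The programmatic path goes through the same check in RuntimeConfig:
// spark.conf.set("spark.task.cpus", 4) now throws AnalysisException too.
```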
1 parent 66b2046 commit 8edb64c

File tree

5 files changed: +35 -1 lines changed

docs/sql-migration-guide-upgrade.md

Lines changed: 2 additions & 0 deletions
@@ -29,6 +29,8 @@ displayTitle: Spark SQL Upgrading Guide

 - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of a map with duplicated keys is undefined, e.g. map lookup respects the duplicated key that appears first, `Dataset.collect` only keeps the duplicated key that appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with a last-wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet); the behavior will be undefined.

+- In Spark version 2.4 and earlier, the `SET` command works without any warnings even if the specified key is for `SparkConf` entries, and it has no effect because the command does not update `SparkConf`, but the behavior might confuse users. Since 3.0, the command fails if a `SparkConf` key is used. You can disable such a check by setting `spark.sql.legacy.execution.setCommandRejectsSparkConfs` to `false`.
+
 ## Upgrading From Spark SQL 2.3 to 2.4

 - In Spark version 2.3 and earlier, the second parameter to the array_contains function is implicitly promoted to the element type of the first array-type parameter. This type promotion can be lossy and may cause `array_contains` to return the wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and is illustrated in the table below.
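A hedged sketch of the escape hatch mentioned in the new migration note (same `spark-shell` assumption as the earlier example; the flag is internal and exists only for migration):

```scala
// Opt back into the pre-3.0 behavior by disabling the new check.
spark.conf.set("spark.sql.legacy.execution.setCommandRejectsSparkConfs", "false")

// SET is accepted again, but exactly as in 2.4 it does not update the
// underlying SparkConf, so the value has no effect.
spark.sql("SET spark.task.cpus = 4")
```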

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

Lines changed: 11 additions & 1 deletion
@@ -45,7 +45,7 @@ import org.apache.spark.util.Utils

 object SQLConf {

-  private val sqlConfEntries = java.util.Collections.synchronizedMap(
+  private[sql] val sqlConfEntries = java.util.Collections.synchronizedMap(
     new java.util.HashMap[String, ConfigEntry[_]]())

   val staticConfKeys: java.util.Set[String] =

@@ -1610,6 +1610,14 @@ object SQLConf {
       """ "... N more fields" placeholder.""")
     .intConf
     .createWithDefault(25)
+
+  val SET_COMMAND_REJECTS_SPARK_CONFS =
+    buildConf("spark.sql.legacy.execution.setCommandRejectsSparkConfs")
+      .internal()
+      .doc("If it is set to true, SET command will fail when the key is registered as " +
+        "a SparkConf entry.")
+      .booleanConf
+      .createWithDefault(true)
 }

 /**

@@ -2030,6 +2038,8 @@ class SQLConf extends Serializable with Logging {

   def maxToStringFields: Int = getConf(SQLConf.MAX_TO_STRING_FIELDS)

+  def setCommandRejectsSparkConfs: Boolean = getConf(SQLConf.SET_COMMAND_REJECTS_SPARK_CONFS)
+
   /** ********************** SQLConf functionality methods ************ */

   /** Set Spark SQL configuration properties. */
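Because the entry is registered through the usual `buildConf` DSL, it is readable through the public `RuntimeConfig` API like any other SQL conf; a small sketch under the same `spark-shell` assumption as above:

```scala
// createWithDefault(true) makes the effective value "true" unless overridden.
val rejects = spark.conf.get("spark.sql.legacy.execution.setCommandRejectsSparkConfs")
assert(rejects == "true")
```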

sql/core/src/main/scala/org/apache/spark/sql/RuntimeConfig.scala

Lines changed: 4 additions & 0 deletions
@@ -153,5 +153,9 @@ class RuntimeConfig private[sql](sqlConf: SQLConf = new SQLConf) {
     if (SQLConf.staticConfKeys.contains(key)) {
       throw new AnalysisException(s"Cannot modify the value of a static config: $key")
     }
+    if (sqlConf.setCommandRejectsSparkConfs &&
+        ConfigEntry.findEntry(key) != null && !SQLConf.sqlConfEntries.containsKey(key)) {
+      throw new AnalysisException(s"Cannot modify the value of a Spark config: $key")
+    }
   }
 }
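The guard above treats a key as SparkConf-only when core Spark has registered it but SQL has not; restated as a standalone predicate for clarity (the helper name `isSparkConfOnly` is hypothetical, and both registries are Spark-internal, so this only compiles inside the `org.apache.spark.sql` package where the real check lives):

```scala
import org.apache.spark.internal.config.ConfigEntry
import org.apache.spark.sql.internal.SQLConf

// Hypothetical restatement of the new check: a key is rejected when core
// Spark knows it (findEntry returns a non-null ConfigEntry) but it is
// absent from SQLConf's own registry (exposed as private[sql] above).
def isSparkConfOnly(key: String): Boolean =
  ConfigEntry.findEntry(key) != null && !SQLConf.sqlConfEntries.containsKey(key)
```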

sql/core/src/test/scala/org/apache/spark/sql/RuntimeConfigSuite.scala

Lines changed: 10 additions & 0 deletions
@@ -18,6 +18,7 @@
 package org.apache.spark.sql

 import org.apache.spark.SparkFunSuite
+import org.apache.spark.internal.config

 class RuntimeConfigSuite extends SparkFunSuite {

@@ -68,4 +69,13 @@ class RuntimeConfigSuite extends SparkFunSuite {
     assert(!conf.isModifiable(""))
     assert(!conf.isModifiable("invalid config parameter"))
   }
+
+  test("reject SparkConf entries") {
+    val conf = newConf()
+
+    val ex = intercept[AnalysisException] {
+      conf.set(config.CPUS_PER_TASK.key, 4)
+    }
+    assert(ex.getMessage.contains("Spark config"))
+  }
 }

sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala

Lines changed: 8 additions & 0 deletions
@@ -24,6 +24,7 @@ import java.util.Locale
 import org.apache.hadoop.fs.Path
 import org.scalatest.BeforeAndAfterEach

+import org.apache.spark.internal.config
 import org.apache.spark.sql.{AnalysisException, QueryTest, Row, SaveMode}
 import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.analysis.{FunctionRegistry, NoSuchPartitionException, NoSuchTableException, TempTableAlreadyExistsException}

@@ -2715,4 +2716,11 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
       }
     }
   }
+
+  test("set command rejects SparkConf entries") {
+    val ex = intercept[AnalysisException] {
+      sql(s"SET ${config.CPUS_PER_TASK.key} = 4")
+    }
+    assert(ex.getMessage.contains("Spark config"))
+  }
 }
