Commit e17762f

Merge remote-tracking branch 'remotes/origin/master' into millis-2-days-java8-api

# Conflicts:
#	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala

2 parents: cb37fe3 + c198620

121 files changed: +2612 −2358 lines


core/src/main/scala/org/apache/spark/internal/Logging.scala (1 addition, 1 deletion)

@@ -117,7 +117,7 @@ trait Logging {
   }

   // For testing
-  def initializeForcefully(isInterpreter: Boolean, silent: Boolean): Unit = {
+  private[spark] def initializeForcefully(isInterpreter: Boolean, silent: Boolean): Unit = {
     initializeLogging(isInterpreter, silent)
   }
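The hunk above narrows `initializeForcefully` from public to package-private. In Scala, `private[spark]` makes a member visible only to code inside the enclosing `org.apache.spark` package; outside callers no longer compile against it. A minimal self-contained sketch of that access rule (the `Probe` and `Demo` names are hypothetical, not from the commit):

```scala
// Sketch of Scala's qualified access modifier, as used in the diff above.
// Probe and Demo are illustrative names; only the modifier semantics matter.
package org.apache.spark {
  trait Logging {
    // Visible to anything under org.apache.spark, hidden from outside code.
    private[spark] def initializeForcefully(isInterpreter: Boolean, silent: Boolean): Unit = ()
  }

  object Probe extends Logging {
    // Compiles: Probe lives inside the org.apache.spark package.
    def callIt(): String = { initializeForcefully(false, true); "ok" }
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Calling initializeForcefully directly from here (outside the
    // org.apache.spark package) would be a compile error.
    println(org.apache.spark.Probe.callIt())
  }
}
```

This is why the change is source-compatible for Spark's own tests but removes the method from the public API surface.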

core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala (0 additions, 1 deletion)

@@ -502,7 +502,6 @@ private[serializer] object KryoSerializer {
   "org.apache.spark.ml.attribute.NumericAttribute",

   "org.apache.spark.ml.feature.Instance",
-  "org.apache.spark.ml.feature.InstanceBlock",
   "org.apache.spark.ml.feature.LabeledPoint",
   "org.apache.spark.ml.feature.OffsetInstance",
   "org.apache.spark.ml.linalg.DenseMatrix",

dev/deps/spark-deps-hadoop-2.7-hive-2.3 (0 additions, 1 deletion)

@@ -87,7 +87,6 @@ hive-jdbc/2.3.6//hive-jdbc-2.3.6.jar
 hive-llap-common/2.3.6//hive-llap-common-2.3.6.jar
 hive-metastore/2.3.6//hive-metastore-2.3.6.jar
 hive-serde/2.3.6//hive-serde-2.3.6.jar
-hive-service-rpc/2.3.6//hive-service-rpc-2.3.6.jar
 hive-shims-0.23/2.3.6//hive-shims-0.23-2.3.6.jar
 hive-shims-common/2.3.6//hive-shims-common-2.3.6.jar
 hive-shims-scheduler/2.3.6//hive-shims-scheduler-2.3.6.jar

dev/deps/spark-deps-hadoop-3.2-hive-2.3 (0 additions, 1 deletion)

@@ -86,7 +86,6 @@ hive-jdbc/2.3.6//hive-jdbc-2.3.6.jar
 hive-llap-common/2.3.6//hive-llap-common-2.3.6.jar
 hive-metastore/2.3.6//hive-metastore-2.3.6.jar
 hive-serde/2.3.6//hive-serde-2.3.6.jar
-hive-service-rpc/2.3.6//hive-service-rpc-2.3.6.jar
 hive-shims-0.23/2.3.6//hive-shims-0.23-2.3.6.jar
 hive-shims-common/2.3.6//hive-shims-common-2.3.6.jar
 hive-shims-scheduler/2.3.6//hive-shims-scheduler-2.3.6.jar

dev/sparktestsupport/modules.py (0 additions, 1 deletion)

@@ -364,7 +364,6 @@ def __hash__(self):
     "pyspark.sql.avro.functions",
     "pyspark.sql.pandas.conversion",
     "pyspark.sql.pandas.map_ops",
-    "pyspark.sql.pandas.functions",
     "pyspark.sql.pandas.group_ops",
     "pyspark.sql.pandas.types",
     "pyspark.sql.pandas.serializers",

docs/.gitignore (1 addition, 0 deletions)

@@ -0,0 +1 @@
+sql-configs.html

docs/configuration.md (7 additions, 39 deletions)

@@ -2399,47 +2399,15 @@ the driver or executor, or, in the absence of that value, the number of cores av
 Please refer to the [Security](security.html) page for available options on how to secure different
 Spark subsystems.

-### Spark SQL
-
-Running the <code>SET -v</code> command will show the entire list of the SQL configuration.
-
-<div class="codetabs">
-<div data-lang="scala" markdown="1">

-{% highlight scala %}
-// spark is an existing SparkSession
-spark.sql("SET -v").show(numRows = 200, truncate = false)
-{% endhighlight %}
-
-</div>
-
-<div data-lang="java" markdown="1">
-
-{% highlight java %}
-// spark is an existing SparkSession
-spark.sql("SET -v").show(200, false);
-{% endhighlight %}
-</div>
-
-<div data-lang="python" markdown="1">
-
-{% highlight python %}
-# spark is an existing SparkSession
-spark.sql("SET -v").show(n=200, truncate=False)
-{% endhighlight %}
-
-</div>
-
-<div data-lang="r" markdown="1">
-
-{% highlight r %}
-sparkR.session()
-properties <- sql("SET -v")
-showDF(properties, numRows = 200, truncate = FALSE)
-{% endhighlight %}
+{% for static_file in site.static_files %}
+  {% if static_file.name == 'sql-configs.html' %}
+### Spark SQL

-</div>
-</div>
+{% include_relative sql-configs.html %}
+  {% break %}
+  {% endif %}
+{% endfor %}


 ### Spark Streaming

docs/sql-migration-guide.md (4 additions, 2 deletions)

@@ -101,7 +101,7 @@ license: |

 - Since Spark 3.0, if files or subdirectories disappear during recursive directory listing (i.e. they appear in an intermediate listing but then cannot be read or listed during later phases of the recursive directory listing, due to either concurrent file deletions or object store consistency issues) then the listing will fail with an exception unless `spark.sql.files.ignoreMissingFiles` is `true` (default `false`). In previous versions, these missing files or subdirectories would be ignored. Note that this change of behavior only applies during initial table file listing (or during `REFRESH TABLE`), not during query execution: the net change is that `spark.sql.files.ignoreMissingFiles` is now obeyed during table file listing / query planning, not only at query execution time.

-- Since Spark 3.0, substitution order of nested WITH clauses is changed and an inner CTE definition takes precedence over an outer. In version 2.4 and earlier, `WITH t AS (SELECT 1), t2 AS (WITH t AS (SELECT 2) SELECT * FROM t) SELECT * FROM t2` returns `1` while in version 3.0 it returns `2`. The previous behaviour can be restored by setting `spark.sql.legacy.ctePrecedence.enabled` to `true`.
+- Since Spark 3.0, Spark throws an AnalysisException if name conflict is detected in the nested WITH clause by default. It forces the users to choose the specific substitution order they wanted, which is controlled by `spark.sql.legacy.ctePrecedence.enabled`. If set to false (which is recommended), inner CTE definitions take precedence over outer definitions. For example, set the config to `false`, `WITH t AS (SELECT 1), t2 AS (WITH t AS (SELECT 2) SELECT * FROM t) SELECT * FROM t2` returns `2`, while setting it to `true`, the result is `1` which is the behavior in version 2.4 and earlier.

 - Since Spark 3.0, the `add_months` function does not adjust the resulting date to a last day of month if the original date is a last day of months. For example, `select add_months(DATE'2019-02-28', 1)` results `2019-03-28`. In Spark version 2.4 and earlier, the resulting date is adjusted when the original date is a last day of months. For example, adding a month to `2019-02-28` results in `2019-03-31`.

@@ -215,6 +215,8 @@ license: |
 For example `SELECT timestamp 'tomorrow';`.

 - Since Spark 3.0, the `size` function returns `NULL` for the `NULL` input. In Spark version 2.4 and earlier, this function gives `-1` for the same input. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.sizeOfNull` to `true`.
+
+- Since Spark 3.0, when the `array` function is called without any parameters, it returns an empty array of `NullType`. In Spark version 2.4 and earlier, it returns an empty array of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.arrayDefaultToStringType.enabled` to `true`.

 - Since Spark 3.0, the interval literal syntax does not allow multiple from-to units anymore. For example, `SELECT INTERVAL '1-1' YEAR TO MONTH '2-2' YEAR TO MONTH'` throws parser exception.

@@ -326,7 +328,7 @@ license: |

 - Since Spark 3.0, `SHOW TBLPROPERTIES` will cause `AnalysisException` if the table does not exist. In Spark version 2.4 and earlier, this scenario caused `NoSuchTableException`. Also, `SHOW TBLPROPERTIES` on a temporary view will cause `AnalysisException`. In Spark version 2.4 and earlier, it returned an empty result.

-- Since Spark 3.0, `SHOW CREATE TABLE` will always return Spark DDL, even when the given table is a Hive serde table. For Hive DDL, please use `SHOW CREATE TABLE AS SERDE` command instead.
+- Since Spark 3.0, `SHOW CREATE TABLE` will always return Spark DDL, even when the given table is a Hive serde table. For generating Hive DDL, please use `SHOW CREATE TABLE AS SERDE` command instead.

 - Since Spark 3.0, we upgraded the built-in Hive from 1.2 to 2.3. This may need to set `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars` according to the version of the Hive metastore.
 For example: set `spark.sql.hive.metastore.version` to `1.2.1` and `spark.sql.hive.metastore.jars` to `maven` if your Hive metastore version is 1.2.1.
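The recommended Spark 3.0 behavior above (inner CTE wins when `spark.sql.legacy.ctePrecedence.enabled` is `false`) matches how most SQL engines scope nested WITH clauses. As a quick, hedged illustration of that scoping rule outside Spark, the same query can be run against SQLite through Python's `sqlite3` module (SQLite, not Spark, so only the name-shadowing behavior carries over):

```python
import sqlite3

# Nested-CTE name shadowing: inside t2's definition, the inner `t`
# (SELECT 2) shadows the outer `t` (SELECT 1). This is the precedence
# Spark 3.0 applies when the legacy CTE flag is false.
con = sqlite3.connect(":memory:")
query = """
WITH t AS (SELECT 1 AS v),
     t2 AS (WITH t AS (SELECT 2 AS v) SELECT v FROM t)
SELECT v FROM t2
"""
result = con.execute(query).fetchone()[0]
print(result)  # the inner definition takes precedence
```

Under the legacy flag (`true`), Spark instead resolves the outer definition first, which is why the documented results differ between the two settings.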

docs/sql-performance-tuning.md (3 additions, 1 deletion)

@@ -67,6 +67,7 @@ that these options will be deprecated in future release as more optimizations ar
   <td>134217728 (128 MB)</td>
   <td>
     The maximum number of bytes to pack into a single partition when reading files.
+    This configuration is effective only when using file-based sources such as Parquet, JSON and ORC.
   </td>
 </tr>
 <tr>

@@ -76,7 +77,8 @@ that these options will be deprecated in future release as more optimizations ar
     The estimated cost to open a file, measured by the number of bytes could be scanned in the same
     time. This is used when putting multiple files into a partition. It is better to over-estimated,
     then the partitions with small files will be faster than partitions with bigger files (which is
-    scheduled first).
+    scheduled first). This configuration is effective only when using file-based sources such as Parquet,
+    JSON and ORC.
   </td>
 </tr>
 <tr>
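The two settings documented above, `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes`, interact when Spark packs input files into read partitions: each file is charged its size plus the estimated open cost, so large numbers of tiny files do not all collapse into one partition. A simplified greedy sketch of that idea (not Spark's exact `FilePartition` implementation; the function name and defaults here are illustrative):

```python
# Simplified sketch of file packing driven by maxPartitionBytes and
# openCostInBytes. Each file's "cost" is its size plus the open cost,
# capping how many files a single read partition absorbs.

def pack_files(file_sizes, max_partition_bytes=128 * 1024 * 1024,
               open_cost_bytes=4 * 1024 * 1024):
    """Greedily pack files into partitions, largest first."""
    partitions, current, current_bytes = [], [], 0
    for size in sorted(file_sizes, reverse=True):
        padded = size + open_cost_bytes
        # Close the current partition once adding this file would
        # exceed the byte budget.
        if current and current_bytes + padded > max_partition_bytes:
            partitions.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += padded
    if current:
        partitions.append(current)
    return partitions

# 1000 files of 1 MB each: with a 4 MB open cost each file "costs" 5 MB,
# so only about 25 files fit per 128 MB partition instead of 128.
parts = pack_files([1024 * 1024] * 1000)
print(len(parts))
```

This is why over-estimating the open cost is safe, as the table notes: it biases the packer toward more, smaller partitions, which schedule more evenly than a few partitions stuffed with small files.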
