
[SPARK-12546][SQL] Change default number of open parquet files #11308


Closed
marmbrus wants to merge 1 commit into apache:master from marmbrus:parquetWriteOOM

Conversation

marmbrus (Contributor)

A common problem users encounter with Spark 1.6.0 is that writing to a partitioned Parquet table fails with an OutOfMemoryError. The root cause is that Parquet allocates a significant amount of memory that is not accounted for by our own memory-management mechanisms. As a workaround, we can ensure that only a single file is open per task unless the user explicitly asks for more.
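To make the workaround concrete, here is a minimal sketch using the Spark 1.6-era SQLContext API. The application name, sample data, and output path are hypothetical; the config key `spark.sql.sources.maxConcurrentWrites` comes from the diff below, and the `setConf` line shows how a user with ample executor memory could explicitly opt back in to more concurrent writers:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object PartitionedParquetWrite {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("partitioned-parquet-write").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // With this patch the default is a single open Parquet file per task,
    // which bounds the memory Parquet buffers outside Spark's accounting.
    // Users with enough executor memory can explicitly ask for more:
    sqlContext.setConf("spark.sql.sources.maxConcurrentWrites", "5")

    // Hypothetical data: 1000 rows spread across 10 partition values.
    val df = sc.parallelize(1 to 1000).map(i => (i % 10, i)).toDF("part", "value")
    df.write.partitionBy("part").parquet("/tmp/partitioned-parquet-write")

    sc.stop()
  }
}
```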

@marmbrus (Contributor, Author)

/cc @nongli @rxin

@rxin (Contributor) commented Feb 22, 2016

Do we have any off-by-one error? (I hope we don't.)

@rxin (Contributor) commented Feb 22, 2016

LGTM

@SparkQA commented Feb 22, 2016

Test build #51661 has finished for PR 11308 at commit b4da054.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus (Contributor, Author)

Merging to master and 1.6

asfgit pushed a commit that referenced this pull request Feb 22, 2016
A common problem that users encounter with Spark 1.6.0 is that writing to a partitioned parquet table OOMs.  The root cause is that parquet allocates a significant amount of memory that is not accounted for by our own mechanisms.  As a workaround, we can ensure that only a single file is open per task unless the user explicitly asks for more.

Author: Michael Armbrust <michael@databricks.com>

Closes #11308 from marmbrus/parquetWriteOOM.

(cherry picked from commit 173aa94)
Signed-off-by: Michael Armbrust <michael@databricks.com>
asfgit closed this in 173aa94 on Feb 22, 2016
```diff
@@ -430,7 +430,7 @@ private[spark] object SQLConf {

   val PARTITION_MAX_FILES =
     intConf("spark.sql.sources.maxConcurrentWrites",
-      defaultValue = Some(5),
+      defaultValue = Some(1),
```
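For context, a hedged reconstruction of how this SQLConf entry might read in full. Only the key and the new default value are visible in the diff; the doc string below is an assumption, not the exact source:

```scala
// Sketch of the entry inside private[spark] object SQLConf (Spark 1.6 branch).
// The doc text is assumed; only the key and default come from the diff above.
val PARTITION_MAX_FILES =
  intConf("spark.sql.sources.maxConcurrentWrites",
    defaultValue = Some(1),
    doc = "The maximum number of concurrent files to open per task when " +
      "writing out to a partitioned table. Kept at 1 by default because " +
      "each open Parquet writer buffers memory that Spark does not track.")
```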
Contributor (review comment on the diff):
We will have 1+1 writers, actually.
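A hedged illustration of that remark: if the writer for the partition currently being written is tracked separately from the pool that `spark.sql.sources.maxConcurrentWrites` bounds, a default of 1 still allows up to two files open at once. All names below are hypothetical; this is an illustrative sketch, not Spark's actual writer-container code:

```scala
import scala.collection.mutable

// Hypothetical sketch (not Spark's DynamicPartitionWriterContainer): the
// "current" writer sits outside the bounded pool, so the worst case is
// maxConcurrent pooled writers + 1 current writer ("1+1" when the limit is 1).
final class BoundedWriterPool[W](maxConcurrent: Int,
                                 openWriter: String => W,
                                 closeWriter: W => Unit) {
  private val pooled = mutable.LinkedHashMap.empty[String, W]
  private var current: Option[(String, W)] = None

  def writerFor(partitionKey: String): W = current match {
    case Some((k, w)) if k == partitionKey => w
    case _ =>
      // Park the previous current writer, evicting the oldest pooled
      // writer first so the pool never exceeds maxConcurrent entries.
      current.foreach { case (k, w) =>
        if (pooled.size >= maxConcurrent) {
          val (oldKey, oldWriter) = pooled.head
          pooled -= oldKey
          closeWriter(oldWriter)
        }
        pooled += (k -> w)
      }
      val w = pooled.remove(partitionKey).getOrElse(openWriter(partitionKey))
      current = Some((partitionKey, w))
      w
  }

  def closeAll(): Unit = {
    pooled.values.foreach(closeWriter); pooled.clear()
    current.foreach { case (_, w) => closeWriter(w) }; current = None
  }
}
```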
