Commit 7a24541
[SPARK-4386] Improve performance when writing Parquet files
Convert type of RowWriteSupport.attributes to Array.
Performance analysis of writes to very wide tables shows that most of the time is spent in the apply method of the attributes var. The type of attributes was previously LinearSeqOptimized, whose apply is O(N), which made each write O(N squared) in the number of columns.
Measurements on a 575-column table showed that this change gave a 6x improvement in write times.
Author: Michael Davies <Michael.BellDavies@gmail.com>
Closes #3843 from MickDavies/SPARK-4386 and squashes the following commits:
892519d [Michael Davies] [SPARK-4386] Improve performance when writing Parquet files
(cherry picked from commit 7425bec)
Signed-off-by: Michael Armbrust <michael@databricks.com>
1 parent cde8a31 commit 7a24541
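The complexity argument in the commit message can be illustrated with a small sketch. This is not the code from the commit: `IndexedAccessSketch`, `sumByIndex`, `timeNanos` and the row count are hypothetical names and values chosen for illustration, and a plain `List` stands in for the linear sequence, since its `apply(i)` walks i nodes while `Array` indexing is constant time.

```scala
// Illustrative sketch only; assumes nothing about RowWriteSupport internals.
object IndexedAccessSketch {
  // Visit every position through an indexed lookup, the access pattern
  // that a per-field attributes(i) call would produce when writing a row.
  def sumByIndex(n: Int, lookup: Int => Int): Long = {
    var total = 0L
    var i = 0
    while (i < n) {
      total += lookup(i)
      i += 1
    }
    total
  }

  // Wall-clock time of a block in nanoseconds (rough, not a real benchmark).
  def timeNanos(body: => Unit): Long = {
    val start = System.nanoTime()
    body
    System.nanoTime() - start
  }

  def main(args: Array[String]): Unit = {
    val columns = 575   // table width quoted in the commit message
    val rows = 20000    // hypothetical number of rows to "write"
    val asList: List[Int] = List.tabulate(columns)(identity)
    val asArray: Array[Int] = asList.toArray

    // List.apply(i) is O(i), so one pass over a row costs O(N squared).
    val listTime = timeNanos {
      var r = 0
      while (r < rows) { sumByIndex(columns, i => asList(i)); r += 1 }
    }
    // Array indexing is O(1), so one pass over a row costs O(N).
    val arrayTime = timeNanos {
      var r = 0
      while (r < rows) { sumByIndex(columns, i => asArray(i)); r += 1 }
    }
    println(s"List:  ${listTime / 1000000} ms")
    println(s"Array: ${arrayTime / 1000000} ms")
  }
}
```

Both calls compute the same result; only the cost of each lookup differs. The timings depend on the JVM and warm-up, so they should be read as an indication of the trend rather than as a reproduction of the 6x figure above.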
1 file changed: 2 additions & 2 deletions in sql/core/src/main/scala/org/apache/spark/sql/parquet
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 133 | | - |
| | 133 | + |
| 141 | | - |
| | 141 | + |