Commit fb09a69

sameeragarwal authored and rxin committed
[SPARK-16764][SQL] Recommend disabling vectorized parquet reader on OutOfMemoryError
## What changes were proposed in this pull request?

We currently don't bound or manage the size of the data arrays used by column vectors in the vectorized reader (they are bounded only by `Integer.MAX_VALUE`), which may lead to OOMs while reading data. As a short-term fix, this patch intercepts the `OutOfMemoryError` and suggests that the user disable the vectorized Parquet reader.

## How was this patch tested?

Existing tests.

Author: Sameer Agarwal <sameerag@cs.berkeley.edu>

Closes #14387 from sameeragarwal/oom.

(cherry picked from commit 3fd39b8)
Signed-off-by: Reynold Xin <rxin@databricks.com>
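The mechanism described in the commit message, catching `OutOfMemoryError` around the allocation and rethrowing it with an actionable hint, can be sketched in plain Java without Spark on the classpath. This is a minimal, hypothetical sketch: the class `OomHintSketch`, the `IntFunction` allocator parameter and the hard-coded key `spark.sql.parquet.enableVectorizedReader` (assumed here to be the key behind `SQLConf.PARQUET_VECTORIZED_READER_ENABLED()`) are illustrative assumptions, not Spark's API.

```java
import java.util.function.IntFunction;

// Hypothetical sketch: class, method names and the hard-coded key are assumptions, not Spark's API.
class OomHintSketch {
  // Assumed to match the key behind SQLConf.PARQUET_VECTORIZED_READER_ENABLED().
  static final String VECTORIZED_READER_KEY = "spark.sql.parquet.enableVectorizedReader";

  // Builds the user-facing error; the original error (if any) is kept as the cause.
  static RuntimeException unsupported(int newCapacity, int requiredCapacity, Throwable cause) {
    String message = "Cannot reserve more than " + newCapacity
        + " bytes in the vectorized reader (requested = " + requiredCapacity + " bytes). As a"
        + " workaround, you can disable the vectorized reader by setting "
        + VECTORIZED_READER_KEY + " to false.";
    return cause != null ? new RuntimeException(message, cause) : new RuntimeException(message);
  }

  // Runs the allocator and translates an OutOfMemoryError into the hinted RuntimeException.
  static <T> T reserve(int newCapacity, int requiredCapacity, IntFunction<T> allocator) {
    try {
      return allocator.apply(newCapacity);
    } catch (OutOfMemoryError oom) {
      throw unsupported(newCapacity, requiredCapacity, oom);
    }
  }

  public static void main(String[] args) {
    // A normal allocation passes straight through.
    long[] data = reserve(16, 9, long[]::new);
    System.out.println("allocated " + data.length + " slots"); // prints "allocated 16 slots"

    // Simulate a failing allocator instead of exhausting the real heap.
    try {
      reserve(1 << 20, 1 << 19, n -> { throw new OutOfMemoryError("simulated"); });
    } catch (RuntimeException e) {
      System.out.println(e.getMessage());
      System.out.println("cause: " + e.getCause());
    }
  }
}
```

One deliberate difference from the commit: `unsupported(...)` returns the exception instead of throwing it, so the call site reads `throw unsupported(...)` and the compiler can see that the branch never completes normally. The commit's `throwUnsupportedException` is `void` and throws internally, which behaves the same at runtime. Catching `OutOfMemoryError` is normally discouraged; it is more defensible here because the failing request is a single large array allocation, so the rest of the heap may still be usable afterwards.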
1 parent f46a074 commit fb09a69

1 file changed (+19, -5 lines)

sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java

Lines changed: 19 additions & 5 deletions
```diff
@@ -282,16 +282,30 @@ public void reserve(int requiredCapacity) {
     if (requiredCapacity > capacity) {
       int newCapacity = (int) Math.min(MAX_CAPACITY, requiredCapacity * 2L);
       if (requiredCapacity <= newCapacity) {
-        reserveInternal(newCapacity);
+        try {
+          reserveInternal(newCapacity);
+        } catch (OutOfMemoryError outOfMemoryError) {
+          throwUnsupportedException(newCapacity, requiredCapacity, outOfMemoryError);
+        }
       } else {
-        throw new RuntimeException("Cannot reserve more than " + newCapacity +
-          " bytes in the vectorized reader (requested = " + requiredCapacity + " bytes). As a " +
-          "workaround, you can disable the vectorized reader by setting "
-          + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + " to false.");
+        throwUnsupportedException(newCapacity, requiredCapacity, null);
       }
     }
   }
 
+  private void throwUnsupportedException(int newCapacity, int requiredCapacity, Throwable cause) {
+    String message = "Cannot reserve more than " + newCapacity +
+        " bytes in the vectorized reader (requested = " + requiredCapacity + " bytes). As a" +
+        " workaround, you can disable the vectorized reader by setting "
+        + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + " to false.";
+
+    if (cause != null) {
+      throw new RuntimeException(message, cause);
+    } else {
+      throw new RuntimeException(message);
+    }
+  }
+
   /**
    * Ensures that there is enough storage to store capcity elements. That is, the put() APIs
    * must work for all rowIds < capcity.
```
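The capacity computation kept as context in the diff doubles `requiredCapacity` in `long` arithmetic (`requiredCapacity * 2L`) before clamping to `MAX_CAPACITY`, so the doubling cannot wrap around to a negative `int`. The standalone sketch below exercises that arithmetic. `CapacityGrowthSketch`, `grownCapacity` and the explicit `maxCapacity` parameter are hypothetical, introduced only so the clamp can be tried with a small cap; the diff itself does not show the value of `MAX_CAPACITY`.

```java
// Hypothetical sketch of the growth rule in reserve(); names are illustrative only.
class CapacityGrowthSketch {
  // Mirrors the diff's expression: double in long arithmetic, then clamp to the cap.
  static int grownCapacity(int requiredCapacity, int maxCapacity) {
    return (int) Math.min(maxCapacity, requiredCapacity * 2L);
  }

  public static void main(String[] args) {
    int required = 1_500_000_000;
    // Doubling in int arithmetic silently overflows to a negative number.
    System.out.println("int doubling:  " + (required * 2)); // prints -1294967296
    // Doubling in long and clamping yields a usable capacity.
    System.out.println("long doubling: " + grownCapacity(required, Integer.MAX_VALUE)); // prints 2147483647

    // With a small (hypothetical) cap, the clamped value falls below the request,
    // which is the situation that sends reserve() into its else-branch.
    int newCapacity = grownCapacity(1_500, 1_000);
    System.out.println("fits under cap: " + (1_500 <= newCapacity)); // prints false
  }
}
```

When the clamped value is smaller than `requiredCapacity`, `reserve()` takes the `else` branch, which after this commit calls `throwUnsupportedException(newCapacity, requiredCapacity, null)` instead of building the message inline.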
