We're seeing the following stack trace when reading Parquet files with the schema shown below:
! java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary
! at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:52) ~[parquet-column-1.7.0.jar:1.7.0]
! at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:274) ~[spark-sql_2.11-2.0.1.jar:2.0.1]
! at org.apache.spark.sql.execution.vectorized.ColumnVector.getDecimal(ColumnVector.java:588) ~[spark-sql_2.11-2.0.1.jar:2.0.1]
! at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) ~[na:na]
! at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.11-2.0.1.jar:2.0.1]
! at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370) ~[spark-sql_2.11-2.0.1.jar:2.0.1]
! at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246) ~[spark-sql_2.11-2.0.1.jar:2.0.1]
! at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240) ~[spark-sql_2.11-2.0.1.jar:2.0.1]
! at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) ~[spark-core_2.11-2.0.1.jar:2.0.1]
! at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) ~[spark-core_2.11-2.0.1.jar:2.0.1]
! at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[spark-core_2.11-2.0.1.jar:2.0.1]
! at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) ~[spark-core_2.11-2.0.1.jar:2.0.1]
! at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) ~[spark-core_2.11-2.0.1.jar:2.0.1]
! at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) ~[spark-core_2.11-2.0.1.jar:2.0.1]
! at org.apache.spark.scheduler.Task.run(Task.scala:86) ~[spark-core_2.11-2.0.1.jar:2.0.1]
! at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) ~[spark-core_2.11-2.0.1.jar:2.0.1]
Anonymized schema:
message parquetSchema {
  optional int32 COL_1 (INT_32);
  optional int32 COL_2 (INT_32);
  optional binary COL_3 (DECIMAL(18,2));
  optional binary COL_4 (UTF8);
  optional binary COL_5 (UTF8);
  optional int32 COL_6 (INT_32);
  optional int64 COL_7 (TIMESTAMP_MILLIS);
  optional int64 COL_8 (TIMESTAMP_MILLIS);
  optional int32 COL_9 (INT_32);
  optional binary COL_10 (UTF8);
  optional int64 COL_11 (TIMESTAMP_MILLIS);
  optional int64 COL_12 (TIMESTAMP_MILLIS);
}
COL_3 is the column causing problems. From the stack trace, the failure appears to be in Spark's vectorized Parquet reader: decimals with precision ≤ 18 are held as longs in the column vector, so ColumnVector.getDecimal calls OnHeapColumnVector.getLong, which asks the page dictionary to decodeToLong; since COL_3 is stored as variable-length binary, that dictionary is a PlainBinaryDictionary, which doesn't implement decodeToLong and so throws UnsupportedOperationException.
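A minimal sketch of the workaround we've been using (the path is hypothetical; the file must match the schema above, i.e. written by a tool that encodes DECIMAL as variable-length binary, since Spark itself does not appear to write decimals that way): setting spark.sql.parquet.enableVectorizedReader to false makes Spark fall back to the row-based parquet-mr reader, which should avoid this code path entirely.

```scala
import org.apache.spark.sql.SparkSession

object BinaryDecimalWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("binary-decimal-workaround")
      .getOrCreate()

    // Hypothetical path to a Parquet file matching the schema above,
    // i.e. COL_3 stored as `binary (DECIMAL(18,2))` with dictionary encoding.
    val path = "/tmp/binary_decimal.parquet"

    // Workaround: disable the vectorized reader so Spark falls back to the
    // row-based parquet-mr reader, which decodes binary-backed decimals.
    spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

    // With the vectorized reader enabled (the default in 2.0.1), this select
    // fails with the UnsupportedOperationException shown above.
    spark.read.parquet(path).select("COL_3").show()

    spark.stop()
  }
}
```

This trades away the vectorized reader's performance, so it's a stopgap rather than a fix.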