Description
Hello! Our current Presto version is 344, but this probably applies to the latest version too.
We have Hive tables stored on HDFS with erasure coding enabled. If a datanode holding EC blocks is unavailable, executing a query throws:
io.prestosql.spi.PrestoException: Error reading from hdfs://path_to_the_file at position 3395
at io.prestosql.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:91)
at io.prestosql.orc.AbstractOrcDataSource.readFully(AbstractOrcDataSource.java:108)
at io.prestosql.orc.AbstractOrcDataSource$DiskOrcDataReader.read(AbstractOrcDataSource.java:323)
at io.prestosql.orc.stream.AbstractDiskOrcDataReader.seekBuffer(AbstractDiskOrcDataReader.java:91)
at io.prestosql.orc.stream.CompressedOrcChunkLoader.ensureCompressedBytesAvailable(CompressedOrcChunkLoader.java:165)
at io.prestosql.orc.stream.CompressedOrcChunkLoader.nextChunk(CompressedOrcChunkLoader.java:115)
at io.prestosql.orc.stream.OrcInputStream.advance(OrcInputStream.java:204)
at io.prestosql.orc.stream.OrcInputStream.read(OrcInputStream.java:96)
at io.prestosql.orc.stream.OrcInputStream.readFully(OrcInputStream.java:121)
at io.prestosql.orc.stream.ByteArrayInputStream.next(ByteArrayInputStream.java:43)
at io.prestosql.orc.reader.SliceDirectColumnReader.readBlock(SliceDirectColumnReader.java:198)
at io.prestosql.orc.reader.SliceColumnReader.readBlock(SliceColumnReader.java:74)
at io.prestosql.orc.OrcBlockFactory$OrcBlockLoader.load(OrcBlockFactory.java:76)
...
Caused by: org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer, not of length 1045181
at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:137)
at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:48)
at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:433)
at org.apache.hadoop.hdfs.PositionStripeReader.decode(PositionStripeReader.java:74)
at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:390)
at org.apache.hadoop.hdfs.DFSStripedInputStream.fetchBlockByteRange(DFSStripedInputStream.java:507)
at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1360)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1324)
at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121)
at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
at io.prestosql.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76)
... 32 more
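For context, here is a minimal sketch of the read pattern that fails; the path, position, and buffer size are hypothetical, chosen only to mirror the trace above. A positioned read against an EC-coded file goes through DFSStripedInputStream, and with a datanode down it has to reconstruct the stripe, which is where the unpatched client fails:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EcPositionedReadRepro
{
    public static void main(String[] args)
            throws Exception
    {
        Configuration conf = new Configuration();
        // Hypothetical file stored in an HDFS directory with an EC policy (e.g. RS-6-3-1024k)
        Path path = new Path("hdfs://namenode:8020/warehouse/ec_table/part-00000.orc");

        try (FileSystem fs = path.getFileSystem(conf);
                FSDataInputStream in = fs.open(path)) {
            // Positioned read, same shape as HdfsOrcDataSource.readInternal ->
            // FSDataInputStream.readFully in the stack trace. With an EC block
            // unavailable this takes StripeReader.readStripe ->
            // RawErasureDecoder.decode, where Hadoop 3.2.0 fails buffer
            // validation ("Invalid buffer, not of length ..."), see HDFS-14373.
            byte[] buffer = new byte[1_048_576];
            in.readFully(3395, buffer, 0, buffer.length);
        }
    }
}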
The likely cause is HDFS-14373. The Presto Hadoop client depends on Apache Hadoop 3.2.0, but the issue was only fixed in 3.2.2.
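If it helps, here is a small sketch for checking which Hadoop client version actually ends up on the worker classpath, using Hadoop's public VersionInfo utility:

import org.apache.hadoop.util.VersionInfo;

public class PrintHadoopVersion
{
    public static void main(String[] args)
    {
        // Prints the version baked into the Hadoop client jars,
        // e.g. "3.2.0" for the dependency that lacks the fix.
        System.out.println("Hadoop client version: " + VersionInfo.getVersion());
    }
}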
I've also found the similar issue #2196.
It seems the Hadoop client should be updated to at least 3.2.2.