"HadoopIllegalArgumentException: Invalid buffer, not of length X" when querying Hive erasure coding tables #6413

Closed

@denniean

Description

Hello! Our current Presto version is 344, but this likely affects the latest version as well.
We have Hive tables stored on HDFS with erasure coding enabled. If a datanode holding EC blocks becomes unavailable, executing a query against such a table throws:

io.prestosql.spi.PrestoException: Error reading from hdfs://path_to_the_file at position 3395
        at io.prestosql.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:91)
        at io.prestosql.orc.AbstractOrcDataSource.readFully(AbstractOrcDataSource.java:108)
        at io.prestosql.orc.AbstractOrcDataSource$DiskOrcDataReader.read(AbstractOrcDataSource.java:323)
        at io.prestosql.orc.stream.AbstractDiskOrcDataReader.seekBuffer(AbstractDiskOrcDataReader.java:91)
        at io.prestosql.orc.stream.CompressedOrcChunkLoader.ensureCompressedBytesAvailable(CompressedOrcChunkLoader.java:165)
        at io.prestosql.orc.stream.CompressedOrcChunkLoader.nextChunk(CompressedOrcChunkLoader.java:115)
        at io.prestosql.orc.stream.OrcInputStream.advance(OrcInputStream.java:204)
        at io.prestosql.orc.stream.OrcInputStream.read(OrcInputStream.java:96)
        at io.prestosql.orc.stream.OrcInputStream.readFully(OrcInputStream.java:121)
        at io.prestosql.orc.stream.ByteArrayInputStream.next(ByteArrayInputStream.java:43)
        at io.prestosql.orc.reader.SliceDirectColumnReader.readBlock(SliceDirectColumnReader.java:198)
        at io.prestosql.orc.reader.SliceColumnReader.readBlock(SliceColumnReader.java:74)
        at io.prestosql.orc.OrcBlockFactory$OrcBlockLoader.load(OrcBlockFactory.java:76)
        ...
Caused by: org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer, not of length 1045181
        at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:137)
        at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:48)
        at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
        at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
        at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:433)
        at org.apache.hadoop.hdfs.PositionStripeReader.decode(PositionStripeReader.java:74)
        at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:390)
        at org.apache.hadoop.hdfs.DFSStripedInputStream.fetchBlockByteRange(DFSStripedInputStream.java:507)
        at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1360)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1324)
        at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121)
        at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
        at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
        at io.prestosql.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76)
        ... 32 more

The root cause is probably HDFS-14373. Presto's Hadoop client depends on Apache Hadoop 3.2.0, but the fix only landed in 3.2.2.
I've also found the similar issue #2196.
It seems the Hadoop client should be updated to at least 3.2.2.
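For context, here is a minimal sketch of the kind of buffer-length validation that produces the message in the stack trace. This is my own simplified stand-in, not the actual Hadoop source: the real check lives in `ByteBufferDecodingState.checkOutputBuffers` and throws `HadoopIllegalArgumentException`, while this demo uses plain `IllegalArgumentException`. It only illustrates why a decode buffer whose remaining capacity does not match the expected decode length is rejected.

```java
import java.nio.ByteBuffer;

public class BufferCheckDemo {
    // Simplified stand-in for ByteBufferDecodingState.checkOutputBuffers:
    // every output buffer handed to the erasure decoder must have exactly
    // decodeLength bytes remaining, or decoding is aborted.
    static void checkOutputBuffers(ByteBuffer[] buffers, int decodeLength) {
        for (ByteBuffer buf : buffers) {
            if (buf == null) {
                throw new IllegalArgumentException("Invalid buffer found, not allowing null");
            }
            if (buf.remaining() != decodeLength) {
                // Same message shape as the exception in the stack trace above.
                throw new IllegalArgumentException("Invalid buffer, not of length " + decodeLength);
            }
        }
    }

    public static void main(String[] args) {
        int decodeLength = 1045181; // length from the stack trace, used here for illustration
        ByteBuffer ok = ByteBuffer.allocate(decodeLength);
        checkOutputBuffers(new ByteBuffer[]{ok}, decodeLength); // passes: remaining == decodeLength

        ByteBuffer tooSmall = ByteBuffer.allocate(decodeLength - 1);
        try {
            checkOutputBuffers(new ByteBuffer[]{tooSmall}, decodeLength);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "Invalid buffer, not of length 1045181"
        }
    }
}
```

The point of HDFS-14373 is that, during a positioned read over a striped (EC) file with a dead datanode, the reconstruction path could hand the decoder buffers whose remaining length did not match the decode length, tripping exactly this check even though the data was recoverable.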
