[c++] arrow::int32 throws exc_bad_access  #35616

@hqx871

Hi team! I am using Arrow 0.15.1 and ran into a problem when reading a Parquet file that contains an array column. The ASan output:

parquet-low-level-example(49396,0x7ff848622680) malloc: nano zone abandoned due to inability to preallocate reserved vm space.
row num:1000000
=================================================================
==49396==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0001087d73f8 at pc 0x0001076ecb8d bp 0x7ff7b8b4d6b0 sp 0x7ff7b8b4d6a8
WRITE of size 8 at 0x0001087d73f8 thread T0
    #0 0x1076ecb8c in int arrow::util::RleDecoder::GetBatchWithDictSpaced<long long>(long long const*, long long*, int, int, unsigned char const*, long long) rle_encoding.h:488
    #1 0x1076e62c8 in parquet::DictDecoderImpl<parquet::PhysicalType<(parquet::Type::type)2> >::DecodeSpaced(long long*, int, int, unsigned char const*, long long) encoding.cc:1079
    #2 0x1075d9e6b in parquet::internal::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)2> >::ReadValuesSpaced(long long, long long) column_reader.cc:1052
    #3 0x1075dc1a9 in parquet::internal::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)2> >::ReadRecordData(long long) column_reader.cc:1096
    #4 0x1075d6a4c in parquet::internal::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)2> >::ReadRecords(long long) column_reader.cc:822
    #5 0x1073d1583 in parquet::arrow::LeafReader::NextBatch(long long, std::__1::shared_ptr<arrow::ChunkedArray>*) reader.cc:414
    #6 0x1073d55bd in parquet::arrow::NestedListReader::NextBatch(long long, std::__1::shared_ptr<arrow::ChunkedArray>*) reader.cc:469
    #7 0x1073f5a82 in parquet::arrow::RowGroupRecordBatchReader::ReadNext(std::__1::shared_ptr<arrow::RecordBatch>*) reader.cc:320
    #8 0x1073b409a in printParquetFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) reader-writer.cc:97
    #9 0x1073b5209 in main reader-writer.cc:111
    #10 0x7ff8049b230f  (<unknown module>)

0x0001087d73f8 is located 40 bytes to the left of global variable 'guard variable for arrow::SparseTensor::dim_name(int) const::kEmpty' defined in 'arrow-apache-arrow-0.15.1/cpp/src/arrow/sparse_tensor.cc' (0x1087d7420) of size 8
0x0001087d73f8 is located 0 bytes to the right of global variable 'kEmpty' defined in 'arrow-apache-arrow-0.15.1/cpp/src/arrow/sparse_tensor.cc:415:28' (0x1087d73e0) of size 24
SUMMARY: AddressSanitizer: global-buffer-overflow rle_encoding.h:488 in int arrow::util::RleDecoder::GetBatchWithDictSpaced<long long>(long long const*, long long*, int, int, unsigned char const*, long long)
Shadow bytes around the buggy address:
  0x1000210fae20: 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9
  0x1000210fae30: 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 00 f9 f9 f9
  0x1000210fae40: 00 00 f9 f9 00 f9 f9 f9 01 f9 f9 f9 01 f9 f9 f9
  0x1000210fae50: 01 f9 f9 f9 01 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9
  0x1000210fae60: 01 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9 00 00 00 00
=>0x1000210fae70: 00 00 00 f9 f9 f9 f9 f9 00 00 00 00 00 00 00[f9]
  0x1000210fae80: f9 f9 f9 f9 00 f9 f9 f9 00 00 00 f9 f9 f9 f9 f9
  0x1000210fae90: 00 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9 00 00 f9 f9
  0x1000210faea0: 00 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9 00 00 f9 f9
  0x1000210faeb0: 00 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9 00 00 f9 f9
  0x1000210faec0: 00 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9 00 00 f9 f9
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==49396==ABORTING

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

Component(s)

C++
