Error Reading Decimal Lists: ComplexObjectArrayReader Handles Repetition Levels Incorrectly #2253

@tustvold

Describe the bug

ComplexObjectArrayReader does not use RecordReader, so it does not correctly delimit semantic records when reading. In particular, it may yield a batch of values that truncates a row partway through. This in turn causes the parent ListArrayReader to return an error, because the repetition levels it receives are no longer consistent.
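To make the failure mode concrete, here is a minimal std-only sketch. It derives repetition levels for a single-level list column from Arrow-style offsets and validity, then checks what happens when the level stream is cut at a fixed number of entries. The helper names `repetition_levels` and `first_misaligned_chunk` are hypothetical, and the fixed-size cut is a simplifying assumption for illustration, not a description of the actual ComplexObjectArrayReader code.

```rust
/// Repetition levels for a single-level list column (hypothetical helper).
/// `offsets` are Arrow list offsets; `valid[i]` is false when list `i` is null.
/// A null or empty list contributes exactly one entry with level 0;
/// a non-empty list contributes a 0 followed by a 1 for each further element.
fn repetition_levels(offsets: &[i32], valid: &[bool]) -> Vec<i16> {
    let mut levels = Vec::new();
    for (i, window) in offsets.windows(2).enumerate() {
        let len = (window[1] - window[0]) as usize;
        levels.push(0); // every record starts at repetition level 0
        if valid[i] && len > 0 {
            levels.extend(std::iter::repeat(1).take(len - 1));
        }
    }
    levels
}

/// Index of the first fixed-size chunk that does not begin on a record
/// boundary, i.e. whose first repetition level is non-zero.
/// Panics if `chunk_len` is 0 (behaviour of `slice::chunks`).
fn first_misaligned_chunk(levels: &[i16], chunk_len: usize) -> Option<usize> {
    levels.chunks(chunk_len).position(|chunk| chunk[0] != 0)
}
```

For the list in the reproduction below (offsets `[0, 0, 1, 3, 3, 4, 5, 8]`, rows 3 and 5 null), the levels come out as `[0, 0, 0, 1, 0, 0, 0, 0, 1, 1]`. Cutting after every 3 entries leaves the second chunk starting with level 1, which is the condition the error message complains about.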

To Reproduce

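// Imports are an assumption: paths as exposed by the arrow and parquet
// crates around the time this issue was filed.
use std::sync::Arc;

use arrow::array::{ArrayDataBuilder, ArrayRef, Decimal128Array, ListArray};
use arrow::buffer::Buffer;
use arrow::datatypes::{DataType as ArrowDataType, Field};
use arrow::error::Result as ArrowResult;
use arrow::record_batch::RecordBatch;
use bytes::Bytes;
use parquet::arrow::{ArrowReader, ArrowWriter, ParquetFileArrowReader};

#[test]
fn test_decimal_list() {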
    let decimals = Decimal128Array::from_iter_values([1, 2, 3, 4, 5, 6, 7, 8]);

    // [[], [1], [2, 3], null, [4], null, [6, 7, 8]]
    let data = ArrayDataBuilder::new(ArrowDataType::List(Box::new(Field::new(
        "item",
        decimals.data_type().clone(),
        false,
    ))))
    .len(7)
    .add_buffer(Buffer::from_iter([0_i32, 0, 1, 3, 3, 4, 5, 8]))
    .null_bit_buffer(Some(Buffer::from(&[0b01010111])))
    .child_data(vec![decimals.into_data()])
    .build()
    .unwrap();

    let written = RecordBatch::try_from_iter([(
        "list",
        Arc::new(ListArray::from(data)) as ArrayRef,
    )])
    .unwrap();

    let mut buffer = Vec::with_capacity(1024);
    let mut writer =
        ArrowWriter::try_new(&mut buffer, written.schema(), None).unwrap();
    writer.write(&written).unwrap();
    writer.close().unwrap();

    // Read back in batches of 3 rows; the 7 rows should arrive as batches of 3, 3, and 1.
    let read = ParquetFileArrowReader::try_new(Bytes::from(buffer))
        .unwrap()
        .get_record_reader(3)
        .unwrap()
        .collect::<ArrowResult<Vec<_>>>()
        .unwrap();

    assert_eq!(&written.slice(0, 3), &read[0]);
    assert_eq!(&written.slice(3, 3), &read[1]);
    assert_eq!(&written.slice(6, 1), &read[2]);
}

Results in

ParquetError("Parquet error: first repetition level of batch must be 0")

Expected behavior

We should support reading these nested types.
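As a rough illustration of what "correctly delimiting records" means here, the sketch below splits a stream of repetition levels into batches that always end on a record boundary. This is a std-only simplification under stated assumptions: `split_by_records` is a hypothetical name, and the real RecordReader also tracks definition levels and values, which are omitted.

```rust
/// Split repetition levels into batches holding at most `batch_rows`
/// complete records each (hypothetical helper). A new record begins
/// wherever the repetition level is 0, so a batch is only closed
/// immediately before such an entry.
fn split_by_records(levels: &[i16], batch_rows: usize) -> Vec<&[i16]> {
    assert!(batch_rows > 0, "batch_rows must be non-zero");
    let mut batches = Vec::new();
    let mut start = 0; // index where the current batch begins
    let mut rows = 0; // complete or in-progress records in the current batch
    for (i, &level) in levels.iter().enumerate() {
        if level == 0 {
            if rows == batch_rows {
                // The current batch is full; close it before this new record.
                batches.push(&levels[start..i]);
                start = i;
                rows = 0;
            }
            rows += 1;
        }
    }
    if start < levels.len() {
        batches.push(&levels[start..]); // trailing, possibly short, batch
    }
    batches
}
```

With the levels `[0, 0, 0, 1, 0, 0, 0, 0, 1, 1]` and `batch_rows = 3`, this yields `[0, 0, 0, 1]`, `[0, 0, 0]`, and `[0, 1, 1]`, matching the 3, 3, and 1 row slices the test asserts on. Every batch starts with level 0, so a parent list reader would never see a batch that begins mid-row.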

Additional context

#1661 tracks removing this ArrayReader, as it is buggy, complex, and no longer really needed.

Labels: bug, parquet
