[Variant] writing a VariantArray to parquet panics #8296

@alamb

Describe the bug
As part of testing integration with the parquet crate in #8133, I found that writing a VariantArray directly to Parquet panics.

To Reproduce

    // Imports the snippet needs (crate paths are assumed from the arrow-rs workspace)
    use std::sync::Arc;
    use arrow_array::{ArrayRef, RecordBatch};
    use parquet::arrow::ArrowWriter;
    use parquet_variant::VariantBuilderExt;
    use parquet_variant_compute::VariantArrayBuilder;

    // Use the VariantArrayBuilder to build a VariantArray
    let mut builder = VariantArrayBuilder::new(3);
    // row 1: {"name": "Alice"}
    let mut variant_builder = builder.variant_builder();
    variant_builder.new_object().with_field("name", "Alice").finish()?;
    variant_builder.finish();
    let array = builder.build();

    // TODO support writing VariantArray directly
    // at the moment it panics when trying to downcast to a struct array
    let array: ArrayRef = Arc::new(array);

    // create a RecordBatch with the VariantArray
    let batch = RecordBatch::try_from_iter(vec![("data", array)])?;

    // write the RecordBatch to a Parquet file
    let file = std::fs::File::create("variant.parquet")?;
    let mut writer = ArrowWriter::try_new(file, batch.schema(), None)?;
    writer.write(&batch)?; // panics here
    writer.close()?;

This results in the following panic:

struct array
thread 'main' panicked at arrow-array/src/cast.rs:904:30:
struct array
stack backtrace:
   0: __rustc::rust_begin_unwind
             at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/panicking.rs:697:5
   1: core::panicking::panic_fmt
             at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/core/src/panicking.rs:75:14
   2: core::panicking::panic_display
             at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/core/src/panicking.rs:268:5
   3: core::option::expect_failed
             at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/core/src/option.rs:2081:5
   4: core::option::Option<T>::expect
             at /Users/andrewlamb/.rustup/toolchains/1.89-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/option.rs:960:21
   5: arrow_array::cast::AsArray::as_struct
             at /Users/andrewlamb/Software/arrow-rs/arrow-array/src/cast.rs:904:30
   6: parquet::arrow::arrow_writer::levels::LevelInfoBuilder::try_new
             at ./src/arrow/arrow_writer/levels.rs:162:35
   7: parquet::arrow::arrow_writer::levels::calculate_array_levels
             at ./src/arrow/arrow_writer/levels.rs:55:23
   8: parquet::arrow::arrow_writer::compute_leaves
             at ./src/arrow/arrow_writer/mod.rs:625:18
   9: parquet::arrow::arrow_writer::ArrowRowGroupWriter::write
             at ./src/arrow/arrow_writer/mod.rs:839:25
  10: parquet::arrow::arrow_writer::ArrowWriter<W>::write
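
Frames 4 and 5 show an `Option::expect` failing inside `AsArray::as_struct`, which suggests the writer tried to downcast the column to a concrete struct array and got `None`. The sketch below reproduces that failure mode with std only. `Array`, `StructArray`, `VariantLike`, and `reported_type` are hypothetical stand-ins, not the arrow-rs types, and it assumes the variant wrapper reports a struct data type while being a different concrete type.

```rust
use std::any::Any;

// Hypothetical stand-in for an Arrow-style array trait (not the arrow-rs trait).
trait Array {
    fn as_any(&self) -> &dyn Any;
    fn reported_type(&self) -> &'static str;
}

// Hypothetical stand-in for a concrete struct array.
struct StructArray {
    num_rows: usize,
}

// Hypothetical wrapper: reports the struct type of its inner storage,
// but is itself a distinct concrete type.
struct VariantLike {
    inner: StructArray,
}

impl Array for StructArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn reported_type(&self) -> &'static str {
        "Struct"
    }
}

impl Array for VariantLike {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn reported_type(&self) -> &'static str {
        // Assumption: the wrapper advertises the same logical type as its storage.
        "Struct"
    }
}

impl VariantLike {
    // Expose the underlying struct storage.
    fn inner(&self) -> &StructArray {
        &self.inner
    }
}

// Fallible downcast: succeeds only when the concrete type is `StructArray`.
fn as_struct_opt(array: &dyn Array) -> Option<&StructArray> {
    array.as_any().downcast_ref::<StructArray>()
}

// Panicking downcast, mirroring the `expect("struct array")` seen in the backtrace.
fn as_struct(array: &dyn Array) -> &StructArray {
    as_struct_opt(array).expect("struct array")
}
```

In this sketch, `as_struct_opt` returns `None` for a `VariantLike` even though `reported_type()` says `"Struct"`, so `as_struct` panics with the same message as above. Passing `wrapper.inner()` instead succeeds, which hints at one possible direction for a fix: hand the writer the underlying struct storage rather than the wrapper.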

Expected behavior
We should be able to write a VariantArray directly with ArrowWriter without hitting this panic.

Labels

bug, parquet (Changes to the parquet crate), parquet-variant (parquet-variant* crates)