Closed
Description
Describe the bug
Let's say you're trying to async read a Parquet file on S3, and that file has metadata (like "created by"). There's an inconsistency:
- ParquetRecordBatchStream::schema will produce a Schema object that includes that metadata.
- But ParquetRecordBatchStream will yield RecordBatches that have schema objects that don't have the metadata.
The problem is that if you create an ArrowWriter using the first schema and then try to write batches from the stream to it, the schemas won't match (the writer is expecting metadata but each batch has a schema without metadata).
Expected behavior
I'd expect that either:
- ParquetRecordBatchStream::schema produces a Schema without metadata, or
- the RecordBatches produced by ParquetRecordBatchStream have the exact same schema as what ::schema returns, or
- ArrowWriter should tolerate its supplied schema differing from the batch schemas provided to write() in metadata
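
To make the mismatch and the third option concrete, here is a minimal std-only sketch. It does not use the arrow or parquet crates: the `Schema` struct, the helper functions, and the `created_by` / `example-writer` values are all hypothetical stand-ins invented for illustration. It only shows why a strict schema comparison rejects the batches while a comparison that ignores metadata would accept them.

```rust
use std::collections::HashMap;

// Simplified stand-in for an Arrow schema (hypothetical; NOT the arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<(String, String)>, // (column name, type name)
    metadata: HashMap<String, String>,
}

// Strict comparison: fields and metadata must both match.
fn same_schema(a: &Schema, b: &Schema) -> bool {
    a == b
}

// Relaxed comparison: only the fields must match; metadata is ignored.
fn same_fields(a: &Schema, b: &Schema) -> bool {
    a.fields == b.fields
}

// Returns a copy of the schema with its metadata removed.
fn without_metadata(schema: &Schema) -> Schema {
    Schema {
        fields: schema.fields.clone(),
        metadata: HashMap::new(),
    }
}

// Builds a "stream" schema with metadata and a "batch" schema without it,
// then returns (strict comparison result, relaxed comparison result).
fn demo() -> (bool, bool) {
    let fields = vec![("id".to_string(), "Int64".to_string())];

    let mut metadata = HashMap::new();
    metadata.insert("created_by".to_string(), "example-writer".to_string());

    let stream_schema = Schema { fields: fields.clone(), metadata };
    let batch_schema = Schema { fields, metadata: HashMap::new() };

    (
        same_schema(&stream_schema, &batch_schema),
        same_fields(&stream_schema, &batch_schema),
    )
}
```

In this model, `demo()` yields a failed strict comparison and a successful relaxed one. Stripping the metadata up front with `without_metadata` corresponds to the first option in the list, and `same_fields` corresponds to the third.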