Closed
Description
Describe the bug, including details regarding any error messages, version, and platform.
RecordBatchBuilder doesn't seem to correctly handle a schema that contains a list of structs. Here's a minimal test case:
require "arrow"

schema = Arrow::Schema.new(
  [
    Arrow::Field.new("structs", Arrow::ListDataType.new(
      Arrow::StructDataType.new([
        Arrow::Field.new("foo", :int64),
        Arrow::Field.new("bar", :int64)
      ])
    ))
  ]
)
table = Arrow::RecordBatchBuilder.build(schema, [
  { structs: [] },
  { structs: [] },
]).to_table
assert_equal(2, table.n_rows)

The table should have 2 rows, but it is empty (tested on HEAD).
I've also checked that equivalent code in PyArrow works correctly (the table has two rows):
import pyarrow as pa
import pyarrow.parquet as pq

schema = pa.schema(
    [
        pa.field(
            "structs",
            pa.list_(
                pa.struct([
                    pa.field("foo", pa.int64()),
                    pa.field("bar", pa.int64())
                ])
            )
        )
    ]
)
data = [
    {"structs": []},
    {"structs": []}
]
table = pa.Table.from_pylist(data, schema=schema)
print(table.shape)
pq.write_table(table, "file.parquet")

Related bug report: #44742.
Component(s)
Ruby