[Ruby] RecordBatchBuilder doesn't work with list of structs #44918

@fpacanowski

Describe the bug, including details regarding any error messages, version, and platform.

RecordBatchBuilder doesn't seem to correctly handle a schema that contains a list of structs. Here's a minimal test case:

require "arrow"

# Schema with one column: a list whose items are structs {foo: int64, bar: int64}.
schema = Arrow::Schema.new(
  [
    Arrow::Field.new("structs", Arrow::ListDataType.new(
      Arrow::StructDataType.new([
        Arrow::Field.new("foo", :int64),
        Arrow::Field.new("bar", :int64)
      ])
    ))
  ]
)

table = Arrow::RecordBatchBuilder.build(schema, [
  { structs: [] },
  { structs: [] },
]).to_table

assert_equal(2, table.n_rows)

The table should have 2 rows, but it is empty (tested on HEAD).

I've also checked that equivalent code in PyArrow works correctly (the table has two rows):

import pyarrow as pa
import pyarrow.parquet as pq

schema = pa.schema(
    [
        pa.field(
            "structs",
            pa.list_(
                pa.struct([
                    pa.field("foo", pa.int64()),
                    pa.field("bar", pa.int64())
                ])
            )
        )
    ]
)

data = [
    {"structs": []},
    {"structs": []}
]

table = pa.Table.from_pylist(data, schema=schema)
print(table.shape)

pq.write_table(table, "file.parquet")
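
For context on why two rows are the expected result: in Arrow's columnar format, the length of a list column comes from its offsets (one more offset than there are rows), not from the number of child values. The sketch below models that layout in plain Python. It is only an illustration under simplifying assumptions (no nulls, no validity bitmap); `encode_list_column` is a hypothetical helper, not a PyArrow or Red Arrow API, and it says nothing about where the Ruby builder actually goes wrong.

```python
# Plain-Python sketch of Arrow's variable-size list layout (no pyarrow needed).
# `encode_list_column` is a hypothetical helper for illustration only.

def encode_list_column(rows):
    """Flatten a list of rows (each a list of structs) into (offsets, child_values).

    offsets[i] and offsets[i + 1] delimit the child values that belong to row i,
    so the number of rows is always len(offsets) - 1.
    """
    offsets = [0]
    child_values = []
    for row in rows:
        child_values.extend(row)
        offsets.append(len(child_values))
    return offsets, child_values


# The "structs" values from the two input records in the test case.
rows = [[], []]
offsets, child_values = encode_list_column(rows)

# Row count is derived from the offsets, even though there are no child values.
n_rows = len(offsets) - 1
print(offsets, child_values, n_rows)  # prints: [0, 0, 0] [] 2
```

Under this model, two empty lists still produce three offsets, so a correct builder should report two rows even though the child struct array is empty.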

Related bug report: #44742.

Component(s)

Ruby
