[Java][C] sliced RecordBatch offset info is lost when imported from c-data #88

@hellishfire

Describe the bug, including details regarding any error messages, version, and platform.

Reproduced on the latest Arrow release (16.0).

The problem occurs when a sliced RecordBatch is exported from C++ and imported into Java through the C Data Interface.

On the C++ side:

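// Zero-copy slice: rows 8 and 9 of the original batch.
auto sliced_record_batch = original_record_batch->Slice(/*offset=*/8, /*length=*/2);
// ExportRecordBatch takes a const RecordBatch&, so the shared_ptr is dereferenced.
arrow::Status status = arrow::ExportRecordBatch(*sliced_record_batch, arrow_array_ptr);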

On the Java side:

ArrowArray arrowArray = ArrowArray.allocateNew(allocator);
Data.importIntoVectorSchemaRoot(allocator, arrowArray, vectorSchemaRoot, null);

The imported vectorSchemaRoot has the correct length (2), but the offset (8) is not respected. As a result, the imported vectorSchemaRoot contains the first 2 rows of original_record_batch instead of the 2 rows of sliced_record_batch.

I'm not familiar with the Arrow code base, but the offset appears to be present in org.apache.arrow.c.ArrowArray.Snapshot, and org.apache.arrow.c.ArrayImporter.doImport(ArrowArray.Snapshot) seems to ignore it.
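
To make the expected and observed results concrete, here is a minimal plain-Java sketch of the offset semantics. It does not use Arrow: the class SliceOffsetSketch, its two methods, and the sample column holding 0..9 are all hypothetical. It only assumes that a slice shares the original buffer and records an offset and a length, so logical row i is at physical index offset + i.

```java
import java.util.Arrays;

// Plain-Java model of an exported int32 column: a values buffer plus the
// logical offset and length fields. This is an illustration, not Arrow code.
class SliceOffsetSketch {

    // Reads `length` values starting at physical index `offset`,
    // which is how an offset-aware importer would interpret the data.
    static int[] readHonoringOffset(int[] values, int offset, int length) {
        return Arrays.copyOfRange(values, offset, offset + length);
    }

    // Reads `length` values from the start of the buffer, modeling the
    // behavior reported in this issue (offset dropped on import).
    static int[] readIgnoringOffset(int[] values, int length) {
        return Arrays.copyOfRange(values, 0, length);
    }

    public static void main(String[] args) {
        // Hypothetical original batch: one int32 column holding 0..9.
        int[] values = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        // Slice(offset=8, length=2) keeps the buffer and records offset/length.
        int offset = 8;
        int length = 2;
        // Expected result: rows 8 and 9.
        System.out.println(Arrays.toString(readHonoringOffset(values, offset, length))); // [8, 9]
        // Observed result: the first 2 rows of the original batch.
        System.out.println(Arrays.toString(readIgnoringOffset(values, length)));         // [0, 1]
    }
}
```

The second call matches what the imported vectorSchemaRoot contains today, and the first is what the slice should contain.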

Component(s)

Java
