@XuQianJin-Stars
Contributor

Purpose

Linked issue: closes #1974

This PR introduces support for nested ROW type in ARROW, COMPACTED, and INDEXED formats, enabling Fluss to handle complex nested data structures including nested rows and nested arrays.

Brief change log

Core Changes:

  • Added ArrowRowColumnVector to support reading ROW type from Arrow format
  • Added ArrowRowWriter to support writing ROW type to Arrow format
  • Introduced RowColumnVector interface for columnar row operations
  • Added RowSerializer for ROW type serialization/deserialization
  • Enhanced IndexedRowWriter and IndexedRowReader to support nested ROW type
  • Enhanced CompactedRowWriter and CompactedRowReader to support nested ROW type
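
As a rough illustration of what a columnar ROW vector involves, the sketch below models a struct column as a validity bitmap plus one child column per field, all indexed by the same row id. The names `Column`, `ValueColumn`, and `StructColumn` are hypothetical and are not the classes added in this PR; the sketch only shows the general layout idea, under the assumption that a nested ROW is simply a struct column used as a child of another struct column.

```java
import java.util.BitSet;

/** Hypothetical sketch of a columnar struct layout; not the Fluss or Arrow API. */
public class StructColumnSketch {

    /** Minimal column abstraction: one value (possibly null) per row id. */
    public interface Column {
        boolean isNullAt(int rowId);

        Object get(int rowId);
    }

    /** Leaf column holding boxed values; a null entry marks a null field. */
    public static final class ValueColumn implements Column {
        private final Object[] values;

        public ValueColumn(Object... values) {
            this.values = values;
        }

        public boolean isNullAt(int rowId) {
            return values[rowId] == null;
        }

        public Object get(int rowId) {
            return values[rowId];
        }
    }

    /** Struct column: its own validity bitmap plus child columns sharing the row id. */
    public static final class StructColumn implements Column {
        private final BitSet validity;
        private final Column[] children;

        public StructColumn(BitSet validity, Column... children) {
            this.validity = validity;
            this.children = children;
        }

        public boolean isNullAt(int rowId) {
            return !validity.get(rowId);
        }

        /** Materializes the row at rowId as an Object[]; nested structs recurse. */
        public Object get(int rowId) {
            if (isNullAt(rowId)) {
                return null;
            }
            Object[] row = new Object[children.length];
            for (int c = 0; c < children.length; c++) {
                row[c] = children[c].get(rowId);
            }
            return row;
        }
    }
}
```

Because `StructColumn` is itself a `Column`, nesting needs no special case: a null inner row is recorded in the inner validity bitmap, independently of the outer row's validity.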

Type System Enhancements:

  • Extended InternalRow and InternalArray interfaces with getRow() method
  • Updated DataGetters to include ROW type getter
  • Enhanced ArrowUtils to create Arrow column vectors for ROW type
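
To make the accessor change concrete, here is a minimal sketch of a row interface with a `getRow()` getter. The `Row` interface and `GenericRow` class below are hypothetical stand-ins, not the actual `InternalRow` definition, and the `getRow(int pos, int numFields)` signature is an assumption modeled on Flink's `RowData#getRow(int, int)`.

```java
/** Hypothetical sketch of a typed row accessor with a nested-row getter. */
public class RowGetterSketch {

    /** Trimmed-down stand-in for a row interface; signatures are assumptions. */
    public interface Row {
        int getFieldCount();

        boolean isNullAt(int pos);

        int getInt(int pos);

        long getLong(int pos);

        String getString(int pos);

        /** Returns the nested row at pos, which is expected to have numFields fields. */
        Row getRow(int pos, int numFields);
    }

    /** Simple object-array backed implementation used only for illustration. */
    public static final class GenericRow implements Row {
        private final Object[] fields;

        private GenericRow(Object[] fields) {
            this.fields = fields;
        }

        public static GenericRow of(Object... fields) {
            return new GenericRow(fields);
        }

        public int getFieldCount() {
            return fields.length;
        }

        public boolean isNullAt(int pos) {
            return fields[pos] == null;
        }

        public int getInt(int pos) {
            return (Integer) fields[pos];
        }

        public long getLong(int pos) {
            return (Long) fields[pos];
        }

        public String getString(int pos) {
            return (String) fields[pos];
        }

        public Row getRow(int pos, int numFields) {
            Row nested = (Row) fields[pos];
            if (nested.getFieldCount() != numFields) {
                throw new IllegalArgumentException(
                        "Expected " + numFields + " fields but found " + nested.getFieldCount());
            }
            return nested;
        }
    }
}
```

With this shape, a value of type ROW(INT, ROW(INT, STRING, BIGINT), STRING) could be built as `GenericRow.of(1, GenericRow.of(2, "nested", 3L), "outer")` and its inner BIGINT read with `row.getRow(1, 3).getLong(2)`.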

Connector Integration:

  • Added FlinkRowConverter and FlinkArrayConverter for Flink-Fluss type conversion
  • Added PaimonRowAsFlussRow and PaimonArrayAsFlussArray for Paimon-Fluss type conversion
  • Updated existing converters to support nested structures
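
The "X as Y" naming of the Paimon classes suggests an adapter that wraps one engine's row and exposes it through another's interface. The sketch below shows that pattern for nested rows under that assumption; `EngineRow`, `StorageRow`, and `EngineRowAsStorageRow` are hypothetical names, and none of this is the actual converter code from the PR.

```java
/** Hypothetical sketch of the row-adapter pattern for nested rows. */
public class RowAdapterSketch {

    /** Stand-in for a connector-side row with untyped positional access. */
    public interface EngineRow {
        Object getField(int pos);
    }

    /** Stand-in for the storage-side row interface with typed getters. */
    public interface StorageRow {
        boolean isNullAt(int pos);

        int getInt(int pos);

        String getString(int pos);

        StorageRow getRow(int pos);
    }

    /** Builds an EngineRow over the given fields (illustration helper). */
    public static EngineRow engineRow(Object... fields) {
        return pos -> fields[pos];
    }

    /** Adapter: exposes an EngineRow as a StorageRow without copying its fields. */
    public static final class EngineRowAsStorageRow implements StorageRow {
        private final EngineRow delegate;

        public EngineRowAsStorageRow(EngineRow delegate) {
            this.delegate = delegate;
        }

        public boolean isNullAt(int pos) {
            return delegate.getField(pos) == null;
        }

        public int getInt(int pos) {
            return (Integer) delegate.getField(pos);
        }

        public String getString(int pos) {
            return (String) delegate.getField(pos);
        }

        public StorageRow getRow(int pos) {
            // Nested rows are wrapped on access, so conversion cost is paid
            // only for the fields that are actually read.
            return new EngineRowAsStorageRow((EngineRow) delegate.getField(pos));
        }
    }
}
```

Wrapping on access keeps the adapter cheap for wide rows, at the cost of allocating a small wrapper each time a nested row is read.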

Test Coverage:

  • Enhanced ArrowReaderWriterTest with nested ROW and nested ARRAY test cases
  • Updated IndexedRowTest and IndexedRowReaderTest for ROW type validation

This change enables Fluss to store and process complex nested data structures, which is essential for advanced analytics and complex data modeling scenarios.

Tests

This PR includes the following unit tests to verify nested ROW and ARRAY type support in the Arrow and Indexed formats:

Unit Tests:

  • ArrowReaderWriterTest#testReaderWriter() - Validates that Arrow reader and writer can correctly handle nested ROW and nested ARRAY types

    • Tests nested ARRAY: ARRAY(ARRAY(STRING))
    • Tests nested ROW: ROW(INT, ROW(INT, STRING, BIGINT), STRING)
    • Verifies null value handling in nested structures
    • Validates correct serialization and deserialization of complex nested types
  • IndexedRowReaderTest - Verifies IndexedRow format support for ROW type read/write operations

  • IndexedRowTest - Validates IndexedRow handling of nested types in various scenarios

Test Coverage:

  • Arrow format nested ROW type read/write
  • Arrow format nested ARRAY type read/write
  • Compacted format ROW type support
  • Indexed format ROW type support
  • Type conversion integration with Flink and Paimon connectors

Test Data:

  • Multi-level nested structures with various primitive types
  • Mixed scenarios of null and non-null values
  • Comprehensive validation of all basic types within nested structures

All test cases pass with mvn clean verify.

API and Format

This change extends the storage formats without breaking existing data:

  • Extends ARROW format to support nested ROW type structures
  • Extends COMPACTED format to support ROW type serialization
  • Extends INDEXED format to support ROW type serialization
  • No breaking changes to existing API or storage format
  • Backward compatible with existing data
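
For readers unfamiliar with row-oriented serialization of nested types, the sketch below shows one simple way a nested ROW can be encoded and decoded recursively. It is a hypothetical, tag-based encoding written only for illustration; it is not the COMPACTED or INDEXED byte layout, and `NestedRowCodecSketch` is not a class from this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tag-based row codec; not the Fluss wire format. */
public class NestedRowCodecSketch {

    // One tag byte per field identifies the value kind that follows.
    private static final byte NULL = 0, INT = 1, LONG = 2, STRING = 3, ROW = 4;

    /** Serializes a row given as a list of Integer, Long, String, nested List, or null. */
    public static byte[] write(List<?> row) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeRow(new DataOutputStream(bytes), row);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deserializes bytes produced by write(). */
    public static List<Object> read(byte[] data) {
        try {
            return readRow(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeRow(DataOutputStream out, List<?> row) throws IOException {
        out.writeInt(row.size()); // field count first, so the reader knows the arity
        for (Object field : row) {
            if (field == null) {
                out.writeByte(NULL);
            } else if (field instanceof Integer) {
                out.writeByte(INT);
                out.writeInt((Integer) field);
            } else if (field instanceof Long) {
                out.writeByte(LONG);
                out.writeLong((Long) field);
            } else if (field instanceof String) {
                byte[] utf8 = ((String) field).getBytes(StandardCharsets.UTF_8);
                out.writeByte(STRING);
                out.writeInt(utf8.length);
                out.write(utf8);
            } else if (field instanceof List) {
                out.writeByte(ROW);
                writeRow(out, (List<?>) field); // nested row is written inline, recursively
            } else {
                throw new IllegalArgumentException("Unsupported field: " + field.getClass());
            }
        }
    }

    private static List<Object> readRow(DataInputStream in) throws IOException {
        int arity = in.readInt();
        List<Object> row = new ArrayList<>(arity);
        for (int i = 0; i < arity; i++) {
            byte tag = in.readByte();
            switch (tag) {
                case NULL:
                    row.add(null);
                    break;
                case INT:
                    row.add(in.readInt());
                    break;
                case LONG:
                    row.add(in.readLong());
                    break;
                case STRING:
                    byte[] utf8 = new byte[in.readInt()];
                    in.readFully(utf8);
                    row.add(new String(utf8, StandardCharsets.UTF_8));
                    break;
                case ROW:
                    row.add(readRow(in));
                    break;
                default:
                    throw new IOException("Unknown tag: " + tag);
            }
        }
        return row;
    }
}
```

A real format would typically rely on the table schema instead of per-field tags; the tags here only keep the sketch self-describing.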

Documentation

This change introduces a new feature (nested ROW type support). No documentation is included in this PR, per user request.

@XuQianJin-Stars XuQianJin-Stars force-pushed the feature/issue-1974-support-nestedrow-arrow-format branch 2 times, most recently from 836c3f7 to 0634a97 Compare December 3, 2025 03:37
@XuQianJin-Stars XuQianJin-Stars force-pushed the feature/issue-1974-support-nestedrow-arrow-format branch from 0634a97 to b2d938b Compare December 8, 2025 12:33
@binary-signal

binary-signal commented Dec 16, 2025

@XuQianJin-Stars this PR is a godsend. With a few small tweaks, I was able to get nested rows inside array fields working in a Fluss PK table, with tiering to Paimon and union read enabled, all running in the Flink SQL client. Is there a way we could merge my changes into your pull request and combine our efforts?

@XuQianJin-Stars
Contributor Author

XuQianJin-Stars commented Dec 17, 2025

@XuQianJin-Stars this PR is a godsend. With a few small tweaks, I was able to get nested rows inside array fields working in a Fluss PK table, with tiering to Paimon and union read enabled, all running in the Flink SQL client. Is there a way we could merge my changes into your pull request and combine our efforts?

Hi @binary-signal, sure. Alternatively, you can wait for this PR to be approved and then submit a follow-up PR to improve it.

Development

Successfully merging this pull request may close these issues.

Support NestedRow type in log table (Arrow row format)

2 participants