[common] Introduce ROW type for ARROW, COMPACTED and INDEXED formats #2079
Purpose
Linked issue: close #1974
This PR introduces support for nested ROW type in ARROW, COMPACTED, and INDEXED formats, enabling Fluss to handle complex nested data structures including nested rows and nested arrays.
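Nested ROW support means a field of a row can itself be a row. It is read positionally, like any other field. The sketch below is a minimal, stdlib-only illustration of positional access into a value of type ROW(INT, ROW(INT, STRING, BIGINT), STRING). It assumes a hypothetical GenericNestedRow class whose getRow-style accessor only loosely mirrors the getRow() method added in this PR. It does not reproduce Fluss's actual InternalRow signatures.

```java
/**
 * Sketch: positional access into a value of type
 * ROW(INT, ROW(INT, STRING, BIGINT), STRING).
 * GenericNestedRow is a hypothetical stand-in, not Fluss's InternalRow.
 */
public class NestedRowAccess {

    /** Hypothetical row that stores its fields by position. */
    static final class GenericNestedRow {
        private final Object[] fields;

        GenericNestedRow(Object... fields) {
            this.fields = fields;
        }

        int getInt(int pos) {
            return (Integer) fields[pos];
        }

        long getLong(int pos) {
            return (Long) fields[pos];
        }

        String getString(int pos) {
            return (String) fields[pos];
        }

        /** getRow-style accessor: a nested ROW field is itself a row. */
        GenericNestedRow getRow(int pos) {
            return (GenericNestedRow) fields[pos];
        }
    }

    public static void main(String[] args) {
        GenericNestedRow row =
                new GenericNestedRow(1, new GenericNestedRow(2, "inner", 3L), "outer");

        // Navigate into the nested row at position 1, then read its fields by position.
        GenericNestedRow inner = row.getRow(1);
        System.out.println(
                row.getInt(0) + " " + inner.getString(1) + " " + inner.getLong(2) + " " + row.getString(2));
        // prints "1 inner 3 outer"
    }
}
```

The nested value needs no special encoding at the access level. The outer row hands back another row, and the same positional getters apply recursively at any depth.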
Brief change log
Core Changes:
- ArrowRowColumnVector to support reading ROW type from Arrow format
- ArrowRowWriter to support writing ROW type to Arrow format
- RowColumnVector interface for columnar row operations
- RowSerializer for ROW type serialization/deserialization
- IndexedRowWriter and IndexedRowReader to support nested ROW type
- CompactedRowWriter and CompactedRowReader to support nested ROW type
Type System Enhancements:
- InternalRow and InternalArray interfaces with a getRow() method
- DataGetters to include a ROW type getter
- ArrowUtils to create Arrow column vectors for ROW type
Connector Integration:
- FlinkRowConverter and FlinkArrayConverter for Flink-Fluss type conversion
- PaimonRowAsFlussRow and PaimonArrayAsFlussArray for Paimon-Fluss type conversion
Test Coverage:
- ArrowReaderWriterTest with nested ROW and nested ARRAY test cases
- IndexedRowTest and IndexedRowReaderTest for ROW type validation
This change enables Fluss to store and process complex nested data structures, which is essential for advanced analytics and complex data modeling scenarios.
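In Arrow, a nested record is stored as a struct array. Each field is a child array with the same length as the parent, and a validity bitmap marks null rows. The sketch below models that columnar shape with plain Java collections to show what a column of ROW(INT, ROW(INT, STRING, BIGINT), STRING) looks like. ValueColumn, StructColumn and the Column interface are hypothetical names. This is not Fluss's RowColumnVector interface, ArrowRowColumnVector, or Arrow's Java API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of a columnar ROW column in the shape of an Arrow struct array:
 * one child column per field (all with the parent's length) plus a null flag
 * per row. All names here are hypothetical.
 */
public class StructColumnSketch {

    interface Column {
        void append(Object value);

        Object get(int rowId);
    }

    /** Leaf column holding scalar values (INT, STRING, BIGINT, ...). */
    static final class ValueColumn implements Column {
        private final List<Object> values = new ArrayList<>();

        @Override
        public void append(Object value) {
            values.add(value);
        }

        @Override
        public Object get(int rowId) {
            return values.get(rowId);
        }
    }

    /** ROW column: fields are stored column-wise; a child may itself be a StructColumn. */
    static final class StructColumn implements Column {
        private final Column[] children;
        private final List<Boolean> nulls = new ArrayList<>();

        StructColumn(Column... children) {
            this.children = children;
        }

        /** Appends one row given as Object[]; a null row still fills a slot in every child. */
        @Override
        public void append(Object value) {
            Object[] fields = (Object[]) value;
            nulls.add(fields == null);
            for (int i = 0; i < children.length; i++) {
                children[i].append(fields == null ? null : fields[i]);
            }
        }

        /** Reassembles row rowId by reading the same index from every child column. */
        @Override
        public Object get(int rowId) {
            if (nulls.get(rowId)) {
                return null;
            }
            Object[] fields = new Object[children.length];
            for (int i = 0; i < children.length; i++) {
                fields[i] = children[i].get(rowId);
            }
            return fields;
        }
    }

    public static void main(String[] args) {
        // Column of type ROW(INT, ROW(INT, STRING, BIGINT), STRING)
        StructColumn column =
                new StructColumn(
                        new ValueColumn(),
                        new StructColumn(new ValueColumn(), new ValueColumn(), new ValueColumn()),
                        new ValueColumn());
        column.append(new Object[] {1, new Object[] {2, "a", 3L}, "x"});
        column.append(null);
        column.append(new Object[] {4, null, "y"});

        for (int rowId = 0; rowId < 3; rowId++) {
            Object row = column.get(rowId);
            System.out.println(row == null ? "null" : Arrays.deepToString((Object[]) row));
        }
        // prints, one per line: [1, [2, a, 3], x]  then  null  then  [4, null, y]
    }
}
```

A null row is recorded as a flag but still occupies a slot in every child column. This keeps all children aligned by row index, so reading row i of a nested field is just index i of each child.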
Tests
This PR includes the following unit tests to verify the nested ROW and ARRAY type support in Arrow format:
Unit Tests:
- ArrowReaderWriterTest#testReaderWriter() - Validates that the Arrow reader and writer correctly handle nested ROW and nested ARRAY types:
  - ARRAY(ARRAY(STRING))
  - ROW(INT, ROW(INT, STRING, BIGINT), STRING)
- IndexedRowReaderTest - Verifies IndexedRow format support for ROW type read/write operations
- IndexedRowTest - Validates IndexedRow handling of nested types in various scenarios
Test Coverage:
Test Data:
All test cases pass with mvn clean verify.
API and Format
This change affects the storage format:
Documentation
This change introduces a new feature (nested ROW type support). Documentation is not required, per user request.