Skip to content

[Fix][Connector-V2] Fix Paimon nested row write#10888

Open
QuakeWang wants to merge 2 commits into
apache:devfrom
QuakeWang:fix/paimon-nested-row
Open

[Fix][Connector-V2] Fix Paimon nested row write#10888
QuakeWang wants to merge 2 commits into
apache:devfrom
QuakeWang:fix/paimon-nested-row

Conversation

@QuakeWang
Copy link
Copy Markdown

@QuakeWang QuakeWang commented May 15, 2026

Purpose of this pull request

close: #10886.

Fix Paimon sink writing for non-null nested ROW fields. Previously, nested rows were validated against the top-level Paimon table schema, causing field count or schema mismatch errors.

This PR makes RowConverter use the current nested Paimon RowType during recursive conversion, and preserves nested ROW field names during type conversion.

Does this PR introduce any user-facing change?

Yes. Paimon sink can now write nested ROW fields correctly.

No new config option or incompatible behavior is introduced.

How was this patch tested?

Added unit tests for nested ROW success, null nested ROW, field count mismatch, field name/type mismatch, and decimal/timestamp precision.

Check list

Copy link
Copy Markdown
Contributor

@DanielLeens DanielLeens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. I went through the latest head and traced the nested-row conversion path into the Paimon sink. I do not see a source-level blocker in the current implementation; the remaining gate is still the red CI.

What this PR solves

  • User pain point
    Nested ROW values can fail to write into the Paimon sink when recursive conversion loses the sink-side nested schema or nested field names.
  • Fix approach
    This PR makes nested-row recursion use the sink-side nested schema explicitly and preserves nested field names when converting SeaTunnel row types back to Paimon row types.
  • One-line summary
    This is a focused fix for the nested-row write path into Paimon, and I do not see a source-level blocker on the latest head.

Runtime path I checked

PaimonSinkWriter.write(...) [PaimonSinkWriter.java:230-233]
  -> RowConverter.reconvert(element, seaTunnelRowType, sinkPaimonTableSchema)
      -> iterate fields [RowConverter.java:456-493]
      -> ROW branch
          -> fetch target RowType from sink schema
          -> recursively reconvert nested SeaTunnelRow using paimonRowType.getFields()
      -> build InternalRow

Key findings

  1. The normal Paimon sink write path definitely hits this logic.
  2. The fix is targeted at the real nested-row boundary, not just a superficial wrapper.
  3. The new tests cover success, field-count mismatch, incompatible nested schema, field-name mismatch, null nested row, and nested precision handling (RowConverterTest.java:372-515, RowTypeConverterTest.java:235-248).
  4. I do not see a new source-level blocker on the latest head.

Test / CI notes

  • The new tests are in-memory schema/row conversion tests and look structurally stable.
  • The latest contributor Build is still red, including a failed unit-test (11, ubuntu-latest) lane and follow-on cancelled/skipped matrix jobs, so CI still needs to be resolved before merge.

Conclusion: can merge after fixes

  1. Blocking items
  • Please get the latest Build green before merge.
  1. Suggested but non-blocking improvements
  • No new source-side changes are required from my side on this revision.

Overall, the nested-row sink-schema propagation looks correct on the latest head, and I do not have a source-level blocker for this revision. The remaining gate is CI.

@davidzollo davidzollo added the First-time contributor First-time contributor label May 16, 2026
@nzw921rx
Copy link
Copy Markdown
Collaborator

@QuakeWang good job, Please rebase the latest dev branch as it has resolved the issue of unit test ci failure

@QuakeWang QuakeWang force-pushed the fix/paimon-nested-row branch from ec3f4a4 to e3bf8ff Compare May 18, 2026 02:59
@QuakeWang
Copy link
Copy Markdown
Author

@QuakeWang good job, Please rebase the latest dev branch as it has resolved the issue of unit test ci failure

@nzw921rx I rebased this PR onto the latest dev and reran CI.

The current failure is in Run / unit-test (11, windows-latest), under seatunnel-dist's ConnectorSpecificationCheckTest. It fails because TypesenseSinkFactory is missing SinkConnectorCommonOptions.MULTI_TABLE_SINK_REPLICA in its optionRule() optional options.

This looks unrelated to the Paimon changes in this PR. It seems to be exposed after the latest dev added connector-typesense into seatunnel-dist.

@nzw921rx
Copy link
Copy Markdown
Collaborator

@QuakeWang good job, Please rebase the latest dev branch as it has resolved the issue of unit test ci failure

@nzw921rx I rebased this PR onto the latest dev and reran CI.

The current failure is in Run / unit-test (11, windows-latest), under seatunnel-dist's ConnectorSpecificationCheckTest. It fails because TypesenseSinkFactory is missing SinkConnectorCommonOptions.MULTI_TABLE_SINK_REPLICA in its optionRule() optional options.

This looks unrelated to the Paimon changes in this PR. It seems to be exposed after the latest dev added connector-typesense into seatunnel-dist.

Thank you, I have fixed the PR I submitted, and once it is merged into dev, it can be rebased

Copy link
Copy Markdown
Contributor

@DanielLeens DanielLeens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the rebase update and for checking the current CI failure path. I re-reviewed the latest head from scratch against upstream/dev and retraced the nested-row conversion path into the Paimon sink again.

What This PR Solves

  • User pain
    Nested ROW values can fail on the normal Paimon sink write path when recursive conversion validates nested data against the top-level schema instead of the sink-side nested row schema.
  • Fix approach
    The current head makes recursive conversion use the current nested Paimon RowType fields directly and preserves nested field names when converting the SeaTunnel row type back to Paimon row type.
  • One-line summary
    I still do not see a source-level blocker in the latest Paimon conversion logic; the remaining red CI is outside this PR's touched path.

Runtime path I checked

Paimon sink write
  -> PaimonSinkWriter.write(...)
      -> RowConverter.reconvert(element, seaTunnelRowType, sinkTableSchema)
          -> iterate fields
          -> nested ROW branch
              -> fetch nested target RowType from sink schema
              -> recursively reconvert nested SeaTunnelRow using paimonRowType.getFields()
          -> build InternalRow for the sink row

Findings

  1. The normal Paimon sink write path definitely hits this logic, so the fix is on the real runtime boundary.
  2. The current implementation correctly switches recursive nested-row validation away from the top-level schema.
  3. The added tests cover the main correctness cases here: nested success, null nested row, field-count mismatch, field-name/type mismatch, and nested decimal/timestamp precision handling.
  4. I do not see a new source-level blocker on the latest head.

Compatibility / Side Effects

  • No new config key or protocol contract is introduced.
  • The change is a focused correctness fix for the existing nested-row write path.
  • I do not see a new concurrency or lifecycle risk in this patch.

Tests / CI

  • The new tests are deterministic in-memory conversion tests and do not show an obvious flaky pattern.
  • The latest GitHub Build is still red, but the visible failure is ConnectorSpecificationCheckTest.testAllConnectorImplementFactoryWithUpToDateMethod, specifically because TypesenseSinkFactory is missing SinkCommonOptions.MULTI_TABLE_SINK_REPLICA in its optionRule(). That failure is outside the touched Paimon files and matches the separate Typesense fix being discussed in #10898.

Conclusion: can merge after fixes

  1. Blocking items
  • Please get the latest Build green before merge. Based on the current logs, that means syncing the Typesense optionRule() fix into the branch once the equivalent change is available on dev.
  1. Suggested but non-blocking follow-up
  • No new source-side changes are required from my side on this revision.

From the Paimon source side, the nested-row fix looks good on the latest head. The remaining blocker is the unrelated CI failure, not the conversion logic in this PR.

QuakeWang added 2 commits May 18, 2026 20:44
Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>
Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>
@QuakeWang QuakeWang force-pushed the fix/paimon-nested-row branch from e3bf8ff to 610e257 Compare May 18, 2026 12:45
Copy link
Copy Markdown
Contributor

@DanielLeens DanielLeens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this. I re-reviewed the latest head and the current check state before writing this, and this is a source-level rereview of the current revision.

What this PR fixes

  • User pain: nested ROW values in the Paimon sink could be converted with the wrong nested schema context, which breaks the real nested-row write path.
  • Fix approach: the nested ROW branch now looks up the sink-side nested RowType directly, and RowTypeConverter now uses the inner field names instead of reusing the outer field name all the way down.
  • In plain terms: this makes nested row conversion follow the real sink schema instead of the outer row context.

Runtime chain

Paimon sink writer
  -> RowConverter.reconvert(row, sourceType, sinkSchema)
     -> field sqlType == ROW
        -> lookup nested RowType from sink schema
        -> recursively convert nested SeaTunnelRow with nested sink fields
        -> write nested InternalRow with InternalRowSerializer(nestedRowType)

Findings

  • On the current head, the normal write path really does hit this code when a sink row contains a nested ROW field.
  • The fix is precise: it corrects the nested schema source instead of adding a defensive special case.
  • The new unit coverage is good. I checked the success path, field-count mismatch, field-type mismatch, field-name mismatch, null nested row, and nested timestamp/decimal precision cases.
  • From the current source-level pass, I did not find a blocker in the nested-row fix itself.

Merge conclusion: can merge

  1. Blocking items
  • No blocking source-level issue found in the current nested-row fix.
  1. Suggested follow-up
  • The current Build is still red, but the failing job is in seatunnel-engine-client (SeaTunnelClientTest.testGetMultiTableJobMetrics) rather than the touched Paimon path. I refreshed upstream/dev before this pass and I do not see a newer engine-client-side fix to pick up there, so this does not look like a simple “rebase and it goes away” case.

Overall, the nested-row conversion fix itself looks good to me on the latest head. Once the branch-level CI situation is clarified, I do not see a source blocker from this diff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] [Connector-V2][Paimon] Nested ROW field write fails in Paimon sink

4 participants