Skip to content

[Improve][connector-jdbc] Remove useless parameters: support_upsert_by_insert_only#10897

Open
chl-wxp wants to merge 1 commit into
apache:devfrom
chl-wxp:delete-SUPPORT_UPSERT_BY_INSERT_ONLY
Open

[Improve][connector-jdbc] Remove useless parameters: support_upsert_by_insert_only#10897
chl-wxp wants to merge 1 commit into
apache:devfrom
chl-wxp:delete-SUPPORT_UPSERT_BY_INSERT_ONLY

Conversation

@chl-wxp
Copy link
Copy Markdown
Member

@chl-wxp chl-wxp commented May 18, 2026

Purpose of this pull request

Remove the useless parameter support_upsert_by_insert_only from JDBC sink connector.

Reasons for removal:

  1. No actual functionality: This parameter does not implement the "insert-only" semantics (such as INSERT IGNORE or ON CONFLICT DO NOTHING). It simply changes the upsert executor to a plain INSERT executor, which causes
    primary key conflicts when the record already exists.
  2. Redundant complexity:
    - When primary_keys is not configured: The system uses simple INSERT directly; this parameter is not needed.
    - When primary_keys is configured: The upsert mode handles CDC operations (INSERT/UPDATE/DELETE). Setting support_upsert_by_insert_only=true causes UPDATE_AFTER operations to execute INSERT, leading to primary key violation errors.
  3. No documentation: The parameter is not documented in JdbcSink docs (only referenced in RedshiftSink as an inherited option without explanation).

The correct way to achieve "insert-only without update" behavior is to disable upsert mode, not to use this parameter.

@chl-wxp chl-wxp marked this pull request as ready for review May 18, 2026 03:52
@nzw921rx
Copy link
Copy Markdown
Collaborator

@chl-wxp Hello, may I ask why this parameter is being cleared? It seems to be an entry that only wants to insert and not update?

@chl-wxp
Copy link
Copy Markdown
Member Author

chl-wxp commented May 18, 2026

@chl-wxp Hello, may I ask why this parameter is being cleared? It seems to be an entry that only wants to insert and not update?

Thank you for your review. When I reviewed the code, I found that the final behavior controlled by the parameter support_upsert_by_insert_only is to insert data using insert syntax. When we want to achieve this goal, we only need to set the parameter enable_upsert to false. Another very important point is that there is no introduction to the support_upsert_by_insert_only parameter in earlier versions.

@nzw921rx
Copy link
Copy Markdown
Collaborator

@chl-wxp Hello, may I ask why this parameter is being cleared? It seems to be an entry that only wants to insert and not update?

Thank you for your review. When I reviewed the code, I found that the final behavior controlled by the parameter support_upsert_by_insert_only is to insert data using insert syntax. When we want to achieve this goal, we only need to set the parameter enable_upsert to false. Another very important point is that there is no introduction to the support_upsert_by_insert_only parameter in earlier versions.

@chl-wxp Thank you for your clarification. I still have the following concerns

  • When enable_upsert = false, the semantics and behavior of createInsertOrUpdateExecutor differ from those of createInsertOnlyExecutor.

  • There has been a continued compatibility impact for users of this parameter. Since the support_upsert-by_insert_only parameter was not available in early versions, we cannot rule out the possibility that some users depended on it. This may lead to unexpected behavior for these users after upgrading.

private static JdbcBatchStatementExecutor<SeaTunnelRow> createUpsertExecutor(
            JdbcDialect dialect,
            String database,
            String table,
            TableSchema tableSchema,
            TableSchema databaseTableSchema,
            String[] pkNames,
            TableSchema pkTableSchema,
            Function<SeaTunnelRow, SeaTunnelRow> keyExtractor,
            boolean enableUpsert,
            boolean isPrimaryKeyUpdated,
            boolean supportUpsertByInsertOnly) {
        if (supportUpsertByInsertOnly) {
            return createInsertOnlyExecutor(
                    dialect, database, table, tableSchema, databaseTableSchema);
        }
        if (enableUpsert) {
            Optional<String> upsertSQL =
                    dialect.getUpsertStatementByTableSchema(database, table, tableSchema, pkNames);
            if (upsertSQL.isPresent()) {
                return createSimpleExecutor(
                        upsertSQL.get(),
                        tableSchema,
                        databaseTableSchema,
                        dialect.getRowConverter());
            }
            return createInsertOrUpdateByQueryExecutor(
                    dialect,
                    database,
                    table,
                    tableSchema,
                    databaseTableSchema,
                    pkNames,
                    pkTableSchema,
                    keyExtractor,
                    isPrimaryKeyUpdated);
        }
        return createInsertOrUpdateExecutor(
                dialect,
                database,
                table,
                tableSchema,
                databaseTableSchema,
                pkNames,
                isPrimaryKeyUpdated);
    }

Copy link
Copy Markdown
Contributor

@DanielLeens DanielLeens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this cleanup. I reviewed the full current diff against upstream/dev and traced the JDBC sink execution path instead of only reading the option deletion.

What This PR Changes

  • User pain
    The current support_upsert_by_insert_only option is misleading: it does not implement a safe "insert-only upsert" contract. It switches the executor to plain INSERT, so duplicate primary keys still fail at runtime.
  • Change approach
    This PR removes the option from JdbcSinkOptions, JdbcSinkConfig, JdbcSinkFactory, and JdbcOutputFormatBuilder, and also removes the Redshift docs reference.
  • One-line summary
    The cleanup direction is understandable, but removing an existing sink config key is still a compatibility change and the current PR does not preserve that contract safely.

Runtime path I checked

JDBC sink config
  -> JdbcSinkFactory.optionRule()
      -> currently exposes support_upsert_by_insert_only
  -> JdbcSinkConfig.of(...)
      -> reads support_upsert_by_insert_only into JdbcSinkConfig

Write path
  -> JdbcOutputFormatBuilder.createBatchStatementExecutor(...)
      -> if supportUpsertByInsertOnly == true
          -> createInsertOnlyExecutor(...)
              -> always use plain INSERT
      -> otherwise
          -> normal upsert / insert-update / delete executors

Findings

Issue 1: this PR removes a user-facing JDBC sink option without a compatibility bridge or migration contract

  • Location: seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/config/JdbcSinkOptions.java:109-114, .../JdbcSinkConfig.java:41-55, .../JdbcSinkFactory.java:224-231, .../JdbcOutputFormatBuilder.java:175-248
  • Why this is a blocker:
    Even if the option is semantically weak, it is still part of the current public config surface. Existing jobs that set support_upsert_by_insert_only will stop being valid once this PR lands, because the key is removed from option exposure and from the runtime path entirely.
  • Risk:
    This is a backward-compatibility break for existing JDBC sink configs. It is especially risky because the PR does not add docs/en/introduction/concepts/incompatible-changes.md migration guidance, and it does not keep a compatibility parse path such as "accept but warn / ignore".
  • Best fix:
    • Option A, preferred: keep the option parsable for one transition period, mark it deprecated, and either warn loudly or treat it as a no-op with explicit migration guidance.
    • Option B: if the community wants immediate removal, then this must be handled as an explicit incompatible change with migration documentation telling users what config to use instead.
  • Severity: High

Compatibility / Side Effects

  • This PR is not fully compatible in its current form, because it removes an existing user-facing config key.
  • No API class signature is broken, but the sink configuration contract changes.
  • I do not see a new CPU or concurrency risk from the code deletion itself; the blocker is the config-compatibility boundary.

Tests / CI

  • I did not do local build execution in this review batch; this is a source-level review only.
  • The latest GitHub Build is also red, but the visible failure is CoordinatorServiceTest.testGetPendingJobInfo:956 expected: <true> but was: <false> in seatunnel-engine-server, which is outside the touched JDBC files. From current evidence that CI failure does not look caused by this JDBC cleanup.

Conclusion: can merge after fixes

  1. Blocking items
  • Issue 1: preserve the existing config contract with a deprecation bridge, or document the removal as an explicit incompatible change with migration guidance.
  1. Suggested but non-blocking follow-up
  • After the compatibility handling is settled, sync with the latest dev again if the unrelated engine CI failure is still present.

The main concern here is not that the current option is perfect. The concern is that SeaTunnel should not silently remove an existing sink config surface without a transition contract.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants