Skip to content

Conversation

@beryllw
Copy link
Contributor

@beryllw beryllw commented Nov 13, 2024

Support Elasticsearch sink to write into different indexes based on a specific field.

@github-actions github-actions bot added docs Improvements or additions to documentation elasticsearch-pipeline-connector labels Nov 13, 2024
@lvyanquan
Copy link
Contributor

@proletarians can you help to review this?

@proletarians
Copy link
Contributor

OK,I will review it

@beryllw
Copy link
Contributor Author

beryllw commented Nov 15, 2024

If this PR passes, I will rebase after #3728.

@leonardBang
Copy link
Contributor

If this PR passes, I will rebase after #3728.

@beryllw Would you like to update this PR?

@beryllw
Copy link
Contributor Author

beryllw commented Jan 14, 2025

@beryllw Would you like to update this PR?

Sure.Thanks for help.

@beryllw
Copy link
Contributor Author

beryllw commented Jan 17, 2025

@lvyanquan @leonardBang PTAL

<td>optional</td>
<td style="word-wrap: break-word;">(none)</td>
<td>Long</td>
<td>Sharding suffix key for each table, allow setting sharding suffix key for multiTables.Default sink table name is test_table${suffix_key}.Tables are separated by ';'.For example, we can set sharding.suffix.key by 'table1:col1;table2:col2'.</td>
Copy link
Contributor

@lvyanquan lvyanquan Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @beryllw, could you explain more about target scenario of this issue? I don't understand the meaning of this format for table name.

Copy link
Contributor Author

@beryllw beryllw Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For big MySQL tables, writing to an ES index will cause a single ES index to be too large, and ES rolling indexes do not support updates.Implement dt partitioning through ES index suffix name

@beryllw beryllw requested a review from lvyanquan January 20, 2025 08:57
<td style="word-wrap: break-word;">(none)</td>
<td>Long</td>
<td>Sharding suffix key for each table, allow setting sharding suffix key for multiTables.Default sink table name is test_table${suffix_key}.Tables are separated by ';'.For example, we can set sharding.suffix.key by 'table1:col1;table2:col2'.</td>
<td>Sharding suffix key for each table, allow setting sharding suffix key for multiTables.Default sink table name is test_table${suffix_key}.Default sharding column is first partition column.Tables are separated by ';'.Table and column are separated by ':'.For example, we can set sharding.suffix.key by 'table1:col1;table2:col2'.</td>
Copy link
Contributor

@lvyanquan lvyanquan Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about making the separator configurable with a default value of $

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be '_' is better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense.

@beryllw beryllw requested a review from lvyanquan January 23, 2025 02:56
Copy link
Contributor

@lvyanquan lvyanquan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@beryllw
Copy link
Contributor Author

beryllw commented Jan 23, 2025

LGTM.

Thanks for your review.

@beryllw
Copy link
Contributor Author

beryllw commented Feb 7, 2025

@leonardBang PTAL

Copy link
Contributor

@leonardBang leonardBang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @beryllw for the contribution and @lvyanquan for the review, LGTM

@leonardBang
Copy link
Contributor

CI passed, merging...

@leonardBang leonardBang merged commit c284318 into apache:master Feb 7, 2025
29 checks passed
@beryllw beryllw deleted the es_index_sharding branch February 7, 2025 03:57
@beryllw
Copy link
Contributor Author

beryllw commented Feb 7, 2025

CI passed, merging...

Thanks for help.

lvyanquan pushed a commit to lvyanquan/flink-cdc that referenced this pull request Mar 7, 2025
…e-11

Link: https://code.alibaba-inc.com/ververica/flink-cdc/codereview/20462875
* [FLINK-35173][cdc][mysql] Debezium custom time serializer for MySQL connector (apache#3240)

* [minor][cdc][docs] Add user guide about providing extra jar package in quickstart docs

* [FLINK-35235][pipeline-connector][kafka] Fix missing dependencies in the uber jar of Kafka pipeline sink

* [FLINK-35251][cdc][runtime] Fix bug of serializing derivation mapping in SchemaDerivation

This closes  apache#3267.

* [FLINK-35258][cdc][doc] Fix broken links to Doris in documentation (apache#3276)

* [FLINK-35256][cdc][runtime] Fix transform node does not respect type nullability (apache#3272)

* [minor][docs] Fix route definition example in core concept docs (apache#3269)

* [FLINK-35259][cdc][transform] Fix FlinkCDC pipeline transform can't deal timestamp field (apache#3278)

* [FLINK-34878][cdc][transform] Flink CDC pipeline transform supports CASE WHEN (apache#3228)

* [FLINK-35255][cdc][runtime] DataSinkWriterOperator adds overrides for the snapshotState and processWatermark methods (apache#3271)

* [FLINK-35264][cdc][runtime] Fix multiple transform rules do not take effect (apache#3280)

* [FLINK-35244][cdc-connector][tidb] Correct the package for flink-connector-tidb-cdc test

* [FLINK-35274][cdc-connector][db2] Fix occasional failure issue with Flink CDC Db2 UT

* [hotfix][docs] Correct example configuration for Paimon warehouse path

* [FLINK-35245][cdc-connector][tidb] Add metrics for flink-connector-tidb-cdc

* [minor][docs] Rectify names of CDC sources for Flink and improve the content directory

This closes apache#3310.

* [hotfix][docs] Fix dead links in documentations

This closes  apache#3314.

* [FLINK-35314][cdc] Add Flink CDC pipeline transform user document (apache#3308)

* [FLINK-35386][cdc][docs] Build release-3.1 documentation and mark it as stable (apache#3330)

* [FLINK-35386][cdc][docs] Shorten GHA names and disable fail-fast to isolate branches (apache#3331)

* [minor][docs] Improve the answer of MySQL CDC FAQ docs

This closes  apache#3337.

* [FLINK-35431][doc] Migrate references in Flink CDC documentation from Debezium 1.9 to 2.0 to avoid dead links

This closes apache#3351.

* [FLINK-35298][cdc][metrics] Improve the fetch delay metrics

This closes apache#3298.

* [FLINK-35294][mysql] Use source config to check if the filter should be applied in timestamp starting mode

This closes apache#3291.

* [FLINK-35295][mysql] Improve jdbc connection pool initialization failure message

This closes apache#3293.

* [FLINK-35301][cdc] Avoid deadlock when loading driver classes

This closes  apache#3300.

* [FLINK-35408][mysql] Introduce 30 minutes tolerance when validate the time-zone setting

This closes  apache#3341.

* [FLINK-35447][doc-ci] Flink CDC Document document file had removed but website can access

This closes  apache#3362.

* [minor][cdc-cli] Suppress false alarm in flink config loader

This closes  apache#3292.

* [FLINK-35409][cdc][mysql] Request more splits if all splits are filtered from addSplits method 

This closes apache#3342.

* [FLINK-35323][cdc-runtime] Fix transform failure when one rule matches multiple tables with incompatible schema

This closes  apache#3313.

* [FLINK-35430][cdc-connector][kafka] Pass the time zone infor to JsonSerializationSchema

This closes  apache#3359.

* [FLINK-35149][cdc-composer] Fix DataSinkTranslator#sinkTo ignoring pre-write topology if not TwoPhaseCommittingSink (apache#3233)

* [FLINK-35129][postgres] Introduce scan.lsn-commit.checkpoints-num-delay option to control LSN offset commit lazily

 This close apache#3349.

* [FLINK-35325][cdc-connector][paimon] Support specify column order when adding new columns to a table

This closes apache#3323.

* [build][hotfix] Fix jackson conflicts among cdc connectors

This closes  apache#2987.

* [minor][docs] Improve the readme and issue template

This closes  apache#3383.

* [FLINK-35527][docs] Polish quickstart guide & migrate maven links from ververica to apache

* [minor][cdc-connector][paimon] Remove duplicate interface implements

* [FLINK-35464][cdc] Fixes operator state backwards compatibility from CDC 3.0.x and add migration tests (apache#3369)

* [FLINK-35415][base] Fix compatibility with Flink 1.19

* [FLINK-35415][base] Bump Flink patch version to 1.18.1

* [FLINK-35316][base] Run E2e test cases with Flink 1.19 and latest patch versions

* [FLINK-35120][doris] Add Doris integration test cases

* [FLINK-35092][cdc][starrocks] Add starrocks integration test cases

* [FLINK-34908][pipeline-connector][doris] Fix MySQL to doris pipeline will lose precision for timestamp type

This closes  apache#3207.

* [hotfix][cdc-composor] Adjust test of cdc composer from junit4 to junit5

* [hotfix][ci] PIPELINE_CONNECTORS stage should include pipeline connector paimon

 This closes apache#3344.

* [hotfix][build] Remove test-jar from parent pom and only add to necessary modules

This closes  apache#3397.

Co-authored-by: Qingsheng Ren <renqschn@gmail.com>

* [FLINK-35545][doc] Miss 3.1.0 version in snapshot flink-cdc doc version list

* [FLINK-35545][doc] Revert dbz doc 2.0 back to 1.9

* [FLINK-35540][cdc-common] Fix table missed when database and table are with the same name

This closes  apache#3396.

* [FLINK-35121][common] Adds validation for pipeline definition options

* [FLINK-35297][mysql] Add validation for option connect.timeout because of HikariConfig limitation (apache#3295)

* [docs][minor] Optimize styles of the Flink CDC index page

This closes  apache#3420

* [hotfix][ci] Add new pipeline connectors into labeler.yml

* [FLINK-35648][runtime] Allow applying multiple route rules for one single source table (apache#3425)

* [FLINK-35700][cli] Loosen CDC pipeline options validation

This closes  apache#3435.

* [FLINK-35281][cdc-common] FlinkEnvironmentUtils#addJar add each jar only once (apache#3301)

* [FLINK-35354] Support host mapping in Flink tikv cdc (apache#3336)

* [FLINK-35337][cdc] Keep up with the latest version of tikv client

* [FLINK-35354]Support host mapping in Flink tikv cdc

* [FLINK-35354] Add doc for host mapping feature

* [FLINK-35354] fixed annotation import

* [FLINK-35647][route] Support symbol replacement to enrich routing rules

This closes apache#3428.

Co-authored-by: 张田 <zhang_tian@inspur.com>
Co-authored-by: yangshuaitong <duguhoney@gmail.com>

* [FLINK-35781][cli] Provide a default parallelism 1 for pipeline jobs 

This closes apache#3458.

* [minor][cdc][docs] Optimize markdown formats in doris quickstart doc

This closes apache#3324.

* [FLINK-34990][cdc-connector][oracle] Oracle cdc support newly add table (apache#3203)

* [cdc-connector][oracle] Oracle cdc support newly add table

* [cdc-connector][oracle] Fix code style

* [cdc-connector][oracle] Address comment

* [hotfix][docs][postgres] Remove unsupported erroneous example code

This closes apache#3464

* [build][CI] Introduce automated PR governance workflows

This closes  apache#3466

* [FLINK-34883] Fix postgres uuid column as PK (apache#3282)

* [FLINK-34883] Fix postgres uuid column as PK

* [FLINK-34883] Fix column comment.

* [minor][cdc-connector][postgres] PostgresDialect removes useless code

This closes apache#3416.

* [FLINK-35758][doc][cdc-connector][mysql] Add missed scan.startup.timestamp-millis option for mysql connector

This closes  apache#3453.

* [minor][cdc-connector][mysql] Code optimization for constants 

This closes apache#3385

* [minor][cdc-connector][sqlserver] Fix some typos (apache#3421)

* [docs][minor] Specify the lib directory of Flink CDC in quick start docs

This closes apache#3479.

Co-authored-by: fangxiangmin <fangxiangmin>
Co-authored-by: Leonard Xu <leonard@apache.org>
Co-authored-by: yux <34335406+yuxiqian@users.noreply.github.com>

* [FLINK-35237][cdc-common] Allow Sink to choose HashFunction in PrePartitionOperator

* [FLINK-35237][cdc-common] Improve the interfaces and reorganize the directory.

* [FLINK-35871][doc] Add missed "snapshot" mode to mysql connector startup options

This closes apache#3484.

* [FLINK-35865][base] Support Byte and Short in ObjectUtils (apache#3481)

* [minor][doc][cdc-connector][oracle] Update OracleSchema#getTableSchema doc description

This closes apache#3443.

* [FLINK-35874][cdc-connector][mysql] Check pureBinlogPhaseTables set before call getBinlogPosition method

This closes apache#3488.

* [FLINK-35740][cdc-connector][mysql] Allow column as chunk key even it's not primary key

This closes apache#3448.

* [minor][test] Add Flink CDC 3.1.1 version to migration test version list

This closes apache#3426.

* [FLINK-35868][cdc-connector][mongodb] Bump dependency version to support MongoDB 7.0 

This closes apache#3489.

* [FLINK-35391][cdc-connector][paimon] Bump dependency of Paimon Pipeline connector to 0.8.2

This closes apache#3335

Co-authored-by: wangjunbo <wangjunbo@qiyi.com>

* [FLINK-35736][test] Add migration test scripts & CI workflows

This closes apache#3447

* [docs][minor] Add transform piece for pipeline example

This closes  apache#3496

* [FLINK-35072][doris] Support applying AlterColumnTypeEvent to Doris pipeline sink

This closes  apache#3473

* [FLINK-34877][cdc] Support type cast conversion in pipeline transform 

This closes apache#3357.

* [hotfix][ci] Migrate to Docker Compose V2 (apache#3505)

* [FLINK-35344][cdc-base] Move same code from multiple subclasses to JdbcSourceChunkSplitter (apache#3319)

* [FLINK-35524][cdc-base] Clear connections pools when reader exist. (apache#3388)

* [FLINK-35674][cdc-connector][mysql]Fix blocking caused by searching for timestamp in binlog file (apache#3432)

* [FLINK-35912][cdc-connector] SqlServer CDC doesn't chunk UUID-typed columns correctly (apache#3497)

* resolve conficts

* polish code to trigger ci

---------

Co-authored-by: Kael <kael@fts.dev>
Co-authored-by: gongzhongqiang <gongzhongqiang@gigacloudtech.com>

* [hotfix][starrocks] Fix StarRocks FE startup failure due to insufficient disk space available 

This closes apache#3508.

* [FLINK-35813][cdc-runtime] Do not clear state field in TransformSchemaOperator  until operator closed 

This closes apache#3469.

* [FLINK-35234][minor][cdc-common] Fix potential NPE in ConfigurationUtils

This closes apache#3255.

* [FLINK-35743][cdc-runtime] Correct the temporal function semantics

This closes apache#3449.

Co-authored-by: wenmo <32723967+wenmo@users.noreply.github.com>
Co-authored-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

* Revert "[hotfix][starrocks] Fix StarRocks FE startup failure due to insufficient disk space available "

This reverts commit 3315be3.

* [hotfix][ci] Clean up disk space

* [FLINK-34638][cdc-common] Support column with default value

* [FLINK-35893][cdc-runtime] Write CURRENT_VERSION of TableChangeInfo to state

This closes apache#2944.

* [hotfix][e2e] Add missing default values field in E2e test

This closes apache#3516.

* [FLINK-35242][cdc-common][cdc-runtime] Support TRY_EVOLVE and LENIENT schema evolution behavior

This closes apache#3339.

* [FLINK-35432][pipeline-connector][mysql] Support catch modify event in mysql to send AlterColumnTypeEvent. (apache#3352)


Co-authored-by: haoke <haoke@bytedance.com>

* [FLINK-35791][kafka] Add database and table info of Canal / Debezium json format for Kafka sink (apache#3461)

* [build][e2e] Separate Pipeline and Source E2e tests and cover flink 1.20 version

This closes apache#3514.

* [FLINK-35730][cdc-cli] PipelineDefinitionParser supports  parsing pipeline def in text format

This closes apache#3444.

* [FLINK-35272][cdc-runtime] Transform supports omitting  and renaming computed column

This closes apache#3285.

* [FLINK-34853] Submit CDC Job To Flink K8S Native Application Mode (apache#3093)

* [hotfix][cdc-connector][mongodb] Fix unstable test cases for snapshot back-filling (apache#3506)

* [FLINK-35143][pipeline-connector][mysql] Expose newly added tables capture in mysql pipeline connector. (apache#3411)


Co-authored-by: Muhammet Orazov <916295+morazow@users.noreply.github.com>
Co-authored-by: north.lin <north.lin@yunlsp.com>

* [FLINK-35715][common] Ignore the compact optimize for mysql timestamp type in BinaryRecordData (apache#3511)

* [FLINK-34688][cdc-connector][mysql] Make scan newly table trigger condition strictly

This closes apache#3519.

* [FLINK-35442][cdc-connect][kafka] add key.format and partition.strategy option to make sure the same record sending to the same partition. (apache#3522)

* [FLINK-34876][transform] Support UDF functions in transform (apache#3465)

* [FLINK-36007][[cdc-composer]  Loading factory and added jar in one search

This close apache#3520.

* [FLINK-35981][cdc-runtime] Transform supports referencing one column more than once

This closes  apache#3515.

* [hotfix][doc] Fix doris document dead links and typo

This closes apache#3527

* [FLINK-35938][pipeline-connector][paimon] Use filterAndCommit API for retry Committable to avoid duplicate commit 

This closes apache#3504.

* [FLINK-36023][cdc-composer] Flink CDC K8S Native Application Mode add wrong jar url (apache#3523)

* [FLINK-35984][cdc-runtime] Fix bug that metadata column name can not be used in transform rule

This closes  apache#3528.

Co-authored-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

* [FLINK-35891][pipeline-connector][paimon] Support dynamic bucket in Paimon sink

This closes  apache#3499.

* [FLINK-35884][pipeline-connector][mysql] MySQL pipeline connector supports to set chunk key-column (apache#3490)


Co-authored-by: wangjunbo <wangjunbo@qiyi.com>
Co-authored-by: Hang Ruan <ruanhang1993@hotmail.com>

* [FLINK-35894][pipeline-connector][es] Introduce Elasticsearch Sink Connector for Flink CDC Pipeline

* [FLINK-35894][pipeline-connector][es] Support for ElasticSearch 6, 7 versions

This closes apache#3495.

* [FLINK-35991][cdc-runtime] Resolve operator conflicts in transform SQL operator tables

* [FLINK-36034][cdc-runtime] Get rid of Flink table planner dependency in cdc runtime module

This closes apache#3513.

* [release] Update version to 3.3-SNAPSHOT 

This closes apache#3531.

Co-authored-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

* [hotfix][cdc-runtime] Invalidate cache correctly to avoid classloader leakage 

This closes apache#3533

* [FLINK-35805][transform] Add __data_event_type__ metadata column

This closes  apache#3468

* [FLINK-36054][cdc][build] Fix Flink CDC parent pom and release scripts (apache#3540)

* [FLINK-36056][cdc][connector/elasticsearch] Change flink.connector.elasticsearch.version to a released version

This closes  apache#3542.

* [FLINK-35638][OceanBase][test] Refactor OceanBase test cases and remove dependency on host network (apache#3439)

* [minor][docs] Compress images to reduce file size and improve website load speed

This closes  apache#3551

* [FLINK-36082][pipeline-connector][kafka] Fix lamda NotSerializableException in KafkaDataSink

This closes apache#3549

* [FLINK-36088][pipeline-connector][paimon] Fix NPE in BucketAssignOperator when job restoration

This closes apache#3553

* [FLINK-36076][minor][cdc-runtime] Set isSchemaChangeApplying as volatile for thread safe consideration

This closes apache#3556.

* [FLINK-35243][cdc-common] Extends more schema change event types support

This close  apache#3521.

* [FLINK-36111][minor][pipeline-connector/paimon] Improve MultiTableCommittableChannelComputer Topology name

This closes apache#3559

* [FLINK-36081][cdc-connector][mysql] Remove the schemas of outdated tables in the BinlogSplit when restart (apache#3548)


Co-authored-by: 云时 <mingya.wmy@alibaba-inc.com>

* [FLINK-36094][cdc-runtime] Improve the Exception that SchemaRegistryRequestHandler thrown

 This closes apache#3558.

* [FLINK-36115][pipeline-connector][mysql]  Introduce scan.newly-added-table.enabled option for MySQL Source

This closes  apache#3560.

* [FLINK-36114][cdc-runtime] Make SchemaRegistryRequestHandler thread safe by blocking  subsequent schemaChangeEvent

This closes apache#3563.

Co-authored-by: Hongshun Wang <loserwang1024@gmail.com>

* [FLINK-36092][cdc-runtime] Fix schema evolution failure with wildcarded transform rule

This closes apache#3557.

* [FLINK-35056][cdc-connector/sqlserver] Fix scale mapping from SQL Server TIMESTAMP to Flink SQL timestamp

This closes apache#3561.

* [hotfix][cdc-runtime] Fix schema registry hanging in multiple parallelism

This closes  apache#3567.

* [hotfix][cdc-connector][mongodb] Fix LegacyMongoDBSourceExampleTest cannot run (apache#3555)

* [FLINK-36128][cdc-runtime] Fix potential unrecoverable in-flight data exception by promoting LENIENT as the default schema change behavior

This closes apache#3574.

* [hotfix][cdc-runtime] Keep upstream pending requests in order to avoid checkpoint hanging & state inconsistency in timestamp startup mode

This closes  apache#3576.

* [FLINK-36148][pipeline-connector/mysql] Fix that newly added table can not discovered by adding custom parser for CreateTableEvent

This closes apache#3570.

* [FLINK-36150][pipeline-connector/mysql] tables.exclude should work even scan.binlog.newly-added-table.enabled is true

This closes  apache#3573.

* [minor][cdc-runtime] Run schema coordinator logic asynchronously to avoid blocking the main thread

This closes apache#3557

* [hotfix][pipeline-connector/mysql] Fix primary key restraints missing when using inline `PRIMARY KEY` declaration syntax

This closes  apache#3579.

* [minor][tests] Fix test testDanglingDroppingTableDuringBinlogMode due to imprecise timestamp startup

This closes  apache#3580

* [FLINK-36184][transform] Fix transform operator swallows schema changes from tables not present in transform rules (apache#3589)

* [FLINK-36183] Fix lenient schema evolution failure with route blocks (apache#3583)

* [FLINK-36226][cdc][docs] Build 3.2 docs and mark it as stable (apache#3597)

* [FLINK-35503] Add support for running Oracle connector unit test on ARM architecture (apache#3600)

* [hotfix][CI] Fix Stale label automatically removed  without update activity

This closes apache#3621.

* [FLINK-34738][cdc][docs-zh] "Deployment - YARN" Page for Flink CDC Chinese Documentation (apache#3205)

* [FLINK-34738][cdc][docs-zh] "Deployment - YARN" Page for Flink CDC Chinese Documentation

* [FLINK-34738][cdc][docs-zh] Optimization for "Deployment - YARN" Page's Chinese Documentation

* [hotfix][tests] Fix oracle e2e test with ARM docker image without a database initial (apache#3634)

* [FLINK-36151][docs] Add schema evolution related docs

This closes apache#3575.

* [hotfix][test] Reorganize test cases

* [hotfix][ci] Optimize CI performance by implementing parallel execution

* [hotfix][ci] Fix ci failure in new GitHub runner images (apache#3645)

Signed-off-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

* [FLINK-36214][build] Downgrade scala-maven-plugin version to 4.8.0 to keep compatibility with Java 8 (apache#3594)

* [FLINK-36326][cdc-connector][mysql] Send BinlogSplitUpdateRequestEvent only once to fix auto scan newly-added table failure (apache#3613)

Signed-off-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>
Co-authored-by: Hang Ruan <ruanhang1993@hotmail.com>

* [FLINK-36463][cdc-connector][mysql] Forbid to override some debezium options (apache#3631)

* [FLINK-36211][pipeline-connector/kafka] shade org.apache.flink.streaming.connectors.kafka to avoid conflict with flink-connector-kafka jar. (apache#3595)

* [FLINK-35291][runtime] Improve the ROW data deserialization performance of DebeziumEventDeserializationScheme (apache#3289)

Co-authored-by: liuzeshan <liuzeshan@bytedance.com>

* [FLINK-36174][cdc-cli] CDC yaml without pipeline should not throw exception. (apache#3588)

* [FLINK-35544][docs][deploy] Add deployment documentations for Kubernetes Operator

This closes apache#3392.

* [FLINK-36618][cdc-connector][postgres] Improve PostgresDialect.discoverDataCollections to reduce the bootstrap time

This closes apache#3672.

* [FLINK-36052][docs][cdc-pipeline-connector/es] Add flink cdc pipeline elasticsearch connector docs

This closes apache#3649.

Co-authored-by: wangjunbo <wangjunbo@qiyi.com>

* [FLINK-36474][route] Support merging timestamp columns when routing

This closes  apache#3636.

* [FLINK-36514][cdc-cli] Fix unable to override exclude schema types in lenient mode

This closes apache#3637.

Signed-off-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

* [FLINK-36221][docs] Add CAST function documentations (apache#3596)

* [FLINK-35982][transform] Fix transform metadata failure when projection block is missed 

This closes apache#3592.

* [FLINK-35985][transform] Correct the substring function in transform rule

This closes apache#3537.

* [FLINK-36105][flink-cdc-cli] Fix unable to restore from checkpoint with Flink 1.20 (apache#3564)

* [FLINK-36247][cdc-connector][mysql] Fix potential transaction leak during MySQL snapshot phase (apache#3602)

* [hotfix][transform][minor] Fix potential code conflict that SUBSTR built-in function is 1-based index now

This closes apache#3696

* [FLINK-36649][cdc-connector][oracle] Fix oracle connection close error (apache#3678)

* [FLINK-36572][pipeline-connector][starrocks] Fix the issue that the local time zone is wrongly set

This closes  apache#3655.

* [FLINK-36407][runtime] Shut down coordinatorExecutor upon closing SchemaRegistry (apache#3624)

* [FLINK-36560][pipeline-connector][paimon] Fix the issue that timestamp_ltz field is not correctly converted

This closes  apache#3648.

* [FLINK-36408][pipeline-connector][mysql] Fix MySQL pipeline connector failed to parse FLOAT type with precision (apache#3625)

* [FLINK-36375][cdc-runtime] Fix missed default value in AddColumnEvent/RenameColunEvent

This closes  apache#3622.

* [FLINK-36517][pipeline-connector][paimon] Use filterAndCommit API to avoid committing the same datafile twice

 This closes  apache#3639

* [FLINK-36678][docs][deploy] Fix the typo in Flink CDC YARN deployment documentation

 This closes apache#3706.

* [FLINK-36541][pipeline-connector][paimon] Pass checkpointId to StoreSinkWrite#prepareCommit correctly

This closes apache#3652.

* [FLINK-36681][mysql-cdc][docs] Fix the wrong chunks splitting query in incremental snapshot reading section

This closes  apache#3703.

* [FLINK-36461][transform] Fix schema evolution failure with un-transformed tables

This closes  apache#3632.

* [FLINK-36609][pipeline-connector][paimon] Add partition columns to primary keys

This closes  apache#3641.

* [FLINK-36699][cdc-common] fix nullability when converting cdc data type to flink data type (apache#3713)

* [FLINK-35592][cdc-connector][mysql] Fix MysqlDebeziumTimeConverter miss timezone when convert to timestamp

This closes apache#3332

Co-authored-by: Hang Ruan <ruanhang1993@hotmail.com>

* [FLINK-36093][transform] Fix preTransformoperator wrongly filters columns belong to different transforms

This closes apache#3572

* [FLINK-36596][transform] Fix unable to schema evolve with project-less transform rules

This closes  apache#3665.

* [FLINK-36565][transform] Route allows merging Decimals with various precisions 

This closes apache#3651

* [FLINK-36285][doris] Fix unable to alter column type without default value specified 

This closes apache#3691.

* [FLINK-36586][build] Update flink version to 1.19 (apache#3660)


Co-authored-by: ConradJam <czy006@apache.com>
Co-authored-by: Hang Ruan <ruanhang1993@hotmail.com>

* [FLINK-36656][mysql] Fix type conversion failure for newly-added sharding table with mysql boolean type (apache#3683)

* [FLINK-36750][pipeline-connector/paimon] Restore SinkWriter from state to keep the information of previous SinkWrite when schema evolution happen

This closes apache#3744.

* [hotfix][source-connector/mysql] Fix conflicts after flink version bumped to 1.19 (apache#3748)

* [FLINK-36772][mysql][cdc-base] Fix error placeholder for errorMessageTemplate of Preconditions

This closes  apache#3754

* [hotfix][docs][pipeline-connector/es] Add missed supported flink versions and Elasticsearch versions

This closes  apache#3752

* [FLINK-36525][transform] Support for AI Model Integration for Data Processing (apache#3753)

* [hotfix][Doc] fix spelling error in Al model docs. (apache#3764)

* [FLINK-36803][cdc-connector][base & mysql] Use the same format `tableId:chunkId` for splitId in SnapshotSplit (apache#3763)

* [hotfix][tests][oceanbase] Fix oceanbase test failure, possibly caused by some interactions among cases (apache#3712)

* [FLINK-36315][cdc-connector][base&pg&mongodb]The flink-cdc-base module supports source metric statistics (apache#3619)


Co-authored-by: molin.lxd <molin.lxd@alibaba-inc.com>
Co-authored-by: Hang Ruan <ruanhang1993@hotmail.com>

* [docs] Update download links to up-to-date cdc version (apache#3766)

* [docs] Update download links to up-to-date cdc version

Signed-off-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

* fix: replace params to allow dead link checking to pass

Signed-off-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

* fix: just don't check any interpolating urls

Signed-off-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

---------

Signed-off-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

* [FLINK-36878][pipeline-connector][kafka] Shade org.apache.kafka with org.apache.flink.kafka.shaded.org.apache.kafka instead of cdc path

This closes apache#3790.

* [hotfix][build] Add missed third-party OSS NOTICE

This closes  apache#3756.

* [hotfix][mysql] Remove deprecated MySQL incremental sources used in tests 

This closes apache#3792

* [FLINK-36771][cdc-connector][base&mysql] Fix UT trigger error: Invalid assigner status {} [NEWLY_ADDED_ASSIGNING_FINISHED] (apache#3755)

* [FLINK-36891[[source-connector][mysql] Fix corrupted state in case of serialization failure in  MySQL CDC Source

This closes apache#3794.

* [FLINK-34688][cdc-connector][base] CDC framework split snapshot chunks asynchronously (apache#3510)

* [FLINK-36895][cdc-connector][base] The JdbcSourceChunkSplitter#queryMin method passed a parameter with tableName/coiumnName reversed. (apache#3797)

* [FLINK-36558][source-connector/mysql] Fix column metadata parsing compatibility with MySQL 8.0.17&8.0.18

This closes  apache#3647.

* [FLINK-36854][transform] Add missed comment and default value after transform

This closes apache#3780.

* [FLINK-36866][transform] Fix unable to narrow casting on numeric values 

This closes apache#3786.

* [hotfix] Fix Java 11 target compatibility & add tests (apache#3633)

* [hotfix] Fix Java 11 target compatibility

Signed-off-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

# Conflicts:
#	.github/workflows/flink_cdc_java_8.yml
#	.github/workflows/flink_cdc_migration_test_base.yml
#	pom.xml

* fix: clarify GiHub workflow names

Signed-off-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

---------

Signed-off-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

* [FLINK-36864][cdc-runtime] Fix unable to use numeric literals that goes beyond Int32 range (apache#3785)

Signed-off-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

* [FLINK-36879][runtime] Support to convert delete as insert in transform (apache#3804)

* [hotfix] Fix building status badge in README as workflow files have been refactored (apache#3807)

* [hotfix] Fix building status badge in README as workflow files have been refactored

Signed-off-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

* fix: rename workflow files to `flink_cdc_ci` / `flink_cdc_ci_nightly`

As they're not strictly tied to specific JDK versions.

Signed-off-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

---------

Signed-off-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

* [FLINK-36784][common] Support to add metadata columns for data in the meta fields of DataChangeEvent at transform (apache#3758)


Co-authored-by: Kunni <lvyanquan.lyq@alibaba-inc.com>

* [hotfix][docs] Fix ToC to include H1

This closes apache#3773.

* [tests][build] Update migration test matrix to 3.2.0 and later

* [cdc-common] Extract column / schema type merging utility methods to `SchemaMergingUtils`

* [FLINK-36763][cdc-runtime] Introduce distributed schema evolution topology for sources with parallelized metadata

This closes apache#3801

* [FLINK-36690][cdc-runtime] Fix schema operator hanging under extreme parallelized pressure
This closes apache#3680

* [hotfix][tests] Fix unstable `testInitialStartupModeWithOpTs` case (apache#3809)

Signed-off-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

* [FLINK-36952][source-connector/mongodb] Fix typos in MongoDB connector Chinese documentation (apache#3811)

* [build][hotfix] Allow pinning issues & PRs that never decay

This closes apache#3816

* [hotfix][tests] Fix unstable `OceanBaseMySQLModelITCase` (apache#3831)

Signed-off-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>

* [FLINK-36790][cdc-connector][paimon] Set waitCompaction to true in PaimonWriter to avoid CME problem

This closes  apache#3760

Co-authored-by: wuzhiping <wuzhiping.007@bytedance.com>

* [FLINK-34545][pipeline-connector/ob]Add OceanBase pipeline connector to Flink CDC 

This closes apache#3360.

* [FLINK-37017][cdc-common] Supports map and array types for binary record data 

This closes apache#3434.

* [hotfix][docs] Add missed enumeration in MySQL Source

This closes apache#3832

* [FLINK-35167][pipeline-connector/maxcompute] Introduce MaxCompute pipeline DataSink

This closes  apache#3254.

* [hotfix][k8s] Fix command-line option `from-savepoint` does not work in K8s deployment mode

This closes  apache#3825.

Co-authored-by: helloliuxg@gmail.com <xiaogenliu@kugou.net>

* [hotfix][cdc-common] Remove duplicated code to improve performance

 This closes apache#3840.

Co-authored-by: zhangchaoming.zcm <zhangchaoming.zcm@antgroup.com>

* [FLINK-36524][pipeline-connector][paimon] Bump Paimon version to 0.9.0 

This closes apache#3644

* [FLINK-36877][pipeline-connector/kafka] Correct canal-json output for delete record

This closes apache#3788

* [FLINK-36956][transform] Append NOT NULL attribute for Primary Key columns

This closes  apache#3815

* [FLINK-36981][transform] Considering sharding tables with different schema in transform projection

This closes apache#3826.

* [FLINK-36970][cdc-common] Merge result of data type BIGINT and DOUBLE should be DOUBLE instead of STRING

This closes apache#3821

* [FLINK-36701][cdc-runtime] Obtain latest evolvedSchema when SinkDataWriterOperator handles FlushEvent from failover

This closes apache#3802

Co-authored-by: jzjsnow <snow.jiangzj@gmail.com>

* [hotfix][ci] Fix CI failure due to implicit conflicts 

This closes apache#3846

* [FLINK-35325][transform] Skip insufficient_quota error when running test case using ad model. (apache#3849)

* [FLINK-36811][mysql] MySQL cdc setIsProcessingBacklog in snapshot phase and exit when snapshot phase finished (apache#3793)

* [FLINK-37012][transform] Fix argument type mismatch when metadata column used in function

This closes apache#3837

* [FLINK-36610] MySQL CDC supports parsing gh-ost / pt-osc generated schema changes (apache#3668)

Co-authored-by: MOBIN-F <18814118038@163.com>

* [FLINK-36865][cdc] Provide UNIX_TIMESTAMP series functions in YAML pipeline

This closes apache#3819.

* [FLINK-36964][pipeline-connector/paimon] Fix potential exception when SchemaChange in parallel with Paimon Sink

This closes apache#3818

Co-authored-by: yuxiqian.yxq <yuxiqian.yxq@alibaba-inc.com>

* [FLINK-36858][pipeline-connector][kafka] Fix JsonRowDataSerializationSchema compatibility issue with Flink 1.20

This closes  apache#3784

Co-authored-by: Leonard Xu <xbjtdcq@gmail.com>

* [FLINK-36985][pipeline-connector/paimon] Tolerante ColumnAlreadyExistException when apply AddColumnEvent in paimon

This closes  apache#3828.

* [FLINK-37042][pipeline-connector/maxcompute] Rename maxcompute pipieline connector options to follow flink style

This closes  apache#3852

* [FLINK-36193][pipeline-connectors][paimon/doris/starrocks] Supports applying TRUNCATE & DROP Table Event to Paimon, StarRocks and Doris

This closes apache#3673

* [FLINK-36700][pipeline-connector][elasticsearch] Elasticsearch pipeline sink supports authentication

This closes  apache#3728

* [FLINK-37011][cdc-transform] Improve get source field value by column name in PreTransformProcessor

This closes apache#3836

* [FLINK-35634][build] Add CdcUp playground CLI scripts 

This closes apache#3605

* [FLINK-36974][cdc-cli] Support overwrite flink configuration via command line

This closes apache#3823

Co-authored-by: helloliuxg@gmail.com <xiaogenliu@kugou.net>

* [FLINK-35387][cdc-connector][postgres] PG CDC source support heart beat

This closes apache#3667

* [FLINK-36636][transform] Supports timestamp comparison in cdc pipeline transform 

This closes apache#3677

* [FLINK-36977][pipeline-connector/paimon] Apply default value when process add_column schema change envent

This closes apache#3824.

Co-authored-by: Leonard Xu <xbjtdcq@gmail.com>

* [FLINK-36741][transform] Fix the decimal precision and length lost during transform

This closes apache#3740

* [FLINK-36647][transform] Support Timestampdiff and Timestampadd function in cdc pipeline transform

This closes apache#3698

* [build][minor] Upgrade the max file length from 3k to 4k

* [FLINK-34865][pipeline-connector/mysql] Support sync table and column comment

This closes apache#3482

Co-authored-by: Leonard Xu <xbjtdcq@gmail.com>

* [FLINK-37124][build] Simplify logs in test cases to avoid flooding GHA outputs

This closes  apache#3860

* [FLINK-36754][transform] Projection should treated as an asterisk when projection expression is empty or null

This closes  apache#3749

* [hotfix][pipeline-connector][maxcompute] Fix MaxCompute Pipeline Connector tests by upgrading maxcompute-emulator

This closes  apache#3862

* [FLINK-36282][pipeline-connector][cdc-connector][mysql]fix incorrect data type of TINYINT(1) in mysql pipeline connector (apache#3608)

* [FLINK-36282][pipeline-connector][cdc-connector][mysql]fix incorrect data type of TINYINT(1) in mysql pipeline connector

* reformat code

* Update MySqlPipelineITCase.java

* pass a boolean value instead of Properties

* uodate FAQ

* add a method to get tinyInt1isBit

* add new cdc config `treat-tinyint1-as-boolean`

* Update MySqlChunkSplitter.java

* change param name

* [FLINK-36406][cdc-runtime] Close MetadataApplier when the job stops

This closes  apache#3623

* [FLINK-36224][docs] Add the version mapping between pipeline connectors and flink

This closes  apache#3598

* [FLINK-36620][cdc-cli] Add support for specifying the --flink-home parameter via an '=' character

This closes apache#3838

* [FLINK-36351][pipeline-connector/doris] Support the conversion of Flink TIME type to Doris String type

This closes apache#3620

* [FLINK-35802][pipeline-connectors/mysql] Clean ChangeEventQueue to avoid deadlock when calling BinaryLogClient#disconnect method

This closes apache#3463

* [FLINK-35600][pipeline-connector/mysql] Add timestamp for low and high watermark 

This closes apache#3415

* [hotfix][test] Make pipeline migration tests more robust 

This closes apache#3866

* [hotfix[cdc-runtime] Close MetadataApplier in SchemaRegistry when the job stops

 This closes apache#3864

* [FLINK-36913][pipeline-connector][kafka] Introduce option to define custom mapping from upstream table id to downstream topic name

This closes apache#3805

* [FLINK-35067][cdc-connector][postgres] Adding metadata 'row_kind' for Postgres CDC Connector.

 This closes apache#3716.

Co-authored-by: Leonard Xu <xbjtdcq@gmail.com>

* [FLINK-37122][build] Try to be compatible with old flink version 1.18.x

This closes apache#3859

* [FLINK-36578][pipeline-connector/mysql] Introduce option to unify json type output between snapshot phase and binlog phase

This closes apache#3658

* [minor][ci] Set proper timeout for compile_and_test step of CI job

* [hotfix][pipeline-connector][mysql] Fix missed optional option in MySqlDataSourceFactory 

This closes apache#3867

* [FLINK-34865][pipeline-connector/mysql] Support sync newly added table's comment

This closes apache#3869

* [hotfix][build] Miscellaneous fixes on GHA workflows

This closes apache#3839

* [build] Update version to 3.4-SNAPSHOT and add release-3.3 docs

This closes apache#3870

* [build] Update version to 3.4-SNAPSHOT and add release-3.3 docs 

This closes apache#3870

* [hotfix][docs][pipeline-connector/maxcompute] Fix maxcompute connector typo in examples section

This closes apache#3875

* [hotfix][cdc][docs] Build 3.3 docs and mark it as stable (apache#3882)

* [hotfix][docs] Change docs in master to 3.4-SNAPSHOT (apache#3886)

* [FLINK-37233][docs] Update supported flink versions and sinks  (apache#3899)

* [FLINK-34729][docs] Translate "Core Concept" Pages of Flink CDC into Chinese

This closes apache#3901

* [FLINK-37252][doc] Align Postgres CDC Connector Chinese docs with English version

This closes apache#3903

* [FLINK-37224][docs] Add the missing documents and parameters of MongoDB CDC

This closes apache#3895

* [FLINK-37251][doc] Add pipeline connectors' download link in overview.md

This closes apache#3900

* [FLINK-37231][docs] Add documentation for CDC Source metrics

This closes apache#3897

* [FLINK-36698][pipeline-connector][elasticsearch] Elasticsearch Pipeline Sink supports index sharding

This closes apache#3723.

Co-authored-by: wangjunbo <wangjunbo@qiyi.com>

* [FLINK-37287][docs] Add missed Apache Paimon 0.9 and Fixing typo on overview.md

This closes apache#3913

* [hotfix][license] Update legacy license

This closes  apache#3908

* [tests][ci] Miscellaneous improvements on CI robustness 

This closes apache#3911

* [FLINK-37262][pipeline-connector/mysql] Fix missing PARSE_ONLINE_SCHEMA_CHANGES option in MySqlDataSourceFactory

This closes apache#3910

* [FLINK-37191][cdc-connector/mysql] Avoid back filling when lowWatermark is equal to highWatermark in BinlogSplit

This closes  apache#3902

* [FLINK-36564][ci] Running CI in random timezone to expose more time related bugs

This closes  apache#3650

* [FLINK-36945][cdc-connector/mysql] Support parsing rename multiple tables in one statement

This closes apache#3876.

* Merge remote-tracking branch 'refs/remotes/github/master' into rh11cp

* refs/remotes/github/master: (308 commits)
  [FLINK-36945][cdc-connector/mysql] Support parsing rename multiple tables in one statement
  [FLINK-36564][ci] Running CI in random timezone to expose more time related bugs
  [FLINK-37191][cdc-connector/mysql] Avoid back filling when lowWatermark is equal to highWatermark in BinlogSplit
  [FLINK-37262][pipeline-connector/mysql] Fix missing PARSE_ONLINE_SCHEMA_CHANGES option in MySqlDataSourceFactory
  [tests][ci] Miscellaneous improvements on CI robustness
  [hotfix][license] Update legacy license
  [FLINK-37287][docs] Add missed Apache Paimon 0.9 and Fixing typo on overview.md
  [FLINK-36698][pipeline-connector][elasticsearch] Elasticsearch Pipeline Sink supports index sharding
  [FLINK-37231][docs] Add documentation for CDC Source metrics
  [FLINK-37251][doc] Add pipeline connectors' download link in overview.md
  [FLINK-37224][docs] Add the missing documents and parameters of MongoDB CDC
  [FLINK-37252][doc] Align Postgres CDC Connector Chinese docs with English version
  [FLINK-34729][docs] Translate "Core Concept" Pages of Flink CDC into Chinese
  [FLINK-37233][docs] Update supported flink versions and sinks  (apache#3899)
  [hotfix][docs] Change docs in master to 3.4-SNAPSHOT (apache#3886)
  [hotfix][cdc][docs] Build 3.3 docs and mark it as stable (apache#3882)
  [hotfix][docs][pipeline-connector/maxcompute] Fix maxcompute connector typo in examples section
  [build] Update version to 3.4-SNAPSHOT and add release-3.3 docs
  [build] Update version to 3.4-SNAPSHOT and add release-3.3 docs
  [hotfix][build] Miscellaneous fixes on GHA workflows
  ...

* fix test

* fix tests

* fix test

* fix test

* fix test

* fix: downloading gh-ost cli timeout

修复了 gh-ost cli 下载超时的问题,放在资源目录里了。
Link: https://code.alibaba-inc.com/ververica/flink-cdc/codereview/20506267
* fix: downloading gh-ost cli timeout

* fix: fill in dummy methods in YamlServiceImpl

* Shutdown previous jobs properly (to avoid serverId conflicts)

* update 1.20

* feat: Resolve conflict, auto committed by CodeFlow

* fix test

* fix test

* fix test

* fix test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved docs Improvements or additions to documentation elasticsearch-pipeline-connector reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants