Is your feature request related to a problem?
I am trying to use TiCDC to migrate TiDB data to Snowflake. The main process is:
- Create a file format and a stage in Snowflake to store the files:
CREATE OR REPLACE FILE FORMAT my_log_csv_format
TYPE = CSV
FIELD_OPTIONALLY_ENCLOSED_BY='"';
CREATE STAGE my_s3_stage_log
STORAGE_INTEGRATION = my_s3
URL = 's3://wenxuan-snowflake-test/cdc/test2/chbenchmark/'
FILE_FORMAT = my_log_csv_format;
- Create a changefeed in TiCDC to capture data changes and store them in S3.
- Put the files from S3 into a Snowflake stage:
CREATE OR REPLACE STAGE "table_a" FILE_FORMAT = my_log_csv_format;
- Merge the staged files into the Snowflake table.
- Remove the files from the stage.
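Step 4 (the merge) produces incorrect results when a single file contains multiple DML events on the same row. Snowflake's MERGE INTO statement cannot apply the events to the target table one after another in real time, so a later event does not see the effect of an earlier one. For example, if a file contains two DML events on the same row, such as insert row1 followed by delete row1, row1 is not deleted.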
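To make the failure mode concrete, here is a simplified Python model (an assumption, not Snowflake's actual implementation) of a MERGE INTO with "insert when not matched, delete when matched and op is D, otherwise update" clauses, in which every event is matched against the target table as it was before the statement started. The function names `snapshot_merge` and `sequential_apply` and the `(op, key, value)` event tuples are hypothetical.

```python
def snapshot_merge(target: dict, events: list) -> dict:
    """Simplified model of MERGE INTO: each event is matched against the
    target as it was *before* the statement, not after earlier events."""
    before = dict(target)  # frozen view used only for matching
    result = dict(target)
    for op, key, value in events:
        matched = key in before
        if not matched and op in ("I", "U"):
            result[key] = value        # like WHEN NOT MATCHED THEN INSERT
        elif matched and op == "D":
            result.pop(key, None)      # like WHEN MATCHED AND op = 'D' THEN DELETE
        elif matched:
            result[key] = value        # like WHEN MATCHED THEN UPDATE
        # an unmatched D event does nothing: the row is not in the snapshot
    return result


def sequential_apply(target: dict, events: list) -> dict:
    """Apply events one by one, which is what the change log actually means."""
    result = dict(target)
    for op, key, value in events:
        if op == "D":
            result.pop(key, None)
        else:
            result[key] = value
    return result


events = [("I", "row1", "v1"), ("D", "row1", "v1")]
print(snapshot_merge({}, events))    # {'row1': 'v1'}  -> row1 is not deleted
print(sequential_apply({}, events))  # {}
```

In this model the delete event is never matched, because row1 did not exist in the snapshot, so the insert survives.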
Describe the feature you'd like
Merge the DML events that affect the same row within the same file.
For example, the events in CDC0000001.csv should be merged as follows:
Case 1
uk
U 0 1 A
U 0 2 A
is merged into
uk
U 0 2 A
Case 2
uk
I 0 1 A
U 0 2 A
is merged into
uk
I 0 2 A
Case 3
uk
I 0 1 A
D 0 1 A
is merged into
uk
(no rows: the insert and the delete cancel each other out)
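The requested compaction can be sketched in Python as below. This is a minimal sketch, not TiCDC code: `Event`, `compact`, and the assumption that each row is identified by a single unique-key value are hypothetical. The rules for Cases 1 to 3 come from the examples above; the handling of sequences that start with U or D is an assumption.

```python
from collections.abc import Hashable
from typing import NamedTuple


class Event(NamedTuple):
    op: str           # "I" (insert), "U" (update) or "D" (delete)
    key: Hashable     # unique-key value identifying the row (assumed single column)
    values: tuple     # remaining column values


def compact(events: list[Event]) -> list[Event]:
    """Collapse all events on the same key into at most one event.

    I ... I/U   -> I with the latest values   (Case 2)
    I ... D     -> nothing                    (Case 3)
    U ... U     -> U with the latest values   (Case 1)
    U/D ... D   -> D                          (assumption)
    U/D ... I/U -> U with the latest values   (assumption: row existed before the file)
    """
    first: dict[Hashable, Event] = {}
    last: dict[Hashable, Event] = {}
    for ev in events:
        first.setdefault(ev.key, ev)  # insertion order = first appearance of each key
        last[ev.key] = ev

    out: list[Event] = []
    for key, head in first.items():
        tail = last[key]
        if head.op == "I":
            # Row did not exist before this file.
            if tail.op != "D":
                out.append(Event("I", key, tail.values))
            # An insert followed by a final delete cancels out.
        elif tail.op == "D":
            out.append(tail)
        else:
            out.append(Event("U", key, tail.values))
    return out


case1 = [Event("U", 0, (1, "A")), Event("U", 0, (2, "A"))]
case2 = [Event("I", 0, (1, "A")), Event("U", 0, (2, "A"))]
case3 = [Event("I", 0, (1, "A")), Event("D", 0, (1, "A"))]
print(compact(case1))  # [Event(op='U', key=0, values=(2, 'A'))]
print(compact(case2))  # [Event(op='I', key=0, values=(2, 'A'))]
print(compact(case3))  # []
```

Keeping only the first and last event per key is enough here: the first event tells whether the row existed before the file, and the last one gives its final state.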
Describe alternatives you've considered
This could also improve the performance of downstream consumption, since compacted files contain fewer events to load and merge.
Teachability, Documentation, Adoption, Migration Strategy
No response