-
Notifications
You must be signed in to change notification settings - Fork 282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make EXCHANGE PARTITION idempotent #11458
Comments
How about we record which table is partitioned before the DDL, then execute EXCHANGE PARTITION. So when failover we can query the current table structure and compare with the record to know if exchange is happened. |
@lance6716 create table a (a bigint primary key, b bigint) partition by hash (a) partitions 5;
create table b (a bigint primary key, b bigint);
insert into a values (1, 50), (6, 60), (11, 70), (16, 80), (21, 90), (26, 100), (2, 20), (3, 30), (4, 40);
insert into b values (6, -10), (16, -20), (26, -30), (36, -40), (46, -50), (56, -60);
alter table a exchange partition p1 with table b; You can record the downstream Table IDs of the involved objects, e.g. on TiDB: > select tidb_table_id from information_schema.tables where (table_schema, table_name) = ('test', 'b');
106
> select tidb_partition_id from information_schema.partitions where (table_schema, table_name, partition_name) = ('test', 'a', 'p1');
111
> alter table a exchange partition p1 with table b;
> select tidb_table_id from information_schema.tables where (table_schema, table_name) = ('test', 'b');
111
> select tidb_partition_id from information_schema.partitions where (table_schema, table_name, partition_name) = ('test', 'a', 'p1');
106 On MySQL (if using > select name, table_id from information_schema.innodb_tables where name in ('test/b', 'test/a#p#p1');
+-------------+----------+
| name | table_id |
+-------------+----------+
| test/b | 1076 |
| test/a#p#p1 | 1077 |
+-------------+----------+
> alter table a exchange partition p1 with table b;
> select name, table_id from information_schema.innodb_tables where name in ('test/b', 'test/a#p#p1');
+-------------+----------+
| name | table_id |
+-------------+----------+
| test/b | 1077 |
| test/a#p#p1 | 1076 |
+-------------+----------+ On MariaDB the info table to query is I don't consider these viable in general because of the extra privilege requirements. |
Is your feature request related to a problem?
TiCDC relies on idempotence of DMLs and DDLs to ensure eventual consistency. Events are executed at least once, especially when retry / pause-resume is involved. In particular, TiCDC assumes that DDLs can be repeatedly executed, and the outcome should be one of:
TRUNCATE
orMODIFY COLUMN
)CREATE TABLE
orDROP TABLE
)However,
EXCHANGE PARTITION
violates this assumption. Retrying EXCHANGE PARTITION is not a no-op, and this will lead to data inconsistency.Describe the feature you'd like
Find some way to force EXCHANGE PARTITION to be idempotent.
Because of barrierTs, I suppose that only one single DDL will be retried (i.e. seeing …→①→①→②→②→… downstream is possible, but …→①→②→①→②→… not, assuming DDLs ① and ② share an common set of affected tables).
I also assume for each downstream database, each table is replicated only by a single changefeed.
We create a table downstream to record the EXCHANGE PARTITION state
Prepare the current execution state
(1, 'prepared')
, it means we have never executed the DDL before, so proceed.(0, 'executed')
, it means the DDL has been successfully executed, so skip.(0, 'prepared')
, we don't know whether the DDL has been executed or not, so error and stop (move to "warning" state).Execute the DDL.
Mark the DDL as executed.
row_count()
is 0, it means we have got external interference and better stop and report an error.Periodically GC the
`tidb_cdc`.`ddl_status_v1`
table, removing entries preceding the upstream GC safepoint.Note that currently the
tidb_cdc
schema is only reserved when syncpoint is enabled. Implementing this feature means the schema will be unconditionally reserved downstream.Describe alternatives you've considered
Declare that EXCHANGE PARTITION cannot be replicated.
Note that a similar non-idempotent statement
RENAME TABLE a to temp, b to a, temp to b;
has been explicitly documented as non-compliant in https://docs.pingcap.com/tidb/stable/ticdc-ddl#rename-multiple-tables-in-a-ddl-statement.Teachability, Documentation, Adoption, Migration Strategy
There should be clear instruction about how the user should remedy errors from each step.
User could check the total row count of the partition and exchanged table and determine if they should
The text was updated successfully, but these errors were encountered: