Description
Background
TiCDC is an important component for TiDB to synchronize data to various downstream systems. When synchronizing data to downstream systems, data integrity is especially important. However, TiCDC does not support end-to-end data integrity verification yet.
Spec
Provide the following cluster-level boolean option on the TiDB side.
tidb_enable_row_level_checksum = [true|false] # the default value is false.
SET GLOBAL tidb_enable_row_level_checksum = true;
Once the customer enables this option, every change to a row in a non-system database appends an invisible field that stores a checksum computed from the content of the row. This invisible field exists only for data correctness checking and is transparent to the customer.
TiCDC and end users can use this checksum value to verify data integrity.
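To make the idea concrete, here is a minimal Go sketch of a CRC32 row checksum and the downstream check. It is illustrative only: the `column` type, the `rowChecksum` and `verify` helpers, and the byte layout fed into CRC32 are assumptions made for this example, not TiDB's actual encoding.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// column is a hypothetical, simplified column: an ID plus its encoded value.
type column struct {
	ID    int64
	Value []byte
}

// rowChecksum folds every column into one CRC32 (IEEE) value.
// Assumption: columns are fed in slice order, each as an 8-byte
// little-endian ID followed by the value bytes.
func rowChecksum(cols []column) uint32 {
	var sum uint32
	var id [8]byte
	for _, c := range cols {
		binary.LittleEndian.PutUint64(id[:], uint64(c.ID))
		sum = crc32.Update(sum, crc32.IEEETable, id[:])
		sum = crc32.Update(sum, crc32.IEEETable, c.Value)
	}
	return sum
}

// verify recomputes the checksum on the receiving side and compares it
// with the value that travelled with the row.
func verify(cols []column, want uint32) bool {
	return rowChecksum(cols) == want
}

func main() {
	row := []column{{1, []byte("alice")}, {2, []byte{42}}}
	sum := rowChecksum(row)
	fmt.Println(verify(row, sum)) // true

	row[1].Value = []byte{43} // simulate corruption in transit
	fmt.Println(verify(row, sum)) // false
}
```

A consumer would run the equivalent of `verify` on each decoded row and flag any mismatch.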
Development tracking for the TiDB part
- Add a checksum function tidb_row_checksum to return the checksum value of a row. *: add tidb_row_checksum() as a builtin function #43479
- Add checksum-related utilities in tidb; CRC32 is used in this case. util: extend row format with checksum #42859 util: reimplement row level checksum utilities #43141
- Let tidb be aware of the origin state (none or public) of a column if its current state is not public. -- we always append two checksums if there is a column whose state is not public, thus no need to know the direction of state transform.
- Support writing rows with checksum values *: support writing rows with checksum values #43163
  - Add a global system variable tidb_enable_row_level_checksum to enable or disable the checksum calculation when inserting new rows. When it's enabled, multi-schema change will be blocked.
  - Make it work with the DDL add column schema change, and generate two checksum values if necessary.
  - Make it work with the DDL drop column schema change, and generate two checksum values if necessary.
  - Make it work with the DDL modify column schema change, and generate two checksum values if necessary.
  - Calculate the row checksum in the tablecodec package when the EncodeRow function is used. Calculate the CRC32 result for each column when executing encodeRowCols; a checksum result is returned at the end.
  - Append the checksum header and checksum result information to the encoded row according to the extended row format protocol.
- Keep the read request processing compatibility.
  - Skip the checksum part processing for ChunkDecoder in tidb if necessary. util: extend row format with checksum #42859
  - Skip the checksum part processing in internal_handle_request and PointGetter for chunk encoding processing in tikv if necessary. storage: add checksum logic in row slice, add cop and get test cases tikv/tikv#14611
  - Skip the checksum part processing in tiflash if necessary. -- tiflash decodes a row value by appendRowV2ToBlockImpl. It iterates over the columns and decodes them one by one, so the checksum part is already discarded in the current implementation.
  - Skip the checksum part processing in tikv client libs if necessary. discard the extended checksum part in row values tikv/client-java#739
- Add telemetry for the new feature.
- Compatibility tests: the checksum extension should not affect backward compatibility, and downgrade is supported when the checksum row format is used.
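
The write and read items above can be sketched in Go as follows. This is a rough illustration under stated assumptions, not the actual extended row format: the framing (`[flags][len][data][header][checksums]`), the constants `flagHasChecksum` and `hdrExtraSum`, and the helpers `encodeRow` and `decodeRow` are all hypothetical. It also assumes that, when a column is not in the public state, the two checksums cover the row without and with that column.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

const (
	flagHasChecksum = 0x01 // hypothetical row flag: a checksum section follows the data
	hdrExtraSum     = 0x08 // hypothetical header bit: a second checksum is present
)

// encodeRow frames data as [flags:1][len:2][data] and, when checksums are
// given, appends [header:1][sum1:4][sum2:4 (optional)].
func encodeRow(data []byte, sums ...uint32) ([]byte, error) {
	if len(data) > 0xFFFF || len(sums) > 2 {
		return nil, errors.New("row too large or too many checksums")
	}
	buf := make([]byte, 3, 3+len(data)+1+4*len(sums))
	binary.LittleEndian.PutUint16(buf[1:], uint16(len(data)))
	buf = append(buf, data...)
	if len(sums) == 0 {
		return buf, nil
	}
	buf[0] |= flagHasChecksum
	var hdr byte
	if len(sums) == 2 {
		hdr |= hdrExtraSum
	}
	buf = append(buf, hdr)
	for _, s := range sums {
		buf = binary.LittleEndian.AppendUint32(buf, s)
	}
	return buf, nil
}

// decodeRow returns the row data and any checksums. A reader that does not
// verify checksums can ignore the second return value, which is how the
// checksum section is skipped.
func decodeRow(buf []byte) ([]byte, []uint32, error) {
	if len(buf) < 3 {
		return nil, nil, errors.New("short row")
	}
	n := int(binary.LittleEndian.Uint16(buf[1:]))
	if len(buf) < 3+n {
		return nil, nil, errors.New("truncated data")
	}
	data, rest := buf[3:3+n], buf[3+n:]
	if buf[0]&flagHasChecksum == 0 {
		return data, nil, nil
	}
	count := 1
	if len(rest) > 0 && rest[0]&hdrExtraSum != 0 {
		count = 2
	}
	if len(rest) < 1+4*count {
		return nil, nil, errors.New("truncated checksum section")
	}
	sums := make([]uint32, count)
	for i := range sums {
		sums[i] = binary.LittleEndian.Uint32(rest[1+4*i:])
	}
	return data, sums, nil
}

func main() {
	withoutCol := []byte("id=1,name=a")    // row as seen before add column
	withCol := []byte("id=1,name=a,age=7") // row including the in-flight column
	row, err := encodeRow(withCol, crc32.ChecksumIEEE(withoutCol), crc32.ChecksumIEEE(withCol))
	if err != nil {
		panic(err)
	}
	data, sums, err := decodeRow(row)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s checksums=%d match=%v\n", data, len(sums), sums[1] == crc32.ChecksumIEEE(data))
}
```

Rows written without checksums decode through the same path with a nil checksum slice, which is the property the compatibility tests are meant to protect.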