
drainer: support skipping synchronization errors #963

Open
freemindLi opened this issue May 9, 2020 · 7 comments
Labels
feature-request This issue is a feature request

Comments

@freemindLi
Contributor

freemindLi commented May 9, 2020

Feature Request

drainer: support skipping synchronization errors
Is your feature request related to a problem? Please describe:
drainer: support skipping synchronization errors

Describe the feature you'd like:
When synchronization fails, drainer exits immediately. After drainer is restarted, it runs in safe mode for about 15 minutes. This leads to two problems:
1. If safe mode cannot resolve the error, the failing TSO has to be ignored manually. If there are many errors of the same type, this becomes an open-ended amount of manual work and seriously delays synchronization. In production, when one kind of error is encountered and a large amount of data is involved, the better approach is to skip that data first, so that synchronization of other tables, or of non-conflicting data in the same table, can continue instead of the entire synchronization being blocked.
2. Even if safe mode can resolve the error, the problem may recur after the safe-mode period ends, and drainer has to be restarted again. Keeping safe mode on all the time is not a sensible fix either: it costs performance and increases synchronization delay.

Describe alternatives you've considered:
It may be impossible for us to determine precisely which error interrupted synchronization. Drainer can of course record the error and report it to monitoring, and within drainer we could distinguish cases such as primary key conflicts or NOT NULL violations, but there are too many error types, and we want synchronization to recover as soon as possible. Therefore, we could divide errors into two types, 1. DDL errors and 2. DML errors, and add two configuration items that control whether DDL errors and DML errors are skipped. If skipping is enabled, a DML/DDL error causes that piece of data to be skipped, and synchronization continues with the next piece of data. If it is disabled, the behavior is the same as before. Whenever an error is encountered, drainer should record the corresponding error information so that the cause can be found and the data repaired later.
Teachability, Documentation, Adoption, Migration Strategy:

freemindLi added the feature-request label on May 9, 2020
@WangXiangUSTC
Contributor

Do we need to print a log or write it to a file when an error is ignored?

@freemindLi
Contributor Author

> Do we need to print a log or write it to a file when an error is ignored?

Yes. When an error is encountered, drainer should record the corresponding error information so that the cause can be found and the data repaired later.

@WangXiangUSTC
Contributor

I think it may be better to specify the error types to ignore in the config file, for example:

```toml
ignore-sql-error = ["Duplicate Entry", "Data too long for column"]
```

Executing SQL can fail when the database is shut down or the network times out; I think these errors should not be ignored.

@freemindLi
Contributor Author

freemindLi commented May 12, 2020

> Executing SQL can fail when the database is shut down or the network times out; I think these errors should not be ignored.

I understand what you mean, but database shutdowns and connection timeouts are not DDL/DML errors. When an error occurs, drainer needs to determine whether it falls into the ignorable category. Of course, your plan is also feasible.

@WangXiangUSTC
Contributor

After executing SQL, we need to distinguish the error's type: which errors are caused by the network and which are returned by the database. Do you have any ideas about how to do it?

@IANTHEREAL
Collaborator

@amyangfei please copy this issue to TiCDC

@amyangfei
Contributor

> @amyangfei please copy this issue to TiCDC

@IANTHEREAL I added a similar issue to TiCDC: pingcap/tiflow#832
