
changefeed lag reached 556s when inject network partition between one of ticdc and pd leader last for 10mins #10849

Closed
Lily2025 opened this issue Mar 26, 2024 · 6 comments · Fixed by #11076
Labels
affects-7.5 affects-8.1 area/ticdc Issues or PRs related to TiCDC. type/enhancement The issue or PR belongs to an enhancement.

Comments

@Lily2025

Lily2025 commented Mar 26, 2024

What did you do?

1. Create a changefeed.
2. Run sysbench.
3. Inject a network partition between one TiCDC node and the PD leader, lasting 10 minutes.

What did you expect to see?

No response

What did you see instead?

changefeed lag reached 556s


Versions of the cluster

./cdc version
Release Version: v8.0.0
Git Commit Hash: ba58ef9
Git Branch: HEAD
UTC Build Time: 2024-03-19 08:07:03
Go Version: go version go

current status of DM cluster (execute query-status <task-name> in dmctl)

No response

@Lily2025 Lily2025 added area/dm Issues or PRs related to DM. type/bug The issue is confirmed as a bug. labels Mar 26, 2024
@Lily2025
Author

/type enhancement

@ti-chi-bot ti-chi-bot bot added the type/enhancement The issue or PR belongs to an enhancement. label Mar 26, 2024
@Lily2025 Lily2025 changed the title changefeed lag reached 232s when inject network partition between one of ticdc and one of tikv changefeed lag reached 556s when inject network partition between one of ticdc and one of tikv Mar 26, 2024
@Lily2025
Author

/remove-area dm
/area ticdc

@ti-chi-bot ti-chi-bot bot added area/ticdc Issues or PRs related to TiCDC. and removed area/dm Issues or PRs related to DM. labels Mar 26, 2024
@Lily2025 Lily2025 changed the title changefeed lag reached 556s when inject network partition between one of ticdc and one of tikv changefeed lag reached 556s when inject network partition between one of ticdc and one of tikv last for 10mins Mar 26, 2024
@Lily2025 Lily2025 changed the title changefeed lag reached 556s when inject network partition between one of ticdc and one of tikv last for 10mins changefeed lag reached 556s when inject network partition between one of ticdc and pd leader last for 10mins Mar 29, 2024
@flowbehappy
Collaborator

flowbehappy commented Apr 8, 2024

This issue is by design. @asddongmen will check whether it can be addressed by etcd-io/etcd#17465 (comment). If not, I suggest we address it in the long term.

@fubinzh

fubinzh commented Apr 8, 2024

/severity moderate

@asddongmen
Contributor

Under the existing design, TiCDC cannot guarantee replication latency in this situation.

@asddongmen asddongmen reopened this Apr 15, 2024
@asddongmen
Contributor

asddongmen commented Apr 28, 2024

TiCDC relies on PD to obtain the latest region information for the upstream TiKV cluster (for example, which TiKV store serves each region) so that it can establish a data capture stream for each region. If a CDC server is network-partitioned from the PD leader, it cannot receive this information and therefore cannot establish those streams, so the changefeed lag keeps increasing.
It is nearly impossible to solve this problem within the current architecture.

Workaround
If a CDC node becomes network-partitioned from the PD leader, you can shut down that CDC node. The tables it was replicating will then be transferred to another CDC node for replication.
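The rescheduling step in this workaround can be illustrated with a toy Go model. This is an assumption-laden sketch, not TiCDC's actual scheduler: `reassign` is a hypothetical function that moves every table owned by the removed capture to whichever remaining capture currently owns the fewest tables, breaking ties by capture name. It only shows why shutting down the partitioned node lets its tables resume on healthy nodes.

```go
package main

import (
	"fmt"
	"sort"
)

// reassign (hypothetical) hands the removed capture's tables to the
// least-loaded remaining captures. It assumes owner only references
// captures listed in captures.
func reassign(owner map[int64]string, captures []string, removed string) (map[int64]string, error) {
	load := make(map[string]int)
	for _, c := range captures {
		if c != removed {
			load[c] = 0
		}
	}
	if len(load) == 0 {
		return nil, fmt.Errorf("no capture left to take over tables from %s", removed)
	}

	// Keep existing assignments and collect the orphaned tables.
	result := make(map[int64]string, len(owner))
	var orphans []int64
	for table, c := range owner {
		if c == removed {
			orphans = append(orphans, table)
			continue
		}
		result[table] = c
		load[c]++
	}

	// Sort for deterministic output.
	sort.Slice(orphans, func(i, j int) bool { return orphans[i] < orphans[j] })
	names := make([]string, 0, len(load))
	for c := range load {
		names = append(names, c)
	}
	sort.Strings(names)

	// Assign each orphan to the capture with the fewest tables.
	for _, table := range orphans {
		best := names[0]
		for _, c := range names[1:] {
			if load[c] < load[best] {
				best = c
			}
		}
		result[table] = best
		load[best]++
	}
	return result, nil
}

func main() {
	owner := map[int64]string{1: "cdc-a", 2: "cdc-a", 3: "cdc-b", 4: "cdc-c"}
	result, err := reassign(owner, []string{"cdc-a", "cdc-b", "cdc-c"}, "cdc-a")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(result[1], result[2], result[3], result[4]) // cdc-b cdc-c cdc-b cdc-c
}
```

The error path matters for the workaround: if the partitioned node is the only CDC node, shutting it down leaves nowhere to move its tables.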

/cc @flowbehappy @fubinzh

4 participants