The syncing task is running, but binlog events are in fact not synced to the downstream #11391

Open
giant-panda666 opened this issue Jul 5, 2024 · 8 comments
Labels
area/dm (Issues or PRs related to DM), severity/moderate, type/bug (The issue is confirmed as a bug)

Comments

@giant-panda666

giant-panda666 commented Jul 5, 2024

What did you do?

  1. Ran the syncing task; the worker log is as below:
    [screenshot of worker log]
  2. After some time, the task is still running, but the syncerBinlog position is not updated:
    [screenshot]
  3. If I run the resume command, it returns an error:
    [screenshot]
    and syncerBinlog in subTaskStatus is still not updated:
    [screenshot]
  4. If I run the pause command and then the resume command, the worker returns an error and then restarts the syncer. After that, the syncerBinlog position is updated again.
    [screenshot]
    [screenshot]

The DM version is v7.5.1, deployed by TiDB Operator.

What did you expect to see?

No response

What did you see instead?

No response

Versions of the cluster

DM version (run dmctl -V or dm-worker -V or dm-master -V):

(paste DM version here, and you must ensure versions of dmctl, DM-worker and DM-master are same)

Upstream MySQL/MariaDB server version:

(paste upstream MySQL/MariaDB server version here)

Downstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

(paste TiDB cluster version here)

How did you deploy DM: tiup or manually?

(leave TiUP or manually here)

Other interesting information (system version, hardware config, etc):


current status of DM cluster (execute query-status <task-name> in dmctl)

(paste current status of DM cluster here)
giant-panda666 added the area/dm and type/bug labels on Jul 5, 2024
@lance6716
Contributor

Please provide the full log so we know what happened. Maybe you can also check the binlog event at that position to see its content, and capture the goroutine stack of the DM-worker.
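
(For reference, a minimal sketch of how such a goroutine dump can be captured, assuming the DM-worker's status address exposes Go's standard net/http/pprof handlers; the address and default port 8262 below are assumptions for illustration, not taken from this issue.)

```go
// Hypothetical helper: fetch a full goroutine dump from a DM-worker's status
// address and save it to goroutine.txt. The address is an assumption; adjust
// it to your deployment.
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// debug=2 asks the pprof handler for a full, human-readable stack dump.
	resp, err := http.Get("http://127.0.0.1:8262/debug/pprof/goroutine?debug=2")
	if err != nil {
		log.Fatalf("request goroutine profile: %v", err)
	}
	defer resp.Body.Close()

	out, err := os.Create("goroutine.txt")
	if err != nil {
		log.Fatalf("create output file: %v", err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatalf("write goroutine dump: %v", err)
	}
	log.Println("saved goroutine dump to goroutine.txt")
}
```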

@giant-panda666
Author

All the logs are shown above. This is the goroutine stack:
goroutine.txt

@lance6716
Contributor

goroutine.txt

At the moment the goroutine dump was taken, dm-worker was waiting to read binlog from the relay component, and relay was waiting to read binlog from upstream MySQL. Did you see the lag at that moment?

@giant-panda666
Author

giant-panda666 commented Jul 10, 2024

There was no lag at that moment, and query-status showed that masterBinlog and relayBinlog were almost the same, but syncerBinlog was stuck at the last binlog file. The alert rule is dm_syncer_binlog_file{node="master"} - on(pod, task) dm_syncer_binlog_file{node="syncer"} > 0, evaluated every 1m for 10m, so by the time I found this problem at least 10 minutes had passed.
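
(For reference, a minimal sketch of evaluating that same expression on demand against Prometheus's HTTP query API; the Prometheus address is an assumption for illustration, while the query string is the rule quoted above.)

```go
// Evaluate the binlog-file lag expression via Prometheus's /api/v1/query
// endpoint. A non-empty result vector means at least one task's syncer is
// behind the master by one or more binlog files.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
)

func main() {
	promAddr := "http://127.0.0.1:9090" // assumed Prometheus address
	expr := `dm_syncer_binlog_file{node="master"} - on(pod, task) dm_syncer_binlog_file{node="syncer"} > 0`

	q := url.Values{}
	q.Set("query", expr)

	resp, err := http.Get(promAddr + "/api/v1/query?" + q.Encode())
	if err != nil {
		log.Fatalf("query prometheus: %v", err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatalf("read response: %v", err)
	}
	fmt.Println(string(body))
}
```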

@lance6716
Contributor

> There was no lag at that moment, and query-status showed that masterBinlog and relayBinlog were almost the same, but syncerBinlog was stuck at the last binlog file. The alert rule is dm_syncer_binlog_file{node="master"} - on(pod, task) dm_syncer_binlog_file{node="syncer"} > 0, evaluated every 1m for 10m, so by the time I found this problem at least 10 minutes had passed.

Just to double-check: was the goroutine dump taken at a time when there was lag between relayBinlog and syncerBinlog? If so, there is a relay.meta file under the relay log folder of the dm-worker's relay directory. Can you show its content and the directory structure of the relay folder?
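
(A minimal sketch of gathering both pieces of information in one pass; the relay directory path below is an assumption for illustration, substitute the worker's configured relay-dir.)

```go
// Print the directory structure of the relay folder and the content of any
// relay.meta file found in it.
package main

import (
	"fmt"
	"io/fs"
	"log"
	"os"
	"path/filepath"
)

func main() {
	relayDir := "/var/lib/dm-worker/relay_log" // assumed path; adjust to your deployment

	err := filepath.WalkDir(relayDir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		fmt.Println(path) // directory structure, one entry per line
		if !d.IsDir() && d.Name() == "relay.meta" {
			content, err := os.ReadFile(path)
			if err != nil {
				return err
			}
			fmt.Printf("---- %s ----\n%s\n", path, content)
		}
		return nil
	})
	if err != nil {
		log.Fatalf("walk relay dir: %v", err)
	}
}
```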

@giant-panda666
Author

Yes, when I received the alert, I took the goroutine dump and then recovered the task with pause-task and resume-task. I may need to wait until it reproduces before I can record relay.meta and the directory structure, because it has already been recovered.

@giant-panda666
Author

giant-panda666 commented Jul 10, 2024

goroutine.txt

One question: why are there 26 connections (counted by grepping goroutine.txt for "created by github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher in goroutine 294") created by handleSourceBound? The syncer worker-count is 16 and the validator worker-count is 4, so where do the other 6 connections come from?
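
(For reference, a minimal Go sketch equivalent to the grep described above, counting startWatcher stacks in the attached goroutine.txt; the file name is the attachment from this thread.)

```go
// Count goroutines in a dump file that were created by the go-sql-driver
// connection watcher, i.e. roughly one per open MySQL connection.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("goroutine.txt")
	if err != nil {
		log.Fatalf("open dump: %v", err)
	}
	defer f.Close()

	const marker = "created by github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher"
	count := 0
	scanner := bufio.NewScanner(f)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // stack lines can be long
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), marker) {
			count++
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatalf("scan dump: %v", err)
	}
	fmt.Printf("connections with a startWatcher goroutine: %d\n", count)
}
```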

@giant-panda666
Author

giant-panda666 commented Jul 25, 2024

@lance6716 It has reproduced.
dm-worker log:
[screenshot: dm-worker log]
relay.meta:
[screenshot: relay.meta content]
query-status result:
[screenshot: query-status output]

data.012136 was created just after I captured the dm-worker log, so the relay log works fine. Maybe it's related to a reboot of the dm-worker's host: dm-master thinks the sync task is running, but in fact it isn't. The problem did not reproduce until the host was rebooted.
