binlogsyncer readInitialHandshake blocks for 10 more hours #11041

Closed
D3Hunter opened this issue May 7, 2024 · 0 comments · Fixed by #11043


D3Hunter commented May 7, 2024

What did you do?

DM connects directly to the Aurora writer instance through VPC peering, with no proxy or load balancer in between.

ResetReplicationSyncer blocks while holding the lock, so every query-status request times out.

We need to set a timeout before reading the initial handshake. There is a fix on the binlogsyncer side: go-mysql-org/go-mysql#861.

goroutine 534652 [sync.RWMutex.RLock, 827 minutes]:
sync.runtime_SemacquireRWMutexR(0xc008271c08?, 0x6c?, 0xc003726220?)
	runtime/sema.go:82 +0x25
sync.(*RWMutex).RLock(...)
	sync/rwmutex.go:71
github.com/pingcap/tiflow/dm/syncer/binlogstream.(*StreamerController).GetBinlogType(0xc0042e21e0)
	github.com/pingcap/tiflow/dm/syncer/binlogstream/streamer_controller.go:600 +0x48
github.com/pingcap/tiflow/dm/syncer.(*Syncer).Status(0xc0048e8d80, 0xc009e4c1e0)
	github.com/pingcap/tiflow/dm/syncer/status.go:67 +0x73d
github.com/pingcap/tiflow/engine/executor/dm.(*unitHolderImpl).updateSourceStatus(0xc003c37500, {0x55947f0?, 0xc000eac4d0?})
	github.com/pingcap/tiflow/engine/executor/dm/unitholder.go:261 +0x1cd
github.com/pingcap/tiflow/engine/executor/dm.(*unitHolderImpl).CheckAndUpdateStatus.func1()
	github.com/pingcap/tiflow/engine/executor/dm/unitholder.go:286 +0x8b
created by github.com/pingcap/tiflow/engine/executor/dm.(*unitHolderImpl).CheckAndUpdateStatus in goroutine 181392
	github.com/pingcap/tiflow/engine/executor/dm/unitholder.go:283 +0x105

goroutine 181591 [IO wait, 846 minutes]:
internal/poll.runtime_pollWait(0x7fc6cdcac028, 0x72)
	runtime/netpoll.go:343 +0x85
internal/poll.(*pollDesc).wait(0xc004405100?, 0xc0045d4000?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc004405100, {0xc0045d4000, 0x10000, 0x10000})
	internal/poll/fd_unix.go:164 +0x27a
net.(*netFD).Read(0xc004405100, {0xc0045d4000?, 0x8952080?, 0xc001cd9f38?})
	net/fd_posix.go:55 +0x25
net.(*conn).Read(0xc001630a10, {0xc0045d4000?, 0xc001e92280?, 0x55947f0?})
	net/net.go:179 +0x45
bufio.(*Reader).Read(0xc00458d560, {0xc001138eb8, 0x4, 0x448cfe0?})
	bufio/bufio.go:244 +0x197
io.ReadAtLeast({0x555a500, 0xc00458d560}, {0xc001138eb8, 0x4, 0x4}, 0x4)
	io/io.go:335 +0x90
io.ReadFull(...)
	io/io.go:354
github.com/go-mysql-org/go-mysql/packet.(*Conn).ReadPacketTo(0xc001138e70, {0x555a220?, 0xc024bc6ea0?}, {0x555a500, 0xc00458d560})
	github.com/go-mysql-org/go-mysql@v1.7.1-0.20240314115043-2199dfb0ba98/packet/conn.go:196 +0x6c
github.com/go-mysql-org/go-mysql/packet.(*Conn).ReadPacketReuseMem(0xc001138e70, {0x0, 0x0, 0x0})
	github.com/go-mysql-org/go-mysql@v1.7.1-0.20240314115043-2199dfb0ba98/packet/conn.go:143 +0x4d7
github.com/go-mysql-org/go-mysql/packet.(*Conn).ReadPacket(...)
	github.com/go-mysql-org/go-mysql@v1.7.1-0.20240314115043-2199dfb0ba98/packet/conn.go:99
github.com/go-mysql-org/go-mysql/client.(*Conn).readInitialHandshake(0xc003d9bc80)
	github.com/go-mysql-org/go-mysql@v1.7.1-0.20240314115043-2199dfb0ba98/client/auth.go:35 +0x34
github.com/go-mysql-org/go-mysql/client.(*Conn).handshake(0xc003d9bc80)
	github.com/go-mysql-org/go-mysql@v1.7.1-0.20240314115043-2199dfb0ba98/client/conn.go:135 +0x1c
github.com/go-mysql-org/go-mysql/client.ConnectWithDialer({0x55947f0, 0xc000e764d0}, {0x0, 0x0}, {0xc0018ce480, 0x56}, {0xc003720b10, 0x2}, {0xc003ac4528, 0x10}, ...)
	github.com/go-mysql-org/go-mysql@v1.7.1-0.20240314115043-2199dfb0ba98/client/conn.go:120 +0x586
github.com/go-mysql-org/go-mysql/replication.(*BinlogSyncer).newConnection(0xc004cae700, {0x55946d8?, 0x89883e0?})
	github.com/go-mysql-org/go-mysql@v1.7.1-0.20240314115043-2199dfb0ba98/replication/binlogsyncer.go:899 +0x24a
github.com/go-mysql-org/go-mysql/replication.(*BinlogSyncer).close(0xc004cae700)
	github.com/go-mysql-org/go-mysql@v1.7.1-0.20240314115043-2199dfb0ba98/replication/binlogsyncer.go:231 +0x1b8
github.com/go-mysql-org/go-mysql/replication.(*BinlogSyncer).Close(0xc004cae700)
	github.com/go-mysql-org/go-mysql@v1.7.1-0.20240314115043-2199dfb0ba98/replication/binlogsyncer.go:206 +0x56
github.com/pingcap/tiflow/dm/syncer/binlogstream.(*StreamerController).resetReplicationSyncer(0xc0042e21e0, 0xc00471f890, {{{0xc003726220, 0x1a}, 0x9a}, {0x559f430, 0xc001b24a70}, 0x0})
	github.com/pingcap/tiflow/dm/syncer/binlogstream/streamer_controller.go:260 +0xb1
github.com/pingcap/tiflow/dm/syncer/binlogstream.(*StreamerController).ResetReplicationSyncer(0xc0042e21e0, 0xc00471f890, {{{0xc003726220, 0x1a}, 0x9a}, {0x559f430, 0xc001b24a70}, 0x0})
	github.com/pingcap/tiflow/dm/syncer/binlogstream/streamer_controller.go:242 +0x1e9
github.com/pingcap/tiflow/dm/syncer.(*Syncer).Run(0xc0048e8d80, {0x5594780?, 0xc004a84050?})
	github.com/pingcap/tiflow/dm/syncer/syncer.go:2207 +0x41cd
github.com/pingcap/tiflow/dm/syncer.(*Syncer).Process(0xc0048e8d80, {0x5594780, 0xc004a84000}, 0xc004365fd0?)
	github.com/pingcap/tiflow/dm/syncer/syncer.go:757 +0x325
github.com/pingcap/tiflow/engine/executor/dm.(*unitHolderImpl).Init.func1()
	github.com/pingcap/tiflow/engine/executor/dm/unitholder.go:142 +0x75
created by github.com/pingcap/tiflow/engine/executor/dm.(*unitHolderImpl).Init in goroutine 181392
	github.com/pingcap/tiflow/engine/executor/dm/unitholder.go:140 +0x9d1

What did you expect to see?

No response

What did you see instead?

No response

Versions of the cluster

DM version (run dmctl -V or dm-worker -V or dm-master -V):

master/8.1

Upstream MySQL/MariaDB server version:

AWS Aurora MySQL, [type=MySQL] [version=5.7.12-log]

Downstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

[type=TiDB] [version=7.5.0-20240120-b856499]

How did you deploy DM: tiup or manually?

(leave TiUP or manually here)

Other interesting information (system version, hardware config, etc):


current status of DM cluster (execute query-status <task-name> in dmctl)

(paste current status of DM cluster here)