Description
System information
Erigon version: v3.0.2 (issue also seen on v3.0.0 and v3.0.1)
OS & Version: Ubuntu 24.04 / Erigon running in Docker through docker-compose
Commit hash: cd2863801089dacef6e6fa807eb02a531a7ab810
Erigon Command (with flags/config):
ETH mainnet w/ archival Caplin docker-compose command:
command:
- --chain=mainnet
- --db.pagesize=4KB
- --db.size.limit=8TB
- --port=31303
- --downloader.disable.ipv6=true
- --http.addr=0.0.0.0
- --http.port=8545
- --http.api=net,web3,eth,admin,debug,txpool,engine,rpc,trace
- --ws
- --trace.maxtraces=2000
- --trace.compat
- --metrics
- --metrics.addr=0.0.0.0
- --metrics.port=6060
- --authrpc.port=8551
- --authrpc.addr=0.0.0.0
- --authrpc.jwtsecret=/opt/jwt/jwt.hex
- --authrpc.vhosts=*
- --http.vhosts=*
- --http.corsdomain=*
- --rpc.batch.concurrency=32
- --db.read.concurrency=512
- --rpc.returndata.limit=5000000000
- --rpc.batch.limit=200
- --torrent.download.slots=6
- --torrent.download.rate=60mb
- --rpc.gascap=5000000000
- --prune.mode=archive
- --caplin.blobs-no-pruning
- --caplin.blobs-immediate-backfill
- --caplin.states-archive
- --caplin.blobs-archive
- --caplin.blocks-archive
- --caplin.discovery.addr=0.0.0.0
- --caplin.discovery.port=51161
- --caplin.discovery.tcpport=51162
- --beacon.api=beacon,builder,config,debug,events,node,validator,lighthouse
- --beacon.api.addr=0.0.0.0
- --beacon.api.cors.allow-origins=*
- --beacon.api.port=5059
Chain/Network: mainnet & gnosis
Expected behaviour
Erigon runs as intended and does not crash at random.
Actual behaviour
Erigon crashes at random and dumps a long list of goroutine stack traces. The nodes are fully synced to head.
With a restart policy set in docker-compose, Erigon sometimes stalls on the automatic restart that follows, because the Caplin discovery tcpport has not been closed yet. Another restart is needed.
Log excerpt:
goroutine 12470398 gp=0xc0a6a6f6c0 m=nil [select]:
runtime.gopark(0xc0f92eaf30?, 0x2?, 0xb8?, 0xad?, 0xc0f92eaef4?)
runtime/proc.go:424 +0xce fp=0xc0f92ead70 sp=0xc0f92ead50 pc=0x493aee
runtime.selectgo(0xc0f92eaf30, 0xc0f92eaef0, 0xc27653ffa8?, 0x0, 0xc1a4901e60?, 0x1)
runtime/select.go:335 +0x7a5 fp=0xc0f92eae98 sp=0xc0f92ead70 pc=0x46ee65
github.com/libp2p/go-libp2p-pubsub.(*PubSub).handleSendingMessages(0xc27653ffd0?, {0x354a590, 0x5f5c560}, {0x356ce50, 0xc1a1694c80}, 0xc003770b60)
github.com/libp2p/go-libp2p-pubsub@v0.11.0/comm.go:178 +0x128 fp=0xc0f92eafa0 sp=0xc0f92eae98 pc=0x18652c8
github.com/libp2p/go-libp2p-pubsub.(*PubSub).handleNewPeer.gowrap1()
github.com/libp2p/go-libp2p-pubsub@v0.11.0/comm.go:130 +0x34 fp=0xc0f92eafe0 sp=0xc0f92eafa0 pc=0x1864ed4
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0f92eafe8 sp=0xc0f92eafe0 pc=0x49c1c1
created by github.com/libp2p/go-libp2p-pubsub.(*PubSub).handleNewPeer in goroutine 12489443
github.com/libp2p/go-libp2p-pubsub@v0.11.0/comm.go:130 +0x2e5
goroutine 128769234 gp=0xc0a6cc3dc0 m=nil [select]:
runtime.gopark(0xc0076adf30?, 0x2?, 0xb8?, 0xdd?, 0xc0076adef4?)
runtime/proc.go:424 +0xce fp=0xc0076add70 sp=0xc0076add50 pc=0x493aee
runtime.selectgo(0xc0076adf30, 0xc0076adef0, 0xc0a6cc3dc0?, 0x0, 0x4268be?, 0x1)
runtime/select.go:335 +0x7a5 fp=0xc0076ade98 sp=0xc0076add70 pc=0x46ee65
github.com/libp2p/go-libp2p-pubsub.(*PubSub).handleSendingMessages(0x10000c171399e60?, {0x354a590, 0x5f5c560}, {0x356cee0, 0xc14f5023a0}, 0xc084bb5110)
github.com/libp2p/go-libp2p-pubsub@v0.11.0/comm.go:178 +0x128 fp=0xc0076adfa0 sp=0xc0076ade98 pc=0x18652c8
github.com/libp2p/go-libp2p-pubsub.(*PubSub).handleNewPeer.gowrap1()
github.com/libp2p/go-libp2p-pubsub@v0.11.0/comm.go:130 +0x34 fp=0xc0076adfe0 sp=0xc0076adfa0 pc=0x1864ed4
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0076adfe8 sp=0xc0076adfe0 pc=0x49c1c1
created by github.com/libp2p/go-libp2p-pubsub.(*PubSub).handleNewPeer in goroutine 128766762
github.com/libp2p/go-libp2p-pubsub@v0.11.0/comm.go:130 +0x2e5
goroutine 10914619 gp=0xc0aacf5340 m=nil [sync.Cond.Wait, 466 minutes]:
runtime.gopark(0xc0a4dc7dd8?, 0xe05b62?, 0x90?, 0x7d?, 0xc0a4dc7e18?)
runtime/proc.go:424 +0xce fp=0xc0b774cd98 sp=0xc0b774cd78 pc=0x493aee
runtime.goparkunlock(...)
runtime/proc.go:430
sync.runtime_notifyListWait(0xc168a804f0, 0x21b22)
runtime/sema.go:587 +0x159 fp=0xc0b774cde8 sp=0xc0b774cd98 pc=0x495619
sync.(*Cond).Wait(0xc168a80090?)
sync/cond.go:71 +0x85 fp=0xc0b774ce28 sp=0xc0b774cde8 pc=0x4b5465
github.com/anacrolix/torrent.(*webseedPeer).requester(0xc168a80008, 0x7f)
github.com/anacrolix/torrent@v1.52.6-0.20231201115409-7ea994b6bbd8/webseed-peer.go:337 +0x5e5 fp=0xc0b774cfc0 sp=0xc0b774ce28 pc=0x10a4f65
github.com/anacrolix/torrent.(*webseedPeer).requester.func3.gowrap3()
github.com/anacrolix/torrent@v1.52.6-0.20231201115409-7ea994b6bbd8/webseed-peer.go:267 +0x25 fp=0xc0b774cfe0 sp=0xc0b774cfc0 pc=0x10a5785
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0b774cfe8 sp=0xc0b774cfe0 pc=0x49c1c1
created by github.com/anacrolix/torrent.(*webseedPeer).requester.func3 in goroutine 8203931
github.com/anacrolix/torrent@v1.52.6-0.20231201115409-7ea994b6bbd8/webseed-peer.go:267 +0x1a5
goroutine 8259890 gp=0xc0ad8c2000 m=nil [sync.Cond.Wait, 531 minutes]:
runtime.gopark(0xc12e08bdd8?, 0xe05b62?, 0x90?, 0xbd?, 0xc12e08be18?)
runtime/proc.go:424 +0xce fp=0xc1ac25ad98 sp=0xc1ac25ad78 pc=0x493aee
runtime.goparkunlock(...)
runtime/proc.go:430
sync.runtime_notifyListWait(0xc10d63f670, 0xaa)
runtime/sema.go:587 +0x159 fp=0xc1ac25ade8 sp=0xc1ac25ad98 pc=0x495619
sync.(*Cond).Wait(0xc10d63f210?)
sync/cond.go:71 +0x85 fp=0xc1ac25ae28 sp=0xc1ac25ade8 pc=0x4b5465
github.com/anacrolix/torrent.(*webseedPeer).requester(0xc10d63f188, 0x11)
github.com/anacrolix/torrent@v1.52.6-0.20231201115409-7ea994b6bbd8/webseed-peer.go:337 +0x5e5 fp=0xc1ac25afc0 sp=0xc1ac25ae28 pc=0x10a4f65
github.com/anacrolix/torrent.(*webseedPeer).requester.func3.gowrap3()
github.com/anacrolix/torrent@v1.52.6-0.20231201115409-7ea994b6bbd8/webseed-peer.go:267 +0x25 fp=0xc1ac25afe0 sp=0xc1ac25afc0 pc=0x10a5785
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc1ac25afe8 sp=0xc1ac25afe0 pc=0x49c1c1
created by github.com/anacrolix/torrent.(*webseedPeer).requester.func3 in goroutine 405203
github.com/anacrolix/torrent@v1.52.6-0.20231201115409-7ea994b6bbd8/webseed-peer.go:267 +0x1a5
goroutine 129516757 gp=0xc0b044ddc0 m=nil [select]:
runtime.gopark(0xc2cb10bf40?, 0x2?, 0xf0?, 0x9c?, 0xc2cb10bf2c?)
runtime/proc.go:424 +0xce fp=0xc2cb10bdb8 sp=0xc2cb10bd98 pc=0x493aee
runtime.selectgo(0xc2cb10bf40, 0xc2cb10bf28, 0xc129872cd0?, 0x0, 0xc12e9d6d80?, 0x1)
runtime/select.go:335 +0x7a5 fp=0xc2cb10bee0 sp=0xc2cb10bdb8 pc=0x46ee65
github.com/anacrolix/torrent/tracker/udp.(*Client).requestWriter(0xc1b4df0140, {0x354aad0, 0xc129872cd0}, 0x1, {0xc12e9d6d80, 0x5d, 0x60}, 0x9592f3ee)
github.com/anacrolix/torrent@v1.52.6-0.20231201115409-7ea994b6bbd8/tracker/udp/client.go:177 +0x156 fp=0xc2cb10bf78 sp=0xc2cb10bee0 pc=0xeca396
github.com/anacrolix/torrent/tracker/udp.(*Client).request.func2()
github.com/anacrolix/torrent@v1.52.6-0.20231201115409-7ea994b6bbd8/tracker/udp/client.go:205 +0x3e fp=0xc2cb10bfe0 sp=0xc2cb10bf78 pc=0xecaafe
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc2cb10bfe8 sp=0xc2cb10bfe0 pc=0x49c1c1
created by github.com/anacrolix/torrent/tracker/udp.(*Client).request in goroutine 356414
github.com/anacrolix/torrent@v1.52.6-0.20231201115409-7ea994b6bbd8/tracker/udp/client.go:204 +0x279
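The goroutines in the excerpt above are all parked ([select] or [sync.Cond.Wait]), so they probably are not the trigger themselves. In a Go crash dump the triggering message is normally printed before the goroutine list, on a line starting with "panic:" or "fatal error:". A rough Python sketch for pulling that line out of a saved log and counting goroutines per wait state follows. The function name summarize is made up for illustration, and the sketch assumes the log lines carry no extra prefix (for example a docker-compose service name) in front of the Go output.

```python
import re
from collections import Counter

# Matches goroutine headers such as
#   goroutine 12470398 gp=0xc0a6a6f6c0 m=nil [select]:
#   goroutine 10914619 gp=0xc0aacf5340 m=nil [sync.Cond.Wait, 466 minutes]:
# and captures only the wait state ("select", "sync.Cond.Wait").
HEADER = re.compile(r"^goroutine \d+\b.*?\[([^\],]+)[^\]]*\]:\s*$")


def summarize(dump):
    """Return (first panic/fatal line or None, Counter of goroutine wait states).

    Hypothetical helper for illustration; it is not part of Erigon or Go tooling.
    """
    cause = None
    states = Counter()
    for raw in dump.splitlines():
        line = raw.strip()
        # The Go runtime prints the crash reason on a line with one of these prefixes.
        if cause is None and line.startswith(("panic:", "fatal error:")):
            cause = line
        match = HEADER.match(line)
        if match:
            states[match.group(1)] += 1
    return cause, states
```

Calling summarize on the full text of the crash log and printing the first element should show the actual crash reason, if the log still contains it. The counter gives a quick view of which wait states dominate the dump.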
In our docker-compose.yml we have restart: unless-stopped set. Erigon restarts, but sometimes the restart is too quick and the Caplin discovery ports have not been closed yet, so Erigon stalls:
[INFO] [04-19|08:38:29.828] Starting caplin
[EROR] [04-19|08:38:30.342] could not start caplin err="failed to listen on any addresses: [listen tcp4 0.0.0.0:51162: bind: address already in use]"
[INFO] [04-19|08:38:30.342] Exiting...
[INFO] [04-19|08:38:30.342] Exiting Engine...
[INFO] [04-19|08:38:30.342] RPC server shutting down
[INFO] [04-19|08:38:30.343] RPC server shutting down
[INFO] [04-19|08:38:30.343] HTTP endpoint closed url=[::]:8649
[INFO] [04-19|08:38:30.343] Engine HTTP endpoint close url=[::]:8551
[INFO] [04-19|08:38:30.343] HTTP endpoint closed url=[::]:8546
[INFO] [04-19|08:38:30.343] RPC server shutting down
[INFO] [04-19|08:38:30.360] [txpool] stopped
[INFO] [04-19|08:38:30.360] devp2p txn pool goroutine terminated
Simply restarting again fixes it, as the port is closed by then.
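As a possible stopgap until the crash itself is understood, a small pre-start check could wait for the port to become bindable before Erigon is launched. This is a minimal Python sketch, not something Erigon ships. The function names port_is_free and wait_for_port_free and the timeout values are invented for illustration. It also assumes the script runs in the same network namespace as the process that still holds the port.

```python
import socket
import time


def port_is_free(host, port):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        # SO_REUSEADDR keeps lingering TIME_WAIT connections from counting as
        # "in use", which is how server listeners are commonly configured.
        # A socket that is still listening on the port makes bind() fail.
        probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


def wait_for_port_free(host, port, timeout=30.0, interval=0.5):
    """Poll until the port can be bound or the timeout expires.

    Returns True as soon as the port is free, False on timeout.
    The defaults are arbitrary illustration values.
    """
    deadline = time.monotonic() + timeout
    while True:
        if port_is_free(host, port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For the setup in this report, the call would be wait_for_port_free("0.0.0.0", 51162), with Erigon started only once it returns True. This only works around the stall on restart and does nothing about the original crash.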
Steps to reproduce the behaviour
Run Erigon through docker-compose with the command above; it crashes at random.