Crash after migrating instance from one machine to another #4297

Open
@cdecker

This was reported by raucao on IRC. He migrated an existing node from
one host to another, which included a switch from an Intel Core i7 to
an AMD Ryzen CPU (not sure whether the CPU switch has anything to do
with it; just mentioning it for context).

The following backtrace was provided:

lightning_connectd: FATAL SIGNAL 6 (version v0.9.2)
0x562b3d3b5304 send_backtrace
        common/daemon.c:38
0x562b3d3b539e crashdump
        common/daemon.c:51
0x7f139c1fa20f ???
        ???:0
0x7f139c1fa18b ???
        ???:0
0x7f139c1d9858 ???
        ???:0
0x562b3d3e2e3d call_error
        ccan/ccan/tal/tal.c:93
0x562b3d3e2ed8 check_bounds
        ccan/ccan/tal/tal.c:165
0x562b3d3e2f00 to_tal_hdr
        ccan/ccan/tal/tal.c:174
0x562b3d3e3dfe tal_resize_
        ccan/ccan/tal/tal.c:694
0x562b3d3e2379 do_vfmt
        ccan/ccan/tal/str/str.c:60
0x562b3d3e2613 tal_append_vfmt
        ccan/ccan/tal/str/str.c:102
0x562b3d3e26b6 tal_append_fmt
        ccan/ccan/tal/str/str.c:111
0x562b3d3afade add_errors_to_error_list
        connectd/connectd.c:680
0x562b3d3afb8d destroy_io_conn
        connectd/connectd.c:702
0x562b3d3da84e destroy_conn
        ccan/ccan/io/poll.c:244
0x562b3d3dabb0 cleanup_conn_without_close
        ccan/ccan/io/poll.c:264
0x562b3d3d9559 io_close_taken_fd
        ccan/ccan/io/io.c:463
0x562b3d3af9d3 peer_connected
        connectd/connectd.c:511
0x562b3d3aff16 peer_init_received
        connectd/peer_exchange_initmsg.c:94
0x562b3d3d8fc0 next_plan
        ccan/ccan/io/io.c:59
0x562b3d3d946b do_plan
        ccan/ccan/io/io.c:407
0x562b3d3d9508 io_ready
        ccan/ccan/io/io.c:417
0x562b3d3dae4c io_loop
        ccan/ccan/io/poll.c:445
0x562b3d3afcb0 main
        connectd/connectd.c:1703
0x7f139c1db0b2 ???
        ???:0
0x562b3d3ab59d ???
        ???:0
0xffffffffffffffff ???
        ???:0

From this it appears that connect->errors (passed as &connect->errors)
is not tal-allocated, which should not happen since it is initialized
in try_connect_peer:

lightning/connectd/connectd.c

Lines 1526 to 1537 in abad494

connect = tal(daemon, struct connecting);
connect->daemon = daemon;
connect->id = *id;
connect->addrs = tal_steal(connect, addrs);
connect->addrnum = 0;
/* connstate is supposed to be updated as we go, to give context for
 * errors which occur. We miss it in a few places; would be nice to
 * fix! */
connect->connstate = "Connection establishment";
connect->seconds_waited = seconds_waited;
connect->addrhint = tal_steal(connect, addrhint);
connect->errors = tal_strdup(connect, "");
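For readers unfamiliar with the failure mode, here is a toy model of why appending to a string that did not come from the allocator gets rejected. It is not ccan/tal code: toy_strdup, toy_in_bounds and toy_append_fmt are hypothetical names, and the range check is only loosely modelled on the check_bounds frame in the backtrace. Where the real code aborts (SIGNAL 6), this sketch returns false.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: NOT ccan/tal. Every name below is hypothetical. */

/* Lowest and highest addresses ever handed out by this toy allocator. */
static uintptr_t toy_lo = UINTPTR_MAX, toy_hi = 0;

/* Widen the tracked address range to cover [p, p + len). */
void toy_track(const void *p, size_t len)
{
	uintptr_t start = (uintptr_t)p, end = start + len;

	if (start < toy_lo)
		toy_lo = start;
	if (end > toy_hi)
		toy_hi = end;
}

/* Coarse sanity check: is p inside the range we have allocated from? */
bool toy_in_bounds(const void *p)
{
	uintptr_t a = (uintptr_t)p;

	return a >= toy_lo && a < toy_hi;
}

/* Allocate a tracked copy of s (stand-in for an allocator-owned strdup). */
char *toy_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *p = malloc(len);

	if (!p)
		return NULL;
	memcpy(p, s, len);
	toy_track(p, len);
	return p;
}

/* Append formatted text to *base, growing it in place.
 * Returns false if *base was never handed out by toy_strdup();
 * this is the point where the real program in the backtrace aborts. */
bool toy_append_fmt(char **base, const char *fmt, ...)
{
	va_list ap;
	int extra;
	size_t old;
	char *grown;

	assert(fmt != NULL);
	if (!*base || !toy_in_bounds(*base))
		return false;

	/* First pass: measure how many bytes the formatted text needs. */
	va_start(ap, fmt);
	extra = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (extra < 0)
		return false;

	old = strlen(*base);
	grown = realloc(*base, old + (size_t)extra + 1);
	if (!grown)
		return false;
	toy_track(grown, old + (size_t)extra + 1);

	/* Second pass: write the text after the existing contents. */
	va_start(ap, fmt);
	vsnprintf(grown + old, (size_t)extra + 1, fmt, ap);
	va_end(ap);
	*base = grown;
	return true;
}
```

In this model, a string created with toy_strdup("") can be appended to, while a pointer to a stack buffer (or to memory the allocator never owned) is refused. The check is deliberately coarse: a stale pointer that still falls inside the tracked range would pass it, so a rejected pointer says only that something is wrong with it, not what.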
