Satellite node offline, how to get it back #935

I have a test cluster running Talos with three nodes: one control plane and two workers. It worked well for almost a year, but stopped scheduling a few days ago because the satellite running on the control plane node (gillespie) went dark. It now shows as offline in `linstor node list`:

```
+------------------------------------------------------------+
| Node      | NodeType  | Addresses                | State   |
|============================================================|
| bell      | SATELLITE | 10.244.1.11:3366 (PLAIN) | Online  |
| coryell   | SATELLITE | 10.244.0.20:3366 (PLAIN) | Online  |
| gillespie | SATELLITE | 10.244.2.56:3366 (PLAIN) | OFFLINE |
+------------------------------------------------------------+
```
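For anyone who wants to script this check, here is a minimal sketch that picks the non-online nodes out of the table above. The function name `offline_nodes` and the `SAMPLE` constant are made up for illustration, and it assumes the four-column ASCII layout shown here; if the client's machine-readable output is available, parsing that would probably be more robust.

```python
def offline_nodes(table: str) -> list[str]:
    """Return the names of nodes whose State column is not 'Online'."""
    offline = []
    for line in table.splitlines():
        line = line.strip()
        # Skip border rows ('+---+') and the header separator ('|===|').
        if not line.startswith("|") or line.startswith("|="):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Assumption: exactly four columns (Node, NodeType, Addresses, State).
        if len(cells) != 4 or cells[0] == "Node":
            continue
        name, _node_type, _address, state = cells
        if state.lower() != "online":
            offline.append(name)
    return offline


# Output copied from the `linstor node list` table in this issue.
SAMPLE = """\
+------------------------------------------------------------+
| Node      | NodeType  | Addresses                | State   |
|============================================================|
| bell      | SATELLITE | 10.244.1.11:3366 (PLAIN) | Online  |
| coryell   | SATELLITE | 10.244.0.20:3366 (PLAIN) | Online  |
| gillespie | SATELLITE | 10.244.2.56:3366 (PLAIN) | OFFLINE |
+------------------------------------------------------------+
"""

print(offline_nodes(SAMPLE))  # prints ['gillespie']
```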

The linstor-satellite, ha-controller and related pods are running only on the other two nodes, so I'm trying to figure out why the pods aren't starting on gillespie. No LINSTOR-related pods are even stuck in a Pending state on gillespie; it's as if they are never being created at all.

The pattern of errors in the LINSTOR error log seems to just reflect the missing satellite:

```
┊ 695E880A-DA3D0-000006 ┊ 2026-01-07 17:04:40 ┊ S|coryell ┊ ResourceException: Failed to adjust DRBD resource pvc-7b575203-37a8-4508-b7d... ┊
┊ 695E880A-DA3D0-000007 ┊ 2026-01-07 17:04:40 ┊ S|coryell ┊ ResourceException: Failed to adjust DRBD resource pvc-e2bf2169-f142-493a-951... ┊
┊ 695E868F-7908B-000016 ┊ 2026-01-07 17:04:55 ┊ S|bell    ┊ ResourceException: Failed to adjust DRBD resource pvc-7b575203-37a8-4508-b7d... ┊
┊ 695E868F-7908B-000017 ┊ 2026-01-07 17:04:55 ┊ S|bell    ┊ ResourceException: Failed to adjust DRBD resource pvc-e2bf2169-f142-493a-951... ┊
```
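To check that the reports really do come only from the two online satellites, a rough sketch like the following tallies the lines above by reporting node. It assumes that the `S|<name>` field means "satellite on node `<name>`" and that columns are separated by `┊`; `errors_by_node` and `SAMPLE` are illustrative names, not part of any LINSTOR tooling.

```python
from collections import Counter


def errors_by_node(report: str) -> Counter:
    """Count error-report lines per satellite node."""
    counts = Counter()
    for line in report.splitlines():
        # Split on the column separator and drop the empty edge cells.
        cells = [cell.strip() for cell in line.split("┊") if cell.strip()]
        if len(cells) < 4:
            continue
        # Assumption: the third column looks like 'S|coryell' for satellites.
        role, _, node = cells[2].partition("|")
        if role == "S" and node:
            counts[node] += 1
    return counts


# Lines copied from the error log excerpt in this issue.
SAMPLE = """\
┊ 695E880A-DA3D0-000006 ┊ 2026-01-07 17:04:40 ┊ S|coryell ┊ ResourceException: Failed to adjust DRBD resource pvc-7b575203-37a8-4508-b7d... ┊
┊ 695E880A-DA3D0-000007 ┊ 2026-01-07 17:04:40 ┊ S|coryell ┊ ResourceException: Failed to adjust DRBD resource pvc-e2bf2169-f142-493a-951... ┊
┊ 695E868F-7908B-000016 ┊ 2026-01-07 17:04:55 ┊ S|bell    ┊ ResourceException: Failed to adjust DRBD resource pvc-7b575203-37a8-4508-b7d... ┊
┊ 695E868F-7908B-000017 ┊ 2026-01-07 17:04:55 ┊ S|bell    ┊ ResourceException: Failed to adjust DRBD resource pvc-e2bf2169-f142-493a-951... ┊
"""

counts = errors_by_node(SAMPLE)
print(dict(counts))           # prints {'coryell': 2, 'bell': 2}
print("gillespie" in counts)  # prints False
```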

I went through my notes from when I originally configured the operator on this cluster, and I didn't need to do anything special to create the linstor nodes for each server -- the nodes appear to have spun up automatically after I installed the operator.

When I inspect the controller logs, I see that the TaskScheduleService is continually establishing connections with gillespie and performing some actions similar to the following.

```
linstor-controller 2026-01-07 17:16:05.673 [TaskScheduleService] INFO  LINSTOR/Controller/02cb29 SYSTEM - Establishing connection to node 'gillespie' via /10.244.2.56:3366 ...
linstor-controller 2026-01-07 17:16:13.587 [grizzly-http-server-2] INFO  LINSTOR/Controller/b942f9 SYSTEM - REST/API RestClient(10.244.1.15; 'piraeus-operator/v2.10.3-4e0a21b886ff440c5cfea6760a0f83fd4daa0d47')/LstStorPool
linstor-controller 2026-01-07 17:16:13.597 [grizzly-http-server-0] INFO  LINSTOR/Controller/d61874 SYSTEM - REST/API RestClient(10.244.1.15; 'piraeus-operator/v2.10.3-4e0a21b886ff440c5cfea6760a0f83fd4daa0d47')/LstVlm
linstor-controller 2026-01-07 17:16:13.607 [grizzly-http-server-1] INFO  LINSTOR/Controller/589db6 SYSTEM - REST/API RestClient(10.244.1.15; 'piraeus-operator/v2.10.3-4e0a21b886ff440c5cfea6760a0f83fd4daa0d47')/LstSnapshotDfn
```

Nothing looks like an error.

This is a test environment that I could just rebuild, but I want to figure out how to recover from this type of situation.

