Handle etcd compaction while offline #2136

@solidsnakedev

Context & versions

We have identified an issue related to the etcd file located at /opt/hydra-node/persistence/last-known-revision. When restarting the Hydra node repeatedly, we observed that the file last-known-revision is unexpectedly reset to “0”.

Hydra version
0.22.2-b984abccf65921decc0e36452fb59cd3f478ba7d

Steps to reproduce

1. Start the Hydra node.
2. Stop the Hydra node and note the value in the last-known-revision file (e.g., 6745).
3. Manually change the last-known-revision value to a lower number (e.g., 3000).
4. Restart the Hydra node and observe that it is unable to operate. The last-known-revision is reset to 0.
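The stop-and-edit steps above can be sketched as two small shell helpers. This is only an illustration: `read_revision` and `set_revision` are hypothetical names, the path is the one given in this report, and the assumption that the file holds a plain decimal number with no trailing newline has not been confirmed against the Hydra source.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting and editing last-known-revision.
# Path taken from the report; override it through the environment if needed.
REVISION_FILE="${REVISION_FILE:-/opt/hydra-node/persistence/last-known-revision}"

# Print the stored revision with all whitespace stripped.
read_revision() {
  tr -d '[:space:]' < "$1"
}

# Back up the file to FILE.bak, then overwrite it with a new revision.
# Rejects anything that is not a non-negative integer.
set_revision() {
  file=$1
  value=$2
  case $value in
    ''|*[!0-9]*) echo "not a revision: $value" >&2; return 1 ;;
  esac
  # Assumption: the node expects the bare number without a trailing newline.
  cp -p "$file" "$file.bak" && printf '%s' "$value" > "$file"
}
```

With the node stopped, `read_revision "$REVISION_FILE"` prints the current value and `set_revision "$REVISION_FILE" 3000` lowers it. The backup in `last-known-revision.bak` keeps the original value available for recovery.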

To recover from this state:

1. Set the last-known-revision back to the original value (e.g., 6745).
2. Restart the Hydra node.
3. Run the etcd command:

   etcdctl endpoint status --write-out json

   This returns the current revision known to etcd (33047 in our case).
4. Update the last-known-revision file with this value.
5. Restart the Hydra node and verify that it comes back online successfully.
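The manual lookup and file update in the recovery procedure could be scripted roughly as follows. This is a sketch under assumptions: `extract_revision` and `update_last_known_revision` are hypothetical helper names, the JSON is assumed to carry the revision as a `"revision": <integer>` pair (as in the response header etcdctl prints), and the file is assumed to hold the bare number.

```shell
#!/bin/sh
# Sketch only: these helpers are not part of etcd or Hydra.

# Read `etcdctl endpoint status --write-out json` output on stdin and print
# the first "revision" value found in it.
extract_revision() {
  grep -o '"revision"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | head -n 1 \
    | grep -o '[0-9][0-9]*$'
}

# Parse the revision from the JSON on stdin and write it to the given file.
# Leaves the file untouched when no revision can be parsed.
update_last_known_revision() {
  file=$1
  rev=$(extract_revision)
  if [ -z "$rev" ]; then
    echo "no revision found in etcdctl output" >&2
    return 1
  fi
  # Assumption: the node expects the bare number without a trailing newline.
  printf '%s' "$rev" > "$file"
}
```

While etcd is reachable, `etcdctl endpoint status --write-out json | update_last_known_revision /opt/hydra-node/persistence/last-known-revision` would replace the stored value, after which the node can be restarted. Text matching is used here to avoid extra dependencies; a JSON-aware tool would be more robust if the output format changes.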

Actual behavior

When restarting the Hydra node repeatedly, we observed that the file last-known-revision is unexpectedly reset to “0”.

Expected behavior

In the event of a node failure, etcd is expected to retrieve the last known revision from the other nodes in the cluster. That does not appear to happen here: the node does not recover on its own, and the file is reset to 0 instead.

Status

Done ✔
