Inconsistent data in an etcd cluster #9630

Closed
@lbernail

We run a 3-node etcd cluster as the backend for a Kubernetes cluster, and the data on one of the nodes is inconsistent with the others:

Member list

etcdctl member list
76c74df0105143e4, started, etcd1, https://172.30.171.85:2380, https://172.30.171.85:2379
b4a97ffa7975df71, started, etcd2, https://172.30.173.252:2380, https://172.30.173.252:2379
bba515b5b42ffb5c, started, etcd0, https://172.30.167.81:2380, https://172.30.167.81:2379

Status

etcdctl endpoint status
https://172.30.167.81:2379, bba515b5b42ffb5c, 3.2.18+git, 1.2 GB, false, 2, 3115003
https://172.30.171.85:2379, 76c74df0105143e4, 3.2.18+git, 1.2 GB, true, 2, 3115003
https://172.30.173.252:2379, b4a97ffa7975df71, 3.2.18+git, 851 MB, false, 2, 3115003
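
For anyone comparing members side by side, here is a minimal sketch that condenses the status lines above to endpoint, DB size, and raft index. It assumes the comma-separated column order shown in this output (endpoint, ID, version, DB size, is-leader, raft term, raft index); `status_summary` is a hypothetical helper name, not part of etcdctl. A differing DB size at the same raft index is only a hint, since fragmentation alone can account for it.

```shell
# Hypothetical helper: reads `etcdctl endpoint status` lines on stdin
# and prints one short line per member.
# Assumes the 7-column, comma-separated layout shown in this issue.
status_summary() {
  awk -F', ' 'NF >= 7 { printf "%s db=%s index=%s\n", $1, $4, $7 }'
}

# Usage (endpoints and TLS flags as in the commands above):
#   etcdctl endpoint status | status_summary
```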

Data inconsistency
OK Node

etcdctl --endpoints https://172.30.167.81:2379 get --prefix --keys-only /registry/deployments/datadog/datadog-agent-kube-state-metrics --consistency="l"
/registry/deployments/datadog/datadog-agent-kube-state-metrics

Inconsistent Node: key is missing

etcdctl --endpoints https://172.30.173.252:2379 get --prefix --keys-only /registry/deployments/datadog/datadog-agent-kube-state-metrics --consistency="l"
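
To see whether more keys than this one are affected, the same serializable read can be run against a whole prefix on each member and the two key lists compared. The following is a rough sketch, not a verified procedure: `dump_keys` and `missing_keys` are hypothetical helper names, the file names are placeholders, and it assumes `ETCDCTL_API=3` with TLS settings supplied through the environment. It uses only the etcdctl flags already shown above.

```shell
# Hypothetical helper: list key names under a prefix as seen by one member.
# --consistency="l" asks for a serializable read, served from that member's
# local store. Blank lines in the --keys-only output are dropped.
dump_keys() {
  ETCDCTL_API=3 etcdctl --endpoints "$1" get --prefix --keys-only "$2" --consistency="l" \
    | sed '/^$/d'
}

# Hypothetical helper: print keys present in file $1 but absent from file $2.
# Each file holds one key per line; both are sorted before comparison.
missing_keys() {
  a=$(mktemp)
  b=$(mktemp)
  LC_ALL=C sort -u "$1" > "$a"
  LC_ALL=C sort -u "$2" > "$b"
  LC_ALL=C comm -23 "$a" "$b"
  rm -f "$a" "$b"
}

# Usage (placeholder file names):
#   dump_keys https://172.30.167.81:2379  /registry/deployments > ok.txt
#   dump_keys https://172.30.173.252:2379 /registry/deployments > bad.txt
#   missing_keys ok.txt bad.txt
```

Swapping the two arguments of `missing_keys` would show keys that exist only on the suspect member.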

Possible cause
We manage our cluster with Terraform and recently upgraded it. The upgrade replaced the etcd instances, but we kept the data and WAL directories (on EBS volumes on AWS), and the new nodes had the same IPs and the same etcd version as the original ones. However, etcd was probably not shut down cleanly.

etcd version: we were using a custom build from the 3.2 branch because 3.2.19 had not been released yet and we needed this PR: #9570.
Our etcd was built from this commit: https://github.com/roboll/etcd/commit/d45053c068950a5672a22d1192249313dbcbca26 with Go 1.10 (binary available here: https://github.com/roboll/etcd/releases/tag/v3.2.19-datadog). Even though this is not an official release, we believe this should not have happened.

We are keeping the cluster in this state to be able to diagnose what happened. We are happy to send more details.
