There were a few interesting threads about error handling for etcd's non-key space operations:
- etcd Go & Java client SDK's retry mechanism may break Serializable: etcd-io/etcd#18424 (comment)
- kubeadm's etcd client member add / remove can return errors even though the operation may have succeeded on the server side: kubernetes/kubeadm#3111
As a first reaction, I think in KCP we are generally ok, because errors reported by etcd are usually handled by re-entrancy, which implies we re-assess the current state of the world before deciding the course of action.
But this is also a good chance to audit the code base for the places where we use non-key space operations, mostly remove member and forward leadership.
NOTE: add member/join is a slightly different case, because we rely on kubeadm for it.
PS. I classified this as a bug because I did not know exactly which kind to use 😅, but to be clear, we are not aware of bugs in this area; this issue is to double check that our codebase is robust enough to handle the edge cases described in the comment above.