Vault is inaccessible if an etcd unit is lost #4961

**Describe the bug**
When Vault is deployed in a cluster using etcd for HA management, losing the first etcd unit in the 'address' list results in Vault returning 500s.

**To Reproduce**
- Deploy 3 units of vault. In my test case I used 1 mysql unit for storage and 3 units of etcd for ha_storage.
- Initialise and unseal vault, then validate the API with "vault status"
- Shut down the etcd unit that corresponds to the first IP in the 'address' list (in the 'ha_storage "etcd"' section)
- Run "vault status" with VAULT_ADDR pointing at any of the vault units

```
$ vault status
Error checking leader status: Error making API request.

URL: GET http://10.53.82.200:8200/v1/sys/leader
Code: 500. Errors:

* context deadline exceeded
```
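To check every vault unit rather than one at a time, the same request can be scripted. This is only a rough sketch for anyone reproducing the issue: the helper names `leader_url` and `leader_status` are made up for illustration, and the only thing taken from the output above is the `/v1/sys/leader` path.

```python
import urllib.error
import urllib.request


def leader_url(vault_addr):
    """Build the leader-status URL for one Vault unit (hypothetical helper)."""
    return vault_addr.rstrip("/") + "/v1/sys/leader"


def leader_status(vault_addr, timeout=5.0):
    """Query /v1/sys/leader on one Vault unit.

    Returns (http_status, body_text). http_status is None when no HTTP
    response was received at all (connection refused, timeout, ...).
    """
    try:
        with urllib.request.urlopen(leader_url(vault_addr), timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Vault answered, but with an error code such as the 500 shown above.
        return err.code, err.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError) as err:
        # No HTTP response at all: report the transport-level error instead.
        return None, str(err)
```

Calling `leader_status("http://10.53.82.200:8200")` for each unit in turn should show whether all units return the 500 or only some of them.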

**Expected behavior**
I would expect vault to continue to function and respond to API requests, without interruption, in the event of losing an etcd unit. 

**Environment:**
* Vault Server Version: 0.10.3
* Vault Client Version: 0.10.1
* Server OS: Ubuntu 16.04.5 (Xenial) x86_64

Vault server configuration file(s):

```hcl
api_addr = "http://10.53.82.113:8200"
cluster_addr = "http://10.53.82.113:8201"
disable_mlock = true
storage "mysql" {
  username = "vault"
  password = "kpzLkcjfmX9w45n542zwLyrJBppfg5rP"
  database = "vault"
  address = "10.53.82.226:3306"
}
ha_storage "etcd" {
  ha_enabled = "true"
  address = "https://10.53.82.119:2379,https://10.53.82.150:2379,https://10.53.82.157:2379"
  tls_ca_file = "/var/snap/vault/common/etcd-ca.pem"
  tls_cert_file = "/var/snap/vault/common/etcd-cert.pem"
  tls_key_file = "/var/snap/vault/common/etcd.key"
  etcd_api = "v3"
}
listener "tcp" {
  address = "0.0.0.0:8200"
  tls_disable = 1
}

# Localhost only listener for charm access to vault.
listener "tcp" {
  address = "127.0.0.1:8220"
  tls_disable = 1
}
```
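When reproducing, it helps to confirm which entry in the etcd 'address' list is actually down. The sketch below splits that value and tries a plain TCP connection to each endpoint. The function names are hypothetical, and an open port only shows that something is listening there, not that the etcd member is healthy.

```python
import socket
from urllib.parse import urlsplit


def split_endpoints(address):
    """Split the comma-separated etcd 'address' value into (host, port) pairs."""
    endpoints = []
    for raw in address.split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate a trailing or doubled comma
        parts = urlsplit(raw)
        # 2379 is etcd's default client port; used only if the URL omits one.
        endpoints.append((parts.hostname, parts.port or 2379))
    return endpoints


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Looping over `split_endpoints(...)` with the 'address' string from the config and printing `tcp_reachable(host, port)` for each pair gives a quick view of which endpoint is unreachable, in list order.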

**Additional context**

On the active vault unit the following entries appear in its log when the etcd unit goes down:

```
2018-07-20T08:24:16.614Z [WARN ] core: leadership lost, stopping active operation
2018-07-20T08:24:16.614Z [INFO ] core: pre-seal teardown starting
2018-07-20T08:24:16.614Z [INFO ] core: stopping cluster listeners
2018-07-20T08:24:16.614Z [INFO ] core: shutting down forwarding rpc listeners
2018-07-20T08:24:16.614Z [INFO ] core: forwarding rpc listeners stopped
2018-07-20T08:24:16.907Z [INFO ] core: rpc listeners successfully shut down
2018-07-20T08:24:16.907Z [INFO ] core: cluster listeners successfully shut down
2018-07-20T08:24:16.907Z [INFO ] rollback: stopping rollback manager
2018-07-20T08:24:16.908Z [INFO ] core: pre-seal teardown complete
```

The standby vault units do not log anything as a result of the lost etcd unit.
