
Consul snap restore on a new cluster resulted in "cannot mount under existing mount" during post-unseal #10420

Open
szahri opened this issue Jun 17, 2021 · 0 comments
Labels
theme/consul-vault (Relating to Consul & Vault interactions) · theme/kv (Issues related to the key value store) · type/bug (Feature does not function as expected)

Comments

szahri commented Jun 17, 2021

Overview of the Issue

A new Consul cluster (v0.8.1) was built to replace the current Consul cluster (v0.8.1). I took a Consul snapshot backup from the current cluster (the command used is sketched after the log below) and restored it onto the new cluster. During the post-unseal process, the following was observed:

[DEBUG]	core:	shutting down leader elections			
[DEBUG]	core:	finished triggering standbyStopCh for runStandby			
[DEBUG]	core:	runStandby done			
[DEBUG]	core:	sealing barrier			
[INFO]	core:	vault is sealed			
[INFO]	core:	vault is unsealed			
[INFO]	core:	entering standby mode			
[INFO]	core:	acquired lock, enabling active operation			
[DEBUG]	core:	generating cluster private key			
[DEBUG]	core:	generating local cluster certificate			
[INFO]	core:	post-unseal setup starting			
[DEBUG]	core:	clearing forwarding clients			
[DEBUG]	core:	done clearing forwarding clients			
[INFO]	core:	loaded wrapping token key			
[INFO]	core:	successfully setup plugin catalog: plugin-directory=			
[INFO]	core:	successfully mounted backend: type=generic path=secret/			
[INFO]	core:	successfully mounted backend: type=system path=sys/			
[INFO]	core:	successfully mounted backend: type=identity path=identity/
[INFO]	core:	successfully mounted backend: type=pki path=pki/			
[INFO]	core:	successfully mounted backend: type=cubbyhole path=cubbyhole/
[INFO]	core:	successfully mounted backend: type=pki path=my/ldap/pki/		
[ERROR]	core:	failed to mount entry: path=my/ldap/pki/ error="cannot mount under existing mount "my/ldap/pki/""
[INFO]	core:	pre-seal teardown starting			
[INFO]	core:	cluster listeners not running			
[INFO]	core:	pre-seal teardown complete		
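
The snapshot backup referenced above was taken with Consul's snapshot command (a rough sketch; the save is run against the current cluster and produces the file used in the reproduction steps below):

$ consul snapshot save consul.snap.2021-06-15_1200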

Vault then goes into a loop on this sequence and remains in status 429 (standby):

$ consul operator raft list-peers
Node             ID            Address       State     Voter
my-consul-node1  x.x.x.x:8300  x.x.x.x:8300  leader    true
my-consul-node2  x.x.x.x:8300  x.x.x.x:8300  follower  true
my-consul-node3  x.x.x.x:8300  x.x.x.x:8300  follower  true

$ vault status
Key                    Value
---                    -----
Seal Type              shamir
Initialized            true
Sealed                 false
Total Shares           1
Threshold              1
Version                1.0.3
Cluster Name           vault-cluster-624c7ebf
Cluster ID             8e2ac018-3d9e-1e84-e4da-76146b54bf83
HA Enabled             true
HA Cluster             n/a
HA Mode                standby
Active Node Address    <none>
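
For reference, the standby state can also be confirmed directly against Vault's health endpoint, which by default returns HTTP 429 for a standby node (a minimal check, assuming VAULT_ADDR points at the affected node):

$ curl -s -o /dev/null -w "%{http_code}\n" "$VAULT_ADDR/v1/sys/health"
  # prints 429 while the node is in standby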

The current cluster does not show any duplicate mounts:

$ vault secrets list
Path             Type         Accessor              Description
----             ----         --------              -----------
cubbyhole/       cubbyhole    cubbyhole_295119ab    per-token private secret storage
identity/        identity     identity_275d4e5e     identity store
pki/             pki          pki_e71d366f          n/a
secret/          generic      generic_2c3dc747      generic secret storage
my/ldap/pki/     pki          pki_97dff7b7          n/a
my/ldap2/pki/    pki          pki_fd1cbf54          n/a
sys/             system       system_71e16012       system endpoints used for control, policy and debugging
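
A quick way to double-check that table for duplicate paths programmatically (a sketch, assuming jq is installed and the CLI is pointed at the current cluster):

$ vault secrets list -format=json | jq -r 'keys[]' | sort | uniq -d
  # no output means no duplicate mount paths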

A side note: I also tried building a new cluster with the latest Consul version (1.7.2) and restoring the snapshot using exactly the same steps as described below, and I hit exactly the same issue. I am doing this activity in preparation for upgrading our current Vault/Consul infrastructure (which is very old!) to the latest versions.

Reproduction Steps

Steps to reproduce this issue (the full command sequence is also sketched after the list):

  1. Create a new cluster with the same versions of Consul (0.8.1) and Vault (1.0.3)
  2. Initialize and unseal the new cluster with the new keys
  3. Copy the Consul snapshot backup from the current cluster (consul.snap.2021-06-15_1200) onto the new cluster
  4. Run a snapshot restore:
    consul snapshot restore consul.snap.2021-06-15_1200
  5. Delete the core lock:
    consul kv delete vault/core/lock
  6. Unseal the vault with the current cluster's master key:
    vault operator unseal
  7. Watch the log as the mounting operation fails
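
As referenced above, the command sequence from steps 4-6 in one place (a sketch; the snapshot file and the master key both come from the current cluster):

    consul snapshot restore consul.snap.2021-06-15_1200
    consul kv delete vault/core/lock
    vault operator unseal    # enter the current cluster's master key when prompted
    # then watch the vault server log for the post-unseal mount failure shown above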

Consul info for both Client and Server

agent:
	check_monitors = 0
	check_ttls = 1
	checks = 1
	services = 2
build:
	prerelease = 
	revision = e9ca44d
	version = 0.8.1
consul:
	bootstrap = false
	known_datacenters = 1
	leader = false
	leader_addr = x.x.x.x:8300
	server = true
raft:
	applied_index = 20661759
	commit_index = 20661759
	fsm_pending = 0
	last_contact = 5.359637ms
	last_log_index = 20661760
	last_log_term = 2
	last_snapshot_index = 20658044
	last_snapshot_term = 2
	latest_configuration = [{Suffrage:Voter ID:x.x.x.x:8300 Address:x.x.x.x:8300} {Suffrage:Voter ID:x.x.x.x:8300 Address:x.x.x.x:8300} {Suffrage:Voter ID:x.x.x.x:8300 Address:x.x.x.x:8300}]
	latest_configuration_index = 1
	num_peers = 2
	protocol_version = 2
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 2
runtime:
	arch = amd64
	cpu_count = 2
	goroutines = 72
	max_procs = 2
	os = linux
	version = go1.8.1
serf_lan:
	encrypted = true
	event_queue = 0
	event_time = 2
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 3
	members = 3
	query_queue = 0
	query_time = 2
serf_wan:
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 8
	members = 3
	query_queue = 0
	query_time = 2

Operating system and Environment details

RHEL 8

@jsosulska added the theme/consul-vault, theme/kv, and type/bug labels Jun 21, 2021