Consul upgrade from 1.14.11 to 1.15.10 results in "refusing to rejoin cluster because server has been offline for more than the configured server_rejoin_age_max" error #20722

@radcool

Overview of the Issue

I upgraded a Consul server from 1.14.4 to 1.14.11 without issue, but when I then upgrade it from 1.14.11 to 1.15.10, Consul fails on startup with the following error: "refusing to rejoin cluster because server has been offline for more than the configured server_rejoin_age_max (168h0m0s) - consider wiping your data dir".

The server has been offline for only a fraction of a second during the systemctl restart consul command, so I’m not sure what the message is referring to here.
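
For scale, the 168h0m0s in the message is seven days. The sketch below illustrates the kind of comparison the message implies, assuming the server compares some persisted "last seen" timestamp against the current time. That mechanism is an assumption, not something confirmed in this report, and `parse_go_duration` and `is_stale` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_go_duration(text: str) -> timedelta:
    """Parse a Go-style duration such as '168h0m0s' (h, m and s units only)."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?", text)
    if not text or not match:
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (float(g) if g else 0.0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def is_stale(last_seen: datetime, now: datetime, max_age: timedelta) -> bool:
    """Hypothetical check: True if last_seen is more than max_age before now."""
    return now - last_seen > max_age


max_age = parse_go_duration("168h0m0s")            # value quoted in the error
now = datetime(2024, 1, 8, tzinfo=timezone.utc)    # arbitrary illustrative time

print(max_age.days)                                         # 7
print(is_stale(now - timedelta(seconds=1), now, max_age))   # False
print(is_stale(now - timedelta(days=8), now, max_age))      # True
```

Under this assumption, a restart lasting a fraction of a second is nowhere near the threshold, so whatever timestamp the check reads is presumably not the time of the restart.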

Posting this question on discuss.hashicorp.com hasn't resulted in any obvious answer, so perhaps this is a bug?


Reproduction Steps

  1. Create a cluster with three server nodes (this is what I have, but might not matter at all).
  2. Make sure each server is running Consul 1.14.11.
  3. Change the Consul binary of one of the non-leader servers to version 1.15.10.
  4. Restart Consul.
  5. Check the logs for the error message above.

Consul info for Server

Output of the consul info command on the server:

agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease = 
	revision = c0c5688c
	version = 1.14.11
	version_metadata = 
consul:
	acl = disabled
	bootstrap = false
	known_datacenters = 1
	leader = false
	leader_addr = 192.168.40.23:8300
	server = true
raft:
	applied_index = 167628074
	commit_index = 167628074
	fsm_pending = 0
	last_contact = 30.184466ms
	last_log_index = 167628074
	last_log_term = 13469
	last_snapshot_index = 167622202
	last_snapshot_term = 13469
	latest_configuration = [{Suffrage:Voter ID:11659f41-183a-8ed2-ed11-5be8e2044ea4 Address:192.168.40.22:8300} {Suffrage:Voter ID:3df4f76a-9bae-f14a-785e-da0903cb5241 Address:192.168.40.23:8300} {Suffrage:Voter ID:ab60d46b-23fc-7fe4-4c34-5677356857b5 Address:192.168.40.21:8300}]
	latest_configuration_index = 0
	num_peers = 2
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 13469
runtime:
	arch = amd64
	cpu_count = 2
	goroutines = 261
	max_procs = 2
	os = linux
	version = go1.20.10
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 108
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 36413
	members = 27
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 2273
	members = 3
	query_queue = 0
	query_time = 1
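
Reproduction steps 2 and 3 depend on knowing each server's version and whether it is the leader. As a rough sketch, the output above can be parsed into a dictionary to check both. The layout assumed here (unindented `section:` headers followed by indented `key = value` lines) is taken from the output in this report; `parse_consul_info` and `is_non_leader_server` are hypothetical helpers, not part of Consul.

```python
def parse_consul_info(text: str) -> dict:
    """Parse `consul info`-style text into {section: {key: value}}."""
    sections = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if not line[0].isspace() and line.rstrip().endswith(":"):
            # Unindented line ending in ':' starts a new section.
            current = line.rstrip()[:-1]
            sections[current] = {}
        elif current is not None and "=" in line:
            # Split on the first '=' only; values may contain more of them.
            key, _, value = line.partition("=")
            sections[current][key.strip()] = value.strip()
    return sections


def is_non_leader_server(info: dict) -> bool:
    """True if the agent reports itself as a server that is not the leader."""
    consul = info.get("consul", {})
    return consul.get("server") == "true" and consul.get("leader") == "false"


# Abbreviated sample copied from the output in this report.
sample = (
    "build:\n\tprerelease = \n\tversion = 1.14.11\n"
    "consul:\n\tleader = false\n\tserver = true\n"
    "raft:\n\tstate = Follower\n"
)
info = parse_consul_info(sample)
print(info["build"]["version"])      # 1.14.11
print(is_non_leader_server(info))    # True
```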

Config file from server (JSON):

{
  "node_name": "consul01",
  "bind_addr": "192.168.40.21",
  "client_addr": "127.0.0.1 192.168.40.21",
  "datacenter": "XXXXXXX",
  "server": true,
  "bootstrap_expect": 3,
  "data_dir": "/var/db/consul",
  "encrypt": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "retry_join": [
    "192.168.40.22",
    "192.168.40.23"
  ],
  "tls": {
    "defaults": {
      "key_file": "/etc/consul.d/tls/XXXXXXX-server-consul-0-key.pem",
      "cert_file": "/etc/consul.d/tls/XXXXXXX-server-consul-0.pem",
      "ca_file": "/etc/consul.d/tls/consul-agent-ca.pem",
      "verify_incoming": true,
      "verify_outgoing": true
    },
    "internal_rpc": {
      "verify_server_hostname": true
    }
  },
  "auto_encrypt": {
    "allow_tls": true
  },
  "ports": {
    "https": 8501
  },
  "peering": {
    "enabled": false
  },
  "connect": {
    "enabled": false
  }
}
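
Note that the configuration above does not set server_rejoin_age_max, so the 168h0m0s in the error is presumably a default. A minimal sketch for checking this in a config file, assuming the option is a top-level key spelled as in the error message and that 168h0m0s is the default (both are assumptions drawn from the message; `effective_rejoin_age_max` is a hypothetical helper):

```python
import json

# Assumed default, taken from the value printed in the error message.
ASSUMED_DEFAULT_REJOIN_AGE_MAX = "168h0m0s"


def effective_rejoin_age_max(config_text: str) -> str:
    """Return the configured server_rejoin_age_max, or the assumed default."""
    config = json.loads(config_text)
    return config.get("server_rejoin_age_max", ASSUMED_DEFAULT_REJOIN_AGE_MAX)


# Abbreviated version of the config in this report; the key is absent.
posted = '{"node_name": "consul01", "server": true, "bootstrap_expect": 3}'
print(effective_rejoin_age_max(posted))

# Hypothetical config that sets the key explicitly.
explicit = '{"server": true, "server_rejoin_age_max": "720h"}'
print(effective_rejoin_age_max(explicit))
```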

Operating system and Environment details

Consul servers are running on Red Hat Enterprise Linux Server release 7.6 (Maipo).
