Lost leader after restarting chef-backend #1473

Open
maciu10 opened this issue Feb 15, 2018 · 3 comments
Labels
Aspect: Stability Consistent results. Component: chef-backend Triage: Try Reproducing Indicates that this issue needs to be reproduced. Type: Bug Does not work as expected.

@maciu10

maciu10 commented Feb 15, 2018

Using a chef-backend cluster with 3 backend nodes: backend1, backend2 and backend3. backend1 is the current leader. When I stop chef-backend (chef-backend-ctl stop) on all three machines and then start it up again, the cluster has no leader and never recovers.

Expected Behavior

Leader to be re-elected after restarting chef-backend on all three backend machines.

Current Behavior

A leader is never re-elected after starting the services, and the leaderl logs show:

2018-02-15_17:03:14.91161 [I] <0.715.0> leader_elector:init => leader_elector is starting in state: initializing
2018-02-15_17:03:14.91226 [I] <0.716.0> status_updater:init => status_updater started
2018-02-15_17:03:14.97625 [I] <0.563.0> no_mod:no_fun => Application leaderl started on node 'leaderl@127.0.0.1'
2018-02-15_17:03:14.97720 [I] <0.563.0> no_mod:no_fun => Application eper started on node 'leaderl@127.0.0.1'
2018-02-15_17:03:14.97785 [I] <0.563.0> no_mod:no_fun => Application recon started on node 'leaderl@127.0.0.1'
2018-02-15_17:03:15.09875 [I] <0.730.0> key_watcher:handle_info => Initial start of watcher on behalf of leader_block_state for "cb/control/blocked_leaders"
2018-02-15_17:03:15.09967 [I] <0.729.0> leader_elector:do_connect => Connecting as node backend1 (10.40.10.56,da5d0b3758d410e434be81a1453b4c24)
2018-02-15_17:03:15.09997 [I] <0.729.0> leader_elector:do_connect => Leader no_leader, Boot no_bootstrap_node
2018-02-15_17:03:15.10002 [I] <0.729.0> leader_elector:do_connect => No leader in place.
2018-02-15_17:03:15.10048 [I] <0.729.0> leader_elector:do_connect => I am NOT a bootstrap node because bootstrap_key returned no_bootstrap_node so I'll wait for a leader.
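
For anyone triaging similar logs, here is a minimal Python sketch that checks a leaderl log for the pattern above. It assumes every line follows the `timestamp [level] <pid> module:function => message` layout seen in this excerpt; the helper names (`parse_line`, `stuck_without_leader`) are hypothetical and not part of chef-backend.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Layout inferred from the excerpt above (an assumption, not a documented format):
#   2018-02-15_17:03:15.10002 [I] <0.729.0> leader_elector:do_connect => No leader in place.
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) \[(?P<level>\w)\] (?P<pid><[\d.]+>) "
    r"(?P<module>\w+):(?P<function>\w+) => (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    ts: str
    level: str
    pid: str
    module: str
    function: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None if the line does not match the assumed layout."""
    match = LOG_LINE.match(line.strip())
    return LogEntry(**match.groupdict()) if match else None


def stuck_without_leader(lines: Iterable[str]) -> bool:
    """True if leader_elector reported both 'No leader in place' and no bootstrap node."""
    entries = (parse_line(line) for line in lines)
    messages = [e.message for e in entries if e and e.module == "leader_elector"]
    no_leader = any("No leader in place" in m for m in messages)
    no_bootstrap = any("no_bootstrap_node" in m for m in messages)
    return no_leader and no_bootstrap
```

Feeding the ten log lines above into `stuck_without_leader` should flag the node as waiting with neither a leader nor a bootstrap node.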

cluster-status shows:

Name      IP           GUID                              Role                PG        ES
backend2  10.40.10.57  59540d34fa370a28ed3098cc78b2a245  waiting_for_leader  follower  not_master
backend1  10.40.10.56  da5d0b3758d410e434be81a1453b4c24  waiting_for_leader  leader    not_master
backend3  10.40.10.55  8c4ae52bf8a7b77ee65b638f7159199a  waiting_for_leader  follower  master
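
The same check can be scripted against the cluster-status output. The sketch below is only an illustration: it assumes the whitespace-separated table layout shown above and assumes that a healthy cluster reports a Role of `leader` for exactly one node. The function names are hypothetical.

```python
from typing import Dict, List

# Output copied from the cluster-status table in this issue.
SAMPLE = """\
Name      IP           GUID                              Role                PG        ES
backend2  10.40.10.57  59540d34fa370a28ed3098cc78b2a245  waiting_for_leader  follower  not_master
backend1  10.40.10.56  da5d0b3758d410e434be81a1453b4c24  waiting_for_leader  leader    not_master
backend3  10.40.10.55  8c4ae52bf8a7b77ee65b638f7159199a  waiting_for_leader  follower  master
"""


def parse_cluster_status(text: str) -> List[Dict[str, str]]:
    """Turn the whitespace-separated table into one dict per node."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, *nodes = rows
    return [dict(zip(header, fields)) for fields in nodes]


def has_leader(nodes: List[Dict[str, str]]) -> bool:
    """Assumption: a healthy cluster shows Role == 'leader' on one node."""
    return any(node["Role"] == "leader" for node in nodes)


def pg_leaders(nodes: List[Dict[str, str]]) -> List[str]:
    """Names of nodes whose PG column still reports 'leader'."""
    return [node["Name"] for node in nodes if node["PG"] == "leader"]
```

With the table above, `has_leader` is false for all three nodes while `pg_leaders` still returns backend1.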

Steps to Reproduce (for bugs)

  1. Run "chef-backend-ctl stop" on backend1, then backend2, then backend3.
  2. Bring chef-backend back up in the reverse order by running "chef-backend-ctl start" on backend3, then backend2, then backend1.
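
To make the ordering explicit, here is a small sketch that only builds the sequence of commands; it does not run anything. It reads "reverse order" as backend3, backend2, backend1, and `restart_plan` is a hypothetical helper name.

```python
from typing import List, Tuple


def restart_plan(nodes: List[str]) -> List[Tuple[str, str]]:
    """Return (node, command) pairs: stop in the given order, start in reverse."""
    stops = [(node, "chef-backend-ctl stop") for node in nodes]
    starts = [(node, "chef-backend-ctl start") for node in reversed(nodes)]
    return stops + starts


if __name__ == "__main__":
    # Print the plan for the three backends in this report.
    for node, command in restart_plan(["backend1", "backend2", "backend3"]):
        print(f"{node}: {command}")
```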

Your Environment

  • Chef Server Version: chef-backend 2.0.1
  • Total/free RAM and disk space: Total RAM 8GB / free RAM 5.2 GB
  • Operating System and Version: ubuntu 14.04.5 LTS
  • If upgrading, previous Chef Server version: N/A
  • Running in a container? no
maciu10 changed the title from "Lost leader after stopping chef-backend" to "Lost leader after restating chef-backend" on Feb 15, 2018
maciu10 changed the title from "Lost leader after restating chef-backend" to "Lost leader after restarting chef-backend" on Feb 15, 2018
@stevendanna
Contributor

@maciu10 If your cluster is still down, please contact chef-support for more immediate assistance. This looks like a case in which using the force-leader command on backend1 should resolve the issue, but support can help if there is any doubt.

@maciu10
Author

maciu10 commented Feb 21, 2018

@stevendanna thanks for the response. I was able to bring back the leader right away using force-leader. I filed this defect because I think this should not happen: a leader should come back online on its own after chef-backend is restarted.

PrajaktaPurohit added the labels Status: Untriaged, Aspect: Correctness, Aspect: Stability, Component: chef-backend, Type: Bug, and Triage: Try Reproducing, and removed the label Status: Untriaged, on Oct 11, 2019
@PrajaktaPurohit
Contributor

Chef-backend is not open for users to log issues, so I am keeping this issue here with the label Component: chef-backend.
