Skip to content

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented May 13, 2018

#30423 combined auto-expansion in the same cluster state update where nodes are removed. As the auto-expansion step would run before deassociating the dead nodes from the routing table, the auto-expansion would possibly remove replicas from live nodes instead of dead ones. This PR reverses the order to ensure that when nodes leave the cluster that the auto-expand-replica functionality only triggers after failing the shards on the removed nodes. This ensures that active shards on other live nodes are not failed if the primary resided on a now dead node.
Instead, one of the replicas on the live nodes first gets promoted to primary, and the auto-expansion (removing replicas) only triggers in a follow-up step (but still same cluster state update).

Relates to #30456 (comment)
and follow-up of #30423

@ywelsch ywelsch added >bug v7.0.0 :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v6.4.0 labels May 13, 2018
@ywelsch ywelsch requested a review from bleskes May 13, 2018 10:05
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the extra iteration.

@ywelsch ywelsch merged commit d5f028e into elastic:master May 14, 2018
@ywelsch
Copy link
Contributor Author

ywelsch commented May 14, 2018

Thanks @bleskes

ywelsch added a commit that referenced this pull request May 14, 2018
#30423 combined auto-expansion in the same cluster state update where nodes are removed. As
the auto-expansion step would run before deassociating the dead nodes from the routing table, the
auto-expansion would possibly remove replicas from live nodes instead of dead ones. This commit
reverses the order to ensure that when nodes leave the cluster that the auto-expand-replica
functionality only triggers after failing the shards on the removed nodes. This ensures that active
shards on other live nodes are not failed if the primary resided on a now dead node.
Instead, one of the replicas on the live nodes first gets promoted to primary, and the auto-
expansion (removing replicas) only triggers in a follow-up step (but still same cluster state update).

Relates to #30456 and follow-up of #30423
dnhatn added a commit that referenced this pull request May 15, 2018
* 6.x:
  Revert "Silence IndexUpgradeIT test failures. (#30430)"
  [DOCS] Remove references to changelog and to highlights
  Revert "Mute ML upgrade test (#30458)"
  [ML] Fix BWC version for backport of #30125
  [Docs] Improve section detailing translog usage (#30573)
  [Tests] Relax allowed delta in extended_stats aggregation (#30569)
  Fail if reading from closed KeyStoreWrapper (#30394)
  [ML] Reverse engineer Grok patterns from categorization results (#30125)
  Derive max composite buffers from max content len
  Update build file due to doc file rename
  SQL: Extract SQL request and response classes (#30457)
  Remove the changelog (#30593)
  Revert "Add deprecation warning for default shards (#30587)"
  Silence IndexUpgradeIT test failures. (#30430)
  Add deprecation warning for default shards (#30587)
  [DOCS] Adds 6.4.0 release highlight pages
  [DOCS] Adds release highlight pages (#30590)
  Docs: Document how to rebuild analyzers (#30498)
  [DOCS] Fixes title capitalization in security content
  LLRest: Add equals and hashcode tests for Request (#30584)
  [DOCS] Fix realm setting names (#30499)
  [DOCS] Fix path info for various security files (#30502)
  Docs: document precision limitations of geo_bounding_box (#30540)
  Fix non existing javadocs link in RestClientTests
  Auto-expand replicas only after failing nodes (#30553)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v6.4.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants