
Add allocate_all_primaries to cluster reroute #4285


Closed

Conversation

@nik9000 (Member) commented Nov 27, 2013

From the docs:
allocate_all_primaries::
Allocate all unallocated primaries to any node that can take them.
Accepts no parameters. Each allocation is similar to running allocate
with allow_primary so this can cause data loss. This is useful in the
same cases as allocate with allow_primary but doesn't require looking
up the index or shard or guessing an appropriate node.

Closes #4206
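For comparison, this is roughly what the manual workaround looks like with the existing `allocate` command; the `index`, `shard`, and `node` values below are placeholders, not values from this PR:

```shell
# Manual workaround using the existing `allocate` command with `allow_primary`.
# "test", 0, and "node1" are placeholder values - substitute your own.
REROUTE_BODY='{
  "commands" : [
    {
      "allocate" : {
        "index" : "test",
        "shard" : 0,
        "node" : "node1",
        "allow_primary" : true
      }
    }
  ]
}'
# Then: curl -XPOST 'localhost:9200/_cluster/reroute' -d "$REROUTE_BODY"
```

The new command removes the need to look up each index, shard, and node by hand before issuing one of these per unassigned primary.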

@nik9000 (Member Author) commented Nov 27, 2013

I've confirmed this works using the local gateway:

1. Start two nodes.
2. Execute:

```shell
curl -XDELETE "http://localhost:9200/test?pretty" -s
curl -XPOST "http://localhost:9200/test?pretty" -s -d '{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 0
    }
  }
}'
for i in {1..100}; do
  curl -XPOST "http://localhost:9200/test/test?pretty" -d '{"foo": "1"}' -s
done
```

3. Shut down the node at localhost:9201. Wait for a few seconds.
4. Execute the below and notice the timeouts. Ctrl-C it when you are bored.

```shell
for i in {1..100}; do
  curl -XPOST "http://localhost:9200/test/test?pretty" -d '{"foo": "1"}' -s
done
```

5. Execute this:

```shell
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands" : [
    {
      "allocate_all_primaries" : {}
    }
  ]
}'
```

6. Now this will work without timeouts:

```shell
for i in {1..100}; do
  curl -XPOST "http://localhost:9200/test/test?pretty" -d '{"foo": "1"}' -s
done
```
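A quick way to check which shards a reroute like this would actually need to touch is to filter `_cat/shards` output for unassigned entries; a small sketch, assuming Elasticsearch is listening on localhost:9200:

```shell
# Print shards still in the UNASSIGNED state.
# The state is the fourth column of _cat/shards output.
list_unassigned() {
  curl -s "localhost:9200/_cat/shards" | awk '$4 == "UNASSIGNED"'
}
# Usage: list_unassigned
```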

The data is lost but at least you don't have timeouts.

Github's markup is making a mess of this....

@kimchy (Member) commented Dec 7, 2013

It would be interesting to check somehow whether the primary allocation is just being throttled from being allocated to a node, and in that case, not force the allocation. This will require sharing shard knowledge somehow with LocalGatewayAllocator (in the case of the local gateway, we need to take the gateway abstraction into account; maybe have a method that will give the node for a primary shard, and then check the deciders on it).

@nik9000 (Member Author) commented Dec 9, 2013

@kimchy, I understand what you are saying but I'm not sure how I'd go about it. It does make me think of something else: will this force allocation and ignore throttling? Is that OK if we're allocating thousands of shards?

I can have a look at implementing what you mention sometime in the next few days.

@kimchy (Member) commented Dec 9, 2013

@nik9000 this force allocation will not end up ignoring throttling, it will just come back to being allocated and respect throttling.

@nik9000 (Member Author) commented Dec 9, 2013

That, at least, is great news. I can imagine folks in a disaster trying this over and over again, which won't help. I'll make sure that it refuses to do anything if all the unallocated primaries are throttled. I'll see about spitting out a different error message in that case so people know that all shards are in the process of being allocated.

@nik9000 (Member Author) commented Dec 10, 2013

(in case of local gateway, somehow, we need to take into account the gateway abstraction, maybe have a method that will give the node for a primary shard, and then check the decider on it).

So I had a look at this and I'm not really sure how to do this because the decision about which node to assign the shard comes after allocation commands are run. I wonder if it'd be simpler to store the list of throttled shards in the cluster state and dig it back out again during the allocation command....

@kimchy (Member) commented Dec 10, 2013

To be honest, I don't have a good idea about how to do it yet either :). I will try and spend some time thinking about it and provide feedback soonish (sorry!).

@nik9000 (Member Author) commented Dec 10, 2013

I thought I could get this from the AllocationExplanation on ClusterState but that always seems to be empty. I actually can't find any code that sets it.

From the docs:
`allocate_all_primaries`::
    Allocate all unallocated primaries to any node that can take them.
    Accepts no parameters.  Each allocation is similar to running `allocate`
    with `allow_primary` so this can cause data loss.  This is useful in the
    same cases as `allocate` with `allow_primary` but doesn't require looking
    up the `index` or `shard` or guessing an appropriate `node`.

Closes elastic#4206

@nik9000 (Member Author) commented Dec 10, 2013

Pushed a revised version - it doesn't do what @kimchy wanted yet but is a bit nicer anyway.

@nik9000 (Member Author) commented Mar 5, 2014

I haven't looked at this in a long while. I imagine this would still be useful but don't have much time to think about it recently. Any interest in me resurrecting this?

@manologarciagarcia commented

I have exactly this problem. I have just one shard, and sometimes when I restart and look at the health of my cluster, I get this for one of my indexes:

http://pastebin.com/Tq08vep1

I know that if I delete the index, the problem will go away, but that's not the optimal solution.

Is there a solution for this problem? Are the changes here a solution to my problem?

Thanks

@d1nsh commented Jun 5, 2014

Any plans of merging this? We run into issues with "unassigned shards" occasionally and it would be great to have a feature like this.

@martijnvg (Member) commented

@nik9000 Is this still on your radar? I think this new allocation command is useful.

Just thinking out loud here about how to detect whether a node is throttling the primary shard allocation:

  1. The LocalGatewayAllocator#buildShardStates() logic can be moved to a public helper class; on top of this there can be an additional method that just returns the DiscoveryNode that has the highest shard version.
  2. Then in AllocateAllPrimariesAllocationCommand#execute() there can be something like the following logic (the helper API here is a sketch):

```java
boolean found = false;
for (MutableShardRouting routing : allocation.routingNodes().unassigned()) {
    if (!routing.primary()) {
        continue;
    }
    // Ask the (sketched) helper which node holds the newest copy of this shard.
    DiscoveryNode nodeHoldingHighestShardVersion = helper.findNodeWithHighestShardVersion(routing.shardId());
    Decision decision = Decision.YES;
    if (nodeHoldingHighestShardVersion != null) {
        RoutingNode routingNode = allocation.routingNodes().node(nodeHoldingHighestShardVersion.id());
        decision = allocation.deciders().canAllocate(routing, routingNode, allocation);
    }
    if (decision.type() != Decision.Type.THROTTLE) {
        found = true;
        // Just clear the post allocation flag on the shard so it'll assign itself.
        allocation.routingNodes().addClearPostAllocationFlag(routing.shardId());
    }
}

if (!found) {
    throw new ElasticsearchIllegalArgumentException("[allocate_all_primaries] no unassigned primaries");
}
```

This way throttled primary allocations will not be affected by the new command.

@nik9000 (Member Author) commented Sep 4, 2014

This has sunk pretty low on my radar. So low I haven't actually been checking the status and the ping must have slipped by me. I can pick it up at some point but if you want it quickly maybe you can grab it? If my code is a good starting point you can have it. Or start over - I won't be offended - the pull request is really stale.

@s1monw (Contributor) commented Sep 5, 2014

@nik9000 I labeled it accordingly so that it won't get forgotten and will be picked up at some point. Thanks for pinging again.

@clintongormley (Contributor) commented

The pain in allocating many primary shards is finding a place to put them, so a suggestion:

  • remove the allow_primary flag from allocation
  • add an allocate_primary action where:
      • node is optional - if not specified then the node is chosen automatically
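If the proposal above were adopted, the reroute request might look something like this; the command name and body shape are hypothetical, since the proposal was never merged in this form:

```shell
# Hypothetical body for the proposed allocate_primary command; "node" is
# omitted here so the cluster would pick a target node automatically.
# "test" and 0 are placeholder index/shard values.
PROPOSED_BODY='{
  "commands" : [
    { "allocate_primary" : { "index" : "test", "shard" : 0 } }
  ]
}'
```

With no `node` required per shard, forcing many primaries becomes a loop over unassigned shards rather than a manual lookup for each one.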

@martijnvg (Member) commented

+1 This plan looks good.


@joestump commented

@nik9000 please allow me to buy you a 🍺 or ☕ next time you're in Portland, OR. Great little improvement to ES right here. 👍

@soundofjw commented

+1 this would still be great ;)

@damm commented Jul 21, 2015

+1 really needed.

@clintongormley (Contributor) commented

@soundofjw @damm what version of Elasticsearch are you using? I asked our support team just a few days ago if they still think that this functionality would be useful. Their response was that, with recent versions, the need for this has pretty much disappeared.

@damm commented Jul 23, 2015

@clintongormley I'm using 1.7.0; I still hit issues where I break out the bash scripts from this pull request - on a single node recently, but on a cluster a month ago.

Not common but it happens enough that I don't forget it.

@soundofjw commented

@clintongormley Pretty much the same - 1.7.0 as well. There are a few times when we need to do this, but it usually happens when setting up a cluster for the first time, or when making big changes.

@damm commented Jul 23, 2015

+1 to making big changes; I had to break this out when I had a cluster that was not allocating based on available space and it was making one node run out of space.

Had to re-route a bunch of data quickly while waiting for Elasticsearch to balance itself out once there was enough free space.

@clintongormley (Contributor) commented

@soundofjw why would you need this when setting up a cluster for the first time, or making big changes? The only time you should need this is when you lose ALL copies of many shards (primaries and replicas) - and you want to force allocation of new empty shard copies.
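For anyone reading this thread later: Elasticsearch 5.0 split the old `allocate` command into explicit variants, and the "force allocation of new empty shard copies" case described above is covered by `allocate_empty_primary`, which makes the data loss opt-in. A sketch against the 5.x API, with placeholder index/shard/node values:

```shell
# allocate_empty_primary forces a fresh, empty primary onto a node;
# accept_data_loss must be true to acknowledge that any previous copy
# of the shard's data is discarded. "test", 0, and "node1" are placeholders.
EMPTY_PRIMARY_BODY='{
  "commands" : [
    {
      "allocate_empty_primary" : {
        "index" : "test",
        "shard" : 0,
        "node" : "node1",
        "accept_data_loss" : true
      }
    }
  ]
}'
# Then: curl -XPOST 'localhost:9200/_cluster/reroute' -d "$EMPTY_PRIMARY_BODY"
```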

@ofir-petrushka commented

@clintongormley It's like a bricked phone with no factory reset button (no new index creation, no inserts, no fix button...).
When you put up a new cluster you might not have all the settings right yet, and might reset all nodes at once and/or have no replicas set (also, data copying is delayed a lot by default and moves slowly).

For example, when installing the nodes with a deployment system (e.g. Chef, Puppet, Ansible), you might deploy to all nodes at once since you don't care yet about downtime etc., and somehow it reaches such a state.

I hit that multiple times while doing a new cluster setup (redeploying nodes again and again), plus once after a few hours of work, not sure why.

It should just be a loop of existing commands...

@soundofjw commented

@clintongormley +1 to what @ofir-petrushka and @damm are saying.

One issue I've seen more than once is when the cluster resets state due to all masters resetting - and then data nodes come and recover shards which are no longer recognized.

You'll see a lot of "# of documents mismatch" in this case.

@clintongormley (Contributor) commented

@soundofjw

One issue I've seen more than once is when the cluster resets state due to all masters resetting - and then data nodes come and recover shards which are no longer recognized.

This issue should be fixed in 2.0 with #9952

@soundofjw commented

@clintongormley Awesome! That's great news 👍

@damm commented Nov 28, 2015

@clintongormley just hit this with 2.1 :/

@clintongormley (Contributor) commented

@damm do you want to be more specific?

@damm commented Nov 28, 2015

@clintongormley had to reroute all my primary shards after a failed 2.1 upgrade from 2.0

Had to modify the scripts to make it happy.

@clintongormley (Contributor) commented

@damm I'm much more interested in why the 2.1 upgrade failed for you. Was it something wrong with 2.1 or something that you did? If the former, please open a separate issue explaining the problem.

@martijnvg removed their assignment Jan 21, 2016
@clintongormley (Contributor) commented

I'm going to close this PR as it is way out of date, and I think that the use for it is now infrequent.

@lcawl lcawl added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. and removed :Allocation labels Feb 13, 2018

Successfully merging this pull request may close these issues.

Cluster reroute api should have a way to assign all unassigned shards