Fix queued snapshot assignments after partial snapshot fails due to delete #88470

Merged

Conversation

original-brownbear
Contributor

We can't just assume that the next queued snapshot for a shard is assigned
correctly once the preceding snapshot finishes. We must re-compute which node
should run it, or whether the shard still exists at all.

closes #86724
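
To illustrate the idea, here is a minimal sketch of re-computing the status of a queued shard snapshot from current routing information instead of assuming it can start. It is not the actual change: `QueuedShardAssignmentSketch`, `ShardStatus`, `recompute`, and the `startedPrimaryNodeByShard` map are hypothetical stand-ins, and the real code distinguishes more cases than the two shown here.

```java
import java.util.Map;

final class QueuedShardAssignmentSketch {

    /** Simplified stand-in for SnapshotsInProgress.ShardState (only two states shown). */
    enum ShardState { INIT, MISSING }

    /** Simplified stand-in for a shard snapshot status: a state plus the node that runs it. */
    static final class ShardStatus {
        final ShardState state;
        final String nodeId; // null when no node can run the shard snapshot

        ShardStatus(ShardState state, String nodeId) {
            this.state = state;
            this.nodeId = nodeId;
        }
    }

    /**
     * Re-computes the status of a queued shard snapshot.
     *
     * @param shard                     hypothetical shard key, e.g. "index[0]"
     * @param startedPrimaryNodeByShard hypothetical routing view: shard key to the node that
     *                                  holds its started primary; a shard is absent if its
     *                                  index was deleted or its primary is unassigned
     */
    static ShardStatus recompute(String shard, Map<String, String> startedPrimaryNodeByShard) {
        String nodeId = startedPrimaryNodeByShard.get(shard);
        if (nodeId == null) {
            // Nothing to snapshot: the shard must not move to INIT.
            return new ShardStatus(ShardState.MISSING, null);
        }
        // The shard still exists; run the snapshot on the node that holds it now.
        return new ShardStatus(ShardState.INIT, nodeId);
    }
}
```

Under these assumptions, a shard whose index was deleted while its snapshot was queued ends up `MISSING` with no node, and a shard that moved to another node is started on that node.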

@original-brownbear original-brownbear added >bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.4.0 labels Jul 12, 2022
@original-brownbear original-brownbear marked this pull request as ready for review July 12, 2022 13:02
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jul 12, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Collaborator

Hi @original-brownbear, I've created a changelog YAML for you.

@original-brownbear
Contributor Author

Jenkins run elasticsearch-ci/part-2

  final SnapshotsInProgress.ShardSnapshotStatus shardSnapshotStatus = startedSnapshot.shards().get(routingShardId);
- assertThat(shardSnapshotStatus.state(), is(SnapshotsInProgress.ShardState.INIT));
- assertThat(shardSnapshotStatus.nodeId(), is(dataNodeId));
+ assertThat(shardSnapshotStatus.state(), is(SnapshotsInProgress.ShardState.MISSING));
Contributor Author

This was broken before: the shard isn't assigned, so it must not move to INIT.
Since there are no other shards, the snapshot must also complete right away.
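
As a rough illustration of why the snapshot completes right away: a snapshot has nothing left to do once every one of its shards is in a terminal state, and `MISSING` is terminal while `INIT` is not. The sketch below uses a hypothetical `SnapshotCompletionSketch` class with a reduced copy of the shard states; the `completed` flag and `allShardsCompleted` helper are assumptions made for this example, not the actual Elasticsearch code.

```java
import java.util.List;

final class SnapshotCompletionSketch {

    /** Simplified stand-in for SnapshotsInProgress.ShardState; the flag marks terminal states. */
    enum ShardState {
        INIT(false),
        QUEUED(false),
        WAITING(false),
        SUCCESS(true),
        FAILED(true),
        MISSING(true);

        final boolean completed;

        ShardState(boolean completed) {
            this.completed = completed;
        }
    }

    /** Returns true when no shard still has work pending, so the snapshot can be finalized. */
    static boolean allShardsCompleted(List<ShardState> shardStates) {
        return shardStates.stream().allMatch(state -> state.completed);
    }
}
```

With a single shard in `MISSING`, `allShardsCompleted` returns true; had the shard wrongly moved to `INIT`, the snapshot would have kept waiting on a node that never holds the shard.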

updatedState.generation(),
entry.shardId(repoShardId)
);
startShardSnapshot(repoShardId, updatedState.generation());
Contributor Author

No big change here really. I just extracted the code that re-computes where to run the snapshot, since we don't need to do the isQueued check twice, and used it here.

@elasticsearchmachine elasticsearchmachine changed the base branch from master to main July 22, 2022 23:05
Member

@tlrx tlrx left a comment

LGTM

@original-brownbear
Contributor Author

Thanks Tanguy!

@original-brownbear original-brownbear merged commit 0e8f5e4 into elastic:main Jul 27, 2022
@original-brownbear original-brownbear deleted the 86724-one-more-time branch July 27, 2022 08:12
Labels
>bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.4.0
Development

Successfully merging this pull request may close these issues.

[CI] SnapshotStressTestsIT testRandomActivities failing
4 participants