Make recovery source partially non-blocking #37291

Merged
dnhatn merged 14 commits into elastic:master from non-blocking-recovery on Jan 12, 2019

Conversation

dnhatn
Member

@dnhatn dnhatn commented Jan 10, 2019

Today, a peer recovery may run into a deadlock if the value of
`node_concurrent_recoveries` is too high. This happens because the
peer recovery is executed in a blocking fashion. This commit attempts to
make the recovery source partially non-blocking. I will make three
follow-ups to make it fully non-blocking: (1) send translog operations,
(2) primary relocation, (3) send commit files.

Relates #36195
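
To illustrate the failure mode described above, here is a minimal sketch, not the actual recovery code. It assumes a bounded pool of size 1 standing in for the recovery threads, and a hypothetical `sendRequest` helper whose reply is handled on that same pool. The blocking variant parks the only thread while it waits for a reply that can never run; the callback-based variant returns the thread immediately.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

public class NonBlockingRecoverySketch {

    // Hypothetical stand-in for a request to the recovery target: the reply
    // is handled on the same bounded pool that runs the recovery itself.
    static void sendRequest(ExecutorService pool, String request, Consumer<String> onReply) {
        pool.execute(() -> onReply.accept("ack:" + request));
    }

    // Blocking style: the recovery thread waits for the reply while holding
    // the only pool thread. Returns true if the wait timed out (starvation).
    public static boolean blockingStepStarves() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            return CompletableFuture.supplyAsync(() -> {
                CompletableFuture<String> reply = new CompletableFuture<>();
                sendRequest(pool, "prepare", reply::complete);
                try {
                    // A timeout is used here so the demo terminates instead of hanging.
                    reply.get(200, TimeUnit.MILLISECONDS);
                    return false;
                } catch (TimeoutException e) {
                    return true;
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }, pool).join();
        } finally {
            pool.shutdownNow();
        }
    }

    // Non-blocking style: the step registers a callback and returns, so the
    // pool thread is free to handle the reply.
    public static String nonBlockingStepCompletes() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            CompletableFuture<String> done = new CompletableFuture<>();
            pool.execute(() -> sendRequest(pool, "prepare", done::complete));
            return done.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking step starved the pool: " + blockingStepStarves());
        System.out.println("non-blocking step result: " + nonBlockingStepCompletes());
    }
}
```

With more concurrent recoveries than available threads, every thread can end up in the blocking wait at once, which is the situation the sketch reduces to a pool of one.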

@dnhatn dnhatn added the >enhancement, :Distributed Indexing/Recovery (Anything around constructing a new shard, either from a local or a remote source.), v7.0.0, and v6.7.0 labels Jan 10, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@dnhatn dnhatn changed the title Partially make recovery source non-blocking Make recovery source partially non-blocking Jan 10, 2019
Contributor

@s1monw s1monw left a comment

did a first pass. I would love to minimize the steps we make async in this PR even further.

@dnhatn
Member Author

dnhatn commented Jan 10, 2019

@s1monw Thanks for looking. I've minimized the changes in this PR; it now just tries to provide the infra for the next steps. Would you please take another look?

@dnhatn dnhatn requested a review from s1monw January 10, 2019 23:26
@dnhatn
Member Author

dnhatn commented Jan 11, 2019

@elasticmachine run gradle build tests 1

dnhatn added a commit that referenced this pull request Jan 11, 2019
Contributor

@s1monw s1monw left a comment

took another round

@dnhatn
Member Author

dnhatn commented Jan 11, 2019

@s1monw I pushed changes. Can you have another look?

@dnhatn dnhatn requested a review from s1monw January 11, 2019 17:59
dnhatn added a commit that referenced this pull request Jan 11, 2019
This commit introduces StepListener which provides a simple way to write
a flow consisting of multiple asynchronous steps without having nested
callbacks.

Relates #37291
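
As a rough illustration of the idea in this commit message, the sketch below chains two asynchronous steps without nesting callbacks. It is a simplified stand-in, not the actual `StepListener` class: the `Listener` interface, the `Step` class backed by a `CompletableFuture`, and the `prepareTarget` and `sendOperations` steps are all assumptions made for this example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

public class StepListenerSketch {

    // Simplified callback interface, assumed for this sketch.
    interface Listener<T> {
        void onResponse(T value);
        void onFailure(Exception e);
    }

    // A step is a listener that remembers its outcome and hands it to a
    // continuation registered via whenComplete, in whichever order they arrive.
    static final class Step<T> implements Listener<T> {
        private final CompletableFuture<T> future = new CompletableFuture<>();

        @Override public void onResponse(T value) { future.complete(value); }
        @Override public void onFailure(Exception e) { future.completeExceptionally(e); }

        void whenComplete(Consumer<T> onResponse, Consumer<Exception> onFailure) {
            future.whenComplete((value, error) -> {
                if (error != null) {
                    onFailure.accept((Exception) error); // only Exceptions are ever passed in
                    return;
                }
                try {
                    onResponse.accept(value);
                } catch (Exception e) {
                    onFailure.accept(e); // a failing continuation fails the flow
                }
            });
        }
    }

    // Hypothetical asynchronous steps; each replies on the pool thread.
    static void prepareTarget(ExecutorService pool, Listener<Integer> listener) {
        pool.execute(() -> listener.onResponse(3));
    }

    static void sendOperations(ExecutorService pool, int ops, boolean fail, Listener<Integer> listener) {
        pool.execute(() -> {
            if (fail) {
                listener.onFailure(new IllegalStateException("target unavailable"));
            } else {
                listener.onResponse(ops);
            }
        });
    }

    // Runs the two-step flow; the steps read top to bottom instead of nesting.
    public static String runFlow(boolean failSecondStep) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CompletableFuture<String> outcome = new CompletableFuture<>();
        Consumer<Exception> onFailure = e -> outcome.complete("failed: " + e.getMessage());
        try {
            Step<Integer> prepareStep = new Step<>();
            Step<Integer> sendStep = new Step<>();
            prepareTarget(pool, prepareStep);
            prepareStep.whenComplete(ops -> sendOperations(pool, ops, failSecondStep, sendStep), onFailure);
            sendStep.whenComplete(sent -> outcome.complete("sent " + sent + " ops"), onFailure);
            return outcome.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runFlow(false));
        System.out.println(runFlow(true));
    }
}
```

Every step shares one failure handler, so an error at any point short-circuits the rest of the flow without extra plumbing.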
dnhatn added a commit that referenced this pull request Jan 12, 2019
dnhatn added a commit that referenced this pull request Jan 12, 2019
Contributor

@s1monw s1monw left a comment

LGTM left 2 comments

@dnhatn
Member Author

dnhatn commented Jan 12, 2019

Thanks @s1monw.

@dnhatn dnhatn merged commit 44a1071 into elastic:master Jan 12, 2019
@dnhatn dnhatn deleted the non-blocking-recovery branch January 12, 2019 17:49
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jan 12, 2019
dnhatn added a commit that referenced this pull request Jan 13, 2019
dnhatn added a commit that referenced this pull request Jan 14, 2019
dnhatn added a commit that referenced this pull request Jan 15, 2019
dnhatn added a commit that referenced this pull request Jan 15, 2019
This commit prepares the infra required to make sending a translog snapshot
from the recovery source non-blocking. I'll make a follow-up to make the send
snapshot method itself non-blocking.

Relates #37291
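
One common way to make a batched send loop non-blocking is to send the next batch from the callback of the previous one. The sketch below shows that pattern only; it is not necessarily how the follow-up implements it. The `Listener` interface, the `sendBatch` transport stub, and the string operations are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AsyncBatchSendSketch {

    // Simplified callback interface, assumed for this sketch.
    interface Listener<T> {
        void onResponse(T value);
        void onFailure(Exception e);
    }

    // Hypothetical transport stub: acknowledges the batch size asynchronously.
    static void sendBatch(ExecutorService pool, List<String> batch, Listener<Integer> listener) {
        pool.execute(() -> listener.onResponse(batch.size()));
    }

    // Sends ops[from..] in batches. No thread waits between batches: the next
    // batch is triggered from the acknowledgement of the previous one.
    static void sendAll(ExecutorService pool, List<String> ops, int batchSize,
                        int from, int sentSoFar, Listener<Integer> done) {
        if (from >= ops.size()) {
            done.onResponse(sentSoFar);
            return;
        }
        int to = Math.min(from + batchSize, ops.size());
        sendBatch(pool, ops.subList(from, to), new Listener<Integer>() {
            @Override public void onResponse(Integer acked) {
                sendAll(pool, ops, batchSize, to, sentSoFar + acked, done);
            }
            @Override public void onFailure(Exception e) {
                done.onFailure(e);
            }
        });
    }

    // Builds a fake snapshot of totalOps operations and returns how many were acknowledged.
    public static int sendSnapshot(int totalOps, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> ops = new ArrayList<>();
        for (int i = 0; i < totalOps; i++) {
            ops.add("op-" + i);
        }
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CompletableFuture<Integer> result = new CompletableFuture<>();
        try {
            sendAll(pool, ops, batchSize, 0, 0, new Listener<Integer>() {
                @Override public void onResponse(Integer total) { result.complete(total); }
                @Override public void onFailure(Exception e) { result.completeExceptionally(e); }
            });
            return result.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("acknowledged ops: " + sendSnapshot(10, 3));
    }
}
```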
dnhatn added a commit that referenced this pull request Jan 23, 2019
kovrus added a commit to crate/crate that referenced this pull request Sep 11, 2019
kovrus added a commit to crate/crate that referenced this pull request Sep 11, 2019
kovrus added a commit to crate/crate that referenced this pull request Sep 12, 2019
kovrus added a commit to crate/crate that referenced this pull request Sep 12, 2019
kovrus added a commit to crate/crate that referenced this pull request Sep 12, 2019
mergify bot pushed a commit to crate/crate that referenced this pull request Sep 17, 2019
Labels
:Distributed Indexing/Recovery, >enhancement, v6.7.0, v7.0.0-beta1
4 participants