fix(panic): Stop panicking on async task cancellation on shutdown in network and state futures #7219

Merged: 13 commits merged on Jul 18, 2023

Conversation

teor2345
Collaborator

@teor2345 teor2345 commented Jul 17, 2023

Motivation

Zebra sometimes panics on shutdown when async tasks in the network and state futures are cancelled. This PR stops those panics.

Close #7170

Specifications

Threads:
https://doc.rust-lang.org/std/thread/struct.JoinHandle.html#method.is_finished

Async tasks:
https://docs.rs/tokio/latest/tokio/task/struct.JoinHandle.html#impl-Future-for-JoinHandle%3CT%3E
https://docs.rs/tokio/latest/tokio/task/struct.JoinError.html#method.try_into_panic
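
As a rough illustration of the distinction these links point at: awaiting a tokio `JoinHandle` yields a `Result`, and `JoinError::try_into_panic` only returns a panic payload when the task actually panicked, so a cancelled task can be told apart from a panicked one. The sketch below models that decision with the standard library only. `TaskError`, `TaskOutcome`, and `check_task_result` are hypothetical stand-ins invented for this example, not tokio or Zebra APIs, and this is not necessarily how the PR implements it.

```rust
use std::any::Any;
use std::panic;

/// Hypothetical stand-in for `tokio::task::JoinError`:
/// a task either was cancelled or panicked with a payload.
pub enum TaskError {
    Cancelled,
    Panicked(Box<dyn Any + Send + 'static>),
}

/// Hypothetical result type once panics have been propagated.
#[derive(Debug, PartialEq)]
pub enum TaskOutcome<T> {
    Finished(T),
    /// Cancellation is expected during shutdown, so it is not a panic.
    Cancelled,
}

/// Propagates a task's panic to the caller, but treats cancellation
/// as a normal outcome instead of panicking.
pub fn check_task_result<T>(result: Result<T, TaskError>) -> TaskOutcome<T> {
    match result {
        Ok(value) => TaskOutcome::Finished(value),
        Err(TaskError::Cancelled) => TaskOutcome::Cancelled,
        Err(TaskError::Panicked(payload)) => panic::resume_unwind(payload),
    }
}

fn main() {
    assert_eq!(check_task_result(Ok(7)), TaskOutcome::Finished(7));
    assert_eq!(
        check_task_result::<u32>(Err(TaskError::Cancelled)),
        TaskOutcome::Cancelled
    );

    // A real panic is still propagated to the caller.
    let panicked = panic::catch_unwind(|| {
        check_task_result::<u32>(Err(TaskError::Panicked(Box::new("task failed"))))
    });
    assert!(panicked.is_err());
}
```

With real tokio types, the `Cancelled` arm would correspond to a `JoinError` for which `try_into_panic` returns `Err`.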

Complex Code or Requirements

This PR replaces several complex, hand-written panic-handling code paths with consistent trait methods.

Solution

  • Create generic traits for checking and waiting for panics
  • Implement those traits on OS threads and async tasks
  • Use those trait impls to fix the panic and cancel bugs
  • Replace existing manual implementations with the trait impls
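
The list above could look roughly like the following for OS threads. This is a minimal sketch under stated assumptions: the trait names `CheckForPanics` and `WaitForPanics`, their method names, and the chosen `Output` types are illustrative and may differ from what the PR actually adds. Only `std::thread::JoinHandle::is_finished`, `JoinHandle::join`, and `std::panic::resume_unwind` are real standard library calls.

```rust
use std::panic;
use std::thread::{self, JoinHandle};

/// Hypothetical trait: check for a panic without blocking.
pub trait CheckForPanics {
    type Output;
    fn check_for_panics(self) -> Self::Output;
}

/// Hypothetical trait: wait for completion, propagating any panic.
pub trait WaitForPanics {
    type Output;
    fn wait_for_panics(self) -> Self::Output;
}

impl<T> WaitForPanics for JoinHandle<T> {
    type Output = T;

    /// Blocks until the thread exits, then re-raises its panic, if any.
    fn wait_for_panics(self) -> T {
        match self.join() {
            Ok(value) => value,
            Err(payload) => panic::resume_unwind(payload),
        }
    }
}

impl<T> CheckForPanics for JoinHandle<T> {
    /// `Ok(value)` if the thread has finished, or `Err(handle)`
    /// if it is still running and should be checked again later.
    type Output = Result<T, JoinHandle<T>>;

    fn check_for_panics(self) -> Self::Output {
        if self.is_finished() {
            // The thread has finished, so this should return promptly.
            Ok(self.wait_for_panics())
        } else {
            Err(self)
        }
    }
}

fn main() {
    let worker = thread::spawn(|| 2 + 2);
    assert_eq!(worker.wait_for_panics(), 4);

    // A panicking thread's panic is propagated to the waiting caller.
    let failing = thread::spawn(|| -> u32 { panic!("worker failed") });
    let result = panic::catch_unwind(panic::AssertUnwindSafe(|| failing.wait_for_panics()));
    assert!(result.is_err());
}
```

Putting both behaviours behind traits means callers use the same two method calls for threads and async tasks, rather than repeating the `join` and `resume_unwind` logic at each call site.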

Testing

I've changed enough impls in this PR that any bugs should show up in the existing tests when they shut Zebra down.

Review

This is a routine bug fix.

Reviewer Checklist

  • Will the PR name make sense to users?
    • Does it need extra CHANGELOG info? (new features, breaking changes, large changes)
  • Are the PR labels correct?
  • Does the code do what the ticket and PR says?
    • Does it change concurrent code, unsafe code, or consensus rules?
  • How do you know it works? Does it have tests?

Follow Up Work

We can use these methods on JoinError and JoinHandle throughout Zebra, or maybe create our own JoinHandle and JoinError types that automatically use these methods.

@teor2345 teor2345 added labels Jul 17, 2023: C-bug (Category: This is a bug), P-Medium ⚡, I-panic (Zebra panics with an internal error message), I-usability (Zebra is hard to understand or use), A-network (Area: Network protocol updates or fixes), A-state (Area: State / database changes), A-concurrency (Area: Async code, needs extra work to make it work properly.)
@teor2345 teor2345 requested a review from a team as a code owner July 17, 2023 03:25
@teor2345 teor2345 self-assigned this Jul 17, 2023
@teor2345 teor2345 requested review from upbqdn and removed request for a team July 17, 2023 03:25
arya2
arya2 previously approved these changes Jul 17, 2023
Contributor

@arya2 arya2 left a comment

This looks great, thank you for the simplification!

Resolved review threads:

  • zebra-chain/src/diagnostic/task/thread.rs (4 threads, 3 outdated)
  • zebra-chain/Cargo.toml
  • zebra-state/src/service.rs

teor2345 and others added 3 commits July 18, 2023 06:03
Co-authored-by: Arya <aryasolhi@gmail.com>
Co-authored-by: Arya <aryasolhi@gmail.com>
@teor2345 teor2345 requested a review from arya2 July 17, 2023 20:07
mergify bot added a commit that referenced this pull request Jul 18, 2023
@mergify mergify bot merged commit 3bbe3ce into main Jul 18, 2023
287 checks passed
@mergify mergify bot deleted the dont-panic branch July 18, 2023 04:53
Development

Successfully merging this pull request may close these issues.

Zebra sometimes panics in the state and network tasks on shutdown
2 participants