Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

Register SendTransactionService exit #31261

Merged
merged 4 commits into from
Apr 21, 2023

Conversation

CriesofCarrots
Copy link
Contributor

@CriesofCarrots CriesofCarrots commented Apr 18, 2023

Problem

The SendTransactionService doesn't receive exit signals from the validator, allowing the threads to continue in zombie mode when the validator has otherwise halted.

Summary of Changes

Extend the JsonRpcService exit block to signal SendTransactionService when the validator exits, by passing in an AtomicBool instead of creating on inside SendTransactionService

Fixes #31079

@CriesofCarrots CriesofCarrots force-pushed the sts-exit branch 2 times, most recently from 47ef9e9 to 9a6af2f Compare April 18, 2023 23:13
@codecov
Copy link

codecov bot commented Apr 19, 2023

Codecov Report

Merging #31261 (c825b6b) into master (04bbf3b) will increase coverage by 0.0%.
The diff coverage is 97.6%.

@@           Coverage Diff           @@
##           master   #31261   +/-   ##
=======================================
  Coverage    81.5%    81.5%           
=======================================
  Files         733      733           
  Lines      206885   206926   +41     
=======================================
+ Hits       168693   168769   +76     
+ Misses      38192    38157   -35     

@CriesofCarrots CriesofCarrots marked this pull request as ready for review April 19, 2023 04:58
@CriesofCarrots CriesofCarrots requested a review from t-nelson April 19, 2023 04:58
@CriesofCarrots
Copy link
Contributor Author

Follow-up work will examine exit and join behavior in JsonRpcService

Comment on lines 834 to 848
);

exit.store(true, Ordering::Relaxed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think exit is going to race the thread spawn here, should probably wait on... something.

similarly need to sit on join() after setting exit to ensure the thread stops as intended.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably wait on... something.

i guess easiest would be to switch to a zero-capacity bounded channel and do a blocking write to it. once thread is up, that message will be consumed and waiter will unblock. at that point it's safe to signal exit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, great idea. Done.

Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strategy looks good! i think the test could use a little tightening up though

@CriesofCarrots CriesofCarrots force-pushed the sts-exit branch 2 times, most recently from 94263fa to e3d25a3 Compare April 20, 2023 18:26
Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the test improvements!

Comment on lines +852 to +855
let mut option = Ok(());
while option.is_ok() {
option = sender.send(dummy_tx_info());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm guessing we do this 'cause SendTransactionService::join() looks busted and that will be addressed in the alluded to followup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naw, join() looks fine, actually. Ensuring the receiver was dropped was the best thing I thought of to show that updating exit outside of the service actually causes stuff to stop. join() already sets exit directly. Open to better ideas, though!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...post-merge ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out this was wrong. join() also needs to ensure the rpc Server is closed.

@CriesofCarrots CriesofCarrots merged commit 03c1744 into solana-labs:master Apr 21, 2023
mergify bot pushed a commit that referenced this pull request Apr 21, 2023
* Pass exit into SendTransactionService

* Abort SendTransactionService with BanksService

* Register SendTransactionService exit as part of RpcService validator Exit

* Improve test, ensure receiver has been dropped

(cherry picked from commit 03c1744)

# Conflicts:
#	rpc/src/rpc_service.rs
CriesofCarrots pushed a commit that referenced this pull request Apr 22, 2023
)

* Register SendTransactionService exit (#31261)

* Pass exit into SendTransactionService

* Abort SendTransactionService with BanksService

* Register SendTransactionService exit as part of RpcService validator Exit

* Improve test, ensure receiver has been dropped

(cherry picked from commit 03c1744)

# Conflicts:
#	rpc/src/rpc_service.rs

* Fix conflict

---------

Co-authored-by: Tyera <tyera@solana.com>
@CriesofCarrots CriesofCarrots deleted the sts-exit branch May 25, 2023 22:11
bw-solana pushed a commit to bw-solana/solana that referenced this pull request Jan 10, 2025
…31261) (solana-labs#31292)

* Register SendTransactionService exit (solana-labs#31261)

* Pass exit into SendTransactionService

* Abort SendTransactionService with BanksService

* Register SendTransactionService exit as part of RpcService validator Exit

* Improve test, ensure receiver has been dropped

(cherry picked from commit 03c1744)

# Conflicts:
#	rpc/src/rpc_service.rs

* Fix conflict

---------

Co-authored-by: Tyera <tyera@solana.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fatal error didn't kill the validator process
2 participants