Significantly reduce retry duration of service discovery #1541

Merged
merged 1 commit into from
May 22, 2024
9 changes: 2 additions & 7 deletions crates/node/src/roles/admin.rs
@@ -58,13 +58,8 @@ impl AdminRole {
     ) -> Result<Self, AdminRoleBuildError> {
         let config = updateable_config.pinned();

-        // Total duration roughly 66 seconds
-        let retry_policy = RetryPolicy::exponential(
-            Duration::from_millis(100),
-            2.0,
-            Some(10),
-            Some(Duration::from_secs(20)),
-        );
+        // Total duration roughly 1s
+        let retry_policy = RetryPolicy::exponential(Duration::from_millis(100), 2.0, Some(4), None);
Comment on lines +61 to +62
Contributor

How long does it take to spin up a cold Lambda?

Contributor Author

This duration is unrelated to the timeout on the request to the Lambda; it's just the delay between retries. In the Lambda case, the first request blocks on the cold start and then likely succeeds. If it somehow fails transiently, an immediate retry will most likely not hit a cold start and will succeed right away. In no scenario would a very slow cold start cause us to exhaust this retry policy.

Contributor

True, it is the duration between failures. Thanks for the clarification.
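
The distinction drawn in this thread can be illustrated with a minimal Rust sketch. It is a simulation, not the crate's retry implementation: `Attempt`, `run_with_retries`, and the 30-second cold start are hypothetical names and values chosen for illustration. It tracks time spent blocked inside requests separately from time spent waiting between retries, so a slow cold start shows up only in the first counter.

```rust
use std::time::Duration;

/// One simulated attempt: how long the call blocked, and whether it succeeded.
/// Hypothetical type, used only for this illustration.
struct Attempt {
    blocked_for: Duration,
    ok: bool,
}

/// Walks through the attempts until one succeeds or the delays run out.
/// Returns (succeeded, time spent inside requests, time spent in backoff).
fn run_with_retries(attempts: &[Attempt], delays: &[Duration]) -> (bool, Duration, Duration) {
    let mut in_requests = Duration::ZERO;
    let mut in_backoff = Duration::ZERO;
    let mut delays = delays.iter();
    for attempt in attempts {
        // Time blocked on the request (for example a cold start) is not backoff.
        in_requests += attempt.blocked_for;
        if attempt.ok {
            return (true, in_requests, in_backoff);
        }
        // Only a failed attempt consumes a delay from the retry schedule.
        match delays.next() {
            Some(d) => in_backoff += *d,
            None => return (false, in_requests, in_backoff),
        }
    }
    (false, in_requests, in_backoff)
}

fn main() {
    // Assumed schedule: 100 ms initial delay, factor 2.0, four delays.
    let delays = [
        Duration::from_millis(100),
        Duration::from_millis(200),
        Duration::from_millis(400),
        Duration::from_millis(800),
    ];
    // Assumed scenario: a 30 s cold start that fails, then a fast warm retry.
    let attempts = [
        Attempt { blocked_for: Duration::from_secs(30), ok: false },
        Attempt { blocked_for: Duration::from_millis(50), ok: true },
    ];
    let (ok, in_requests, in_backoff) = run_with_retries(&attempts, &delays);
    println!("ok={ok} in_requests={in_requests:?} in_backoff={in_backoff:?}");
}
```

In this simulated scenario only the first delay of the schedule is consumed, however long the cold start takes.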

         let client =
             ServiceClient::from_options(&config.common.service_client, AssumeRoleCacheMode::None)?;
         let service_discovery = ServiceDiscovery::new(retry_policy, client);
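
The "total duration" figures in the two code comments can be checked with a short sketch. It assumes that the third argument of `RetryPolicy::exponential` counts the number of delays, that the fourth argument caps each individual delay, and that no jitter is applied; none of this is confirmed by the diff itself. `total_backoff_secs` is a hypothetical helper, not part of the crate.

```rust
/// Sums the delays of an exponential backoff schedule, in seconds.
/// Hypothetical helper; assumes `retries` delays, an optional per-delay cap,
/// and no jitter.
fn total_backoff_secs(
    initial_secs: f64,
    factor: f64,
    retries: u32,
    max_delay_secs: Option<f64>,
) -> f64 {
    let mut total = 0.0;
    let mut delay = initial_secs;
    for _ in 0..retries {
        // Apply the cap, if any, before counting this delay.
        let capped = match max_delay_secs {
            Some(cap) => delay.min(cap),
            None => delay,
        };
        total += capped;
        delay *= factor;
    }
    total
}

fn main() {
    // Old policy: 100 ms initial delay, factor 2.0, 10 delays, 20 s cap.
    let old = total_backoff_secs(0.1, 2.0, 10, Some(20.0));
    // New policy: 100 ms initial delay, factor 2.0, 4 delays, no cap.
    let new = total_backoff_secs(0.1, 2.0, 4, None);
    println!("old policy: {old:.1} s, new policy: {new:.1} s");
}
```

Under these assumptions the old policy sums to about 65.5 s, matching "roughly 66 seconds", and the new one to about 1.5 s. If the limit instead counts attempts, so that four attempts are separated by three delays, the new total would be about 0.7 s. Both readings are in the neighbourhood of the "roughly 1s" comment.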