Significantly reduce retry duration of service discovery #1541

Merged 1 commit on May 22, 2024

Conversation

jackkleeman
Contributor

The current total retry duration of 66s is quite long: it means you don't see your (generally permanent) error quickly enough.

@jackkleeman jackkleeman requested a review from tillrohrmann May 21, 2024 16:45
Contributor

@tillrohrmann tillrohrmann left a comment

Thanks for creating this PR, @jackkleeman. I like this PR more than #1540, since 66s is probably a bit too long. The one thing we should make sure of is that the exponential retry policy works, in terms of response time, with all the deployments we want Restate to support (for example, the expected response time * 2 should be smaller than the longest pause, which is the one between the second-to-last and last attempts). Do you have an idea how long it can take a cold Lambda to respond?

Comment on lines +61 to +62
// Total duration roughly 1s
let retry_policy = RetryPolicy::exponential(Duration::from_millis(100), 2.0, Some(4), None);
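To sanity-check the "roughly 1s" figure, here is a minimal sketch of the arithmetic behind a plain exponential backoff. It does not use Restate's `RetryPolicy`; `backoff_pauses` and `total_pause` are hypothetical helpers, and whether the `Some(4)` above counts the initial attempt (3 pauses) or only the retries (4 pauses) is an assumption left open here, so both cases are worth computing.

```rust
use std::time::Duration;

/// Pauses slept between attempts under a plain exponential backoff:
/// `initial`, `initial * factor`, `initial * factor^2`, ...
/// Hypothetical helper for illustration; not Restate's `RetryPolicy`.
fn backoff_pauses(initial: Duration, factor: f64, pauses: u32) -> Vec<Duration> {
    let initial_ms = initial.as_millis() as f64;
    (0..pauses)
        .map(|i| {
            // Work in milliseconds so 100ms * 2^i stays exact.
            let ms = (initial_ms * factor.powi(i as i32)).round() as u64;
            Duration::from_millis(ms)
        })
        .collect()
}

/// Total time spent sleeping between attempts.
/// Time spent inside the requests themselves is not included.
fn total_pause(initial: Duration, factor: f64, pauses: u32) -> Duration {
    backoff_pauses(initial, factor, pauses).iter().sum()
}
```

With an initial pause of 100ms and a factor of 2.0, `total_pause` gives 700ms for 3 pauses (100 + 200 + 400) and 1500ms for 4 pauses (adding 800), so "roughly 1s" holds under either reading.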
Contributor

How long does it take to spin up a cold Lambda?

Contributor Author

This duration is unrelated to the timeout on the request to the Lambda; it's just the duration between retries. In the Lambda case, the first request will block on the cold start and then likely succeed. If it somehow fails transiently, an immediate retry will most likely not see a cold start and will succeed right away. In no scenario would a very slow cold start cause us to exhaust this retry policy.
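The distinction can be sketched as a small retry loop. This is an illustrative sketch only, not the code in this PR: `retry_with_pauses` is a hypothetical helper, and the sleep function is passed in as a parameter purely so the behaviour is easy to observe. The point it shows is that pauses are consumed only when an attempt fails; however long a single attempt blocks (for example on a cold start), it does not eat into the retry budget.

```rust
use std::time::Duration;

/// Runs `op`, sleeping for the next pause after each failed attempt.
/// `op` may block for as long as it likes; the pauses are only
/// consumed when an attempt returns an error.
/// Hypothetical helper for illustration; not Restate's implementation.
fn retry_with_pauses<T, E>(
    pauses: &[Duration],
    mut op: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut remaining = pauses.iter();
    loop {
        match op() {
            // Success ends the loop no matter how long the attempt took.
            Ok(value) => return Ok(value),
            Err(err) => match remaining.next() {
                // A failure costs one pause before the next attempt.
                Some(pause) => sleep(*pause),
                // No pauses left: surface the last error to the caller.
                None => return Err(err),
            },
        }
    }
}
```

With pauses of 100ms, 200ms and 400ms, an operation that succeeds on its first (possibly slow) attempt never sleeps, one that fails once sleeps only 100ms, and a permanently failing one gives up after four attempts and 700ms of pauses in total.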

Contributor

True, it is the duration between failures. Thanks for the clarification.

@jackkleeman jackkleeman requested a review from tillrohrmann May 22, 2024 08:02
Contributor

@tillrohrmann tillrohrmann left a comment

LGTM. +1 for merging.

@jackkleeman jackkleeman merged commit e8b09c3 into restatedev:main May 22, 2024
4 checks passed
@jackkleeman jackkleeman deleted the register-noretry branch May 22, 2024 12:05