Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

persist: set a timeout on compaction requests #15260

Merged
merged 2 commits into from
Oct 18, 2022

Conversation

pH14
Copy link
Contributor

@pH14 pH14 commented Oct 7, 2022

I'll fill in more details on Monday, but it seems like we're seeing compaction requests occasionally start but never complete. This causes compaction for a shard to jam indefinitely, leading to a domino chain of greater problems, like environmentd crash looping.

Inferring from what data points we do have, but lacking hard evidence, my best guess right now is some S3 calls got stuck indefinitely. We don't currently set any client timeouts on our calls, so it seems possible that a call could be made to an unreachable destination and never complete. We could add timeouts directly to the S3 client, but smithy-lang/smithy-rs#1740 is upcoming and completely overhauls the timeout structure, and seems much improved over what we could do on today's SDK version.

This PR puts a hard timeout on compaction, so that requests over 10 minutes will be dropped. This should prevent us from getting stuck indefinitely on a long request, at the risk of dropping/spinning on requests that really do need 10 minutes. In practice, the mean compaction time is under a second, but, just want to highlight the potential risk here.

Motivation

Tips for reviewer

Checklist

Copy link
Contributor

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo a couple commits

src/persist-client/src/internal/compact.rs Outdated Show resolved Hide resolved
src/persist-client/src/internal/compact.rs Outdated Show resolved Hide resolved
@pH14 pH14 force-pushed the persist-compaction-timeout branch from 2d411cc to a84d9ca Compare October 17, 2022 21:57
@pH14 pH14 marked this pull request as ready for review October 17, 2022 21:59
@pH14 pH14 enabled auto-merge (squash) October 18, 2022 12:28
@pH14 pH14 merged commit b6b6b9f into MaterializeInc:main Oct 18, 2022
@pH14 pH14 deleted the persist-compaction-timeout branch October 18, 2022 12:34
sploiselle pushed a commit to sploiselle/materialize that referenced this pull request Oct 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants