-
Notifications
You must be signed in to change notification settings - Fork 820
Improve TSDB blocks sync #1991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve TSDB blocks sync #1991
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I've received an error from the linter because the rununtil.Repeat() error wasn't checked, so I took the change to refactor it (moved to UserStore since it belongs to it), renamed cortex_querier_sync_seconds to cortex_querier_blocks_sync_seconds and customized the buckets: default values are too low for this operation in a medium/large cluster. @pstibrany May you re-review it, please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Your change made me to look at more code, and one concern I have (but it's not new in this PR) is that when using target=All, initial sync of the querier will block entire startup of the single-binary Cortex. Is that right?
@pstibrany That's correct. On the other end, you can't query Cortex until the initial sync is done (but you could ingest), so not sure what's the best way to handle it. |
One way to handle it would be to reject query requests until querier is finished with the initial sync, but not to block the startup of Cortex. Let's fix it when we get there. |
@pstibrany @thorfour I've re-iterated on this PR based on your feedback (allow to disable sync, do not synch twice at startup, fix race condition) plus a couple of other changes (max concurrency when synching multi stores, track timing for initial sync too). May you re-review it, please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for also fixing concurrency issues!
7d8b790
to
a888a16
Compare
@pstibrany Thanks for your review. I've addressed the comments. |
14fc10a
to
57db848
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me thanks!
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
…metric to cortex_querier_blocks_sync_seconds Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
57db848
to
175783c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What this PR does:
Currently, there's an hardcoded
30s
as blocks sync interval (TSDB storage). It's quite a lot. I believe we don't even need such an high-frequency sync interval (at least, not as default) so I'm proposing to allow to configure it and set an higher default (Thanos' default is3m
, I've set5m
which should be more than enough for the Cortex use case).I've also received an error from the linter because the
rununtil.Repeat()
error wasn't checked, so I took the change to refactor it (moved toUserStore
since it belongs to it), renamedcortex_querier_sync_seconds
tocortex_querier_blocks_sync_seconds
and customized the buckets: default values are too low for this operation in a medium/large cluster.Which issue(s) this PR fixes:
N/A
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]