Skip to content

Improve TSDB blocks sync #1991

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

pracucci
Copy link
Contributor

@pracucci pracucci commented Jan 16, 2020

What this PR does:
Currently, there's an hardcoded 30s as blocks sync interval (TSDB storage). It's quite a lot. I believe we don't even need such an high-frequency sync interval (at least, not as default) so I'm proposing to allow to configure it and set an higher default (Thanos' default is 3m, I've set 5m which should be more than enough for the Cortex use case).

I've also received an error from the linter because the rununtil.Repeat() error wasn't checked, so I took the change to refactor it (moved to UserStore since it belongs to it), renamed cortex_querier_sync_seconds to cortex_querier_blocks_sync_seconds and customized the buckets: default values are too low for this operation in a medium/large cluster.

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pracucci
Copy link
Contributor Author

I've received an error from the linter because the rununtil.Repeat() error wasn't checked, so I took the change to refactor it (moved to UserStore since it belongs to it), renamed cortex_querier_sync_seconds to cortex_querier_blocks_sync_seconds and customized the buckets: default values are too low for this operation in a medium/large cluster.

@pstibrany May you re-review it, please?
@thorfour What's your take?

Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Your change made me to look at more code, and one concern I have (but it's not new in this PR) is that when using target=All, initial sync of the querier will block entire startup of the single-binary Cortex. Is that right?

@pracucci
Copy link
Contributor Author

when using target=All, initial sync of the querier will block entire startup of the single-binary Cortex. Is that right?

@pstibrany That's correct. On the other end, you can't query Cortex until the initial sync is done (but you could ingest), so not sure what's the best way to handle it.

@pstibrany
Copy link
Contributor

@pstibrany That's correct. On the other end, you can't query Cortex until the initial sync is done (but you could ingest), so not sure what's the best way to handle it.

One way to handle it would be to reject query requests until querier is finished with the initial sync, but not to block the startup of Cortex. Let's fix it when we get there.

@pracucci pracucci changed the title Allow to configure TSDB blocks sync interval Improve TSDB blocks sync Jan 17, 2020
@pracucci
Copy link
Contributor Author

@pstibrany @thorfour I've re-iterated on this PR based on your feedback (allow to disable sync, do not synch twice at startup, fix race condition) plus a couple of other changes (max concurrency when synching multi stores, track timing for initial sync too). May you re-review it, please?

Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for also fixing concurrency issues!

@pracucci pracucci force-pushed the allow-to-configure-sync-period-for-blocks-storage branch from 7d8b790 to a888a16 Compare January 17, 2020 08:04
@pracucci
Copy link
Contributor Author

@pstibrany Thanks for your review. I've addressed the comments.

@pracucci pracucci force-pushed the allow-to-configure-sync-period-for-blocks-storage branch from 14fc10a to 57db848 Compare January 17, 2020 09:59
Copy link
Contributor

@thorfour thorfour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me thanks!

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
…metric to cortex_querier_blocks_sync_seconds

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci force-pushed the allow-to-configure-sync-period-for-blocks-storage branch from 57db848 to 175783c Compare January 21, 2020 12:02
Copy link
Contributor

@jtlisi jtlisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pracucci pracucci merged commit 5251dcd into cortexproject:master Jan 21, 2020
@pracucci pracucci deleted the allow-to-configure-sync-period-for-blocks-storage branch January 21, 2020 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants