Design doc: No init startup for store gateway. #1813

Closed
bwplotka opened this issue Nov 28, 2019 · 13 comments

@bwplotka
Member

Hi 👋

Sharing a design doc with some initial discussion on removing, or limiting, the startup time of the store gateway, which could potentially be used by Cortex:

  • No startup metadata synchronization for blocks.
  • Loading blocks on demand at query time.

While this has many benefits, it comes with tradeoffs, e.g. higher query latency for "cold blocks" (blocks not yet loaded): https://docs.google.com/document/d/1En0Hr1OqZLlsF-_JtpYSWEu2mBXyVYx7BvXoivW0n3U/edit#

For Thanos, I am personally super interested in the latter step: loading blocks on demand at query time. For block meta file synchronization, we can go quite far with compaction and the current approach of iterating over the bucket. The two can also be tackled separately.

Feedback is welcome!

@daixiang0
Member

Better to add a label like `design` or `plan` or something...

@stale

stale bot commented Feb 5, 2020

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

@stale stale bot added the stale label Feb 5, 2020
@pracucci
Contributor

pracucci commented Feb 5, 2020

There's still some interest from my side on this improvement.

@stale stale bot removed the stale label Feb 5, 2020
@bwplotka
Member Author

bwplotka commented Feb 5, 2020

Agreed, we should explore this further. That is exactly why we have the stale bot (: to revisit issues at least once a month and check whether we are still interested.

@stale

stale bot commented Mar 6, 2020

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

@stale stale bot added the stale label Mar 6, 2020
@pracucci
Contributor

pracucci commented Mar 6, 2020

Let's keep it alive for a bit more.

Food for thought: the current lack of store-gateway HA with sharding (if one gateway goes down, all queries fail) may actually be solved by lazy storage, which would allow fast re-sharding across gateways without downtime (provided we completely remove the initial sync delay).
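A rough Go sketch of why lazy loading makes re-sharding cheap, under loudly stated assumptions: `ownerOf` and `assign` are hypothetical helpers, not Thanos or Cortex APIs, and plain FNV-1a modulo hashing is used only for brevity (it can move a large share of blocks whenever the gateway set changes, which is why a consistent-hashing ring is the usual choice). With lazy loading, re-sharding reduces to recomputing ownership: a gateway that takes over a block could load it on the first query instead of waiting for an initial sync.

```go
package main

import (
	"hash/fnv"
	"sort"
)

// ownerOf picks the gateway responsible for a block by hashing its ID.
// Illustrative modulo hashing only; gateways must be non-empty.
func ownerOf(blockID string, gateways []string) string {
	h := fnv.New32a()
	h.Write([]byte(blockID)) // hash.Hash writes never return an error
	return gateways[h.Sum32()%uint32(len(gateways))]
}

// assign maps each gateway to the blocks it owns. The gateway list is
// copied and sorted so every node computes the same assignment from the
// same membership, regardless of the order it learned about peers.
func assign(blockIDs, gateways []string) map[string][]string {
	out := make(map[string][]string)
	if len(gateways) == 0 {
		return out
	}
	gs := append([]string(nil), gateways...)
	sort.Strings(gs)
	for _, id := range blockIDs {
		o := ownerOf(id, gs)
		out[o] = append(out[o], id)
	}
	return out
}
```

When a gateway leaves, calling `assign` again with the remaining members yields the new ownership immediately; under the lazy model no gateway has to preload its newly acquired blocks before serving queries.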

@stale stale bot removed the stale label Mar 6, 2020
@stale

stale bot commented Apr 5, 2020

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

@stale stale bot added the stale label Apr 5, 2020
@bwplotka
Member Author

bwplotka commented Apr 5, 2020 via email

@stale stale bot removed the stale label Apr 5, 2020
@stale

stale bot commented May 5, 2020

Hello 👋 Looks like there has been no activity on this issue for the last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there is no activity for the next week, this issue will be closed (we can always reopen an issue if we need to!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

@stale stale bot added the stale label May 5, 2020
@bwplotka bwplotka removed the stale label May 5, 2020
@stale

stale bot commented Jun 4, 2020

Hello 👋 Looks like there has been no activity on this issue for the last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there is no activity for the next week, this issue will be closed (we can always reopen an issue if we need to!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

@stale stale bot added the stale label Jun 4, 2020
@pracucci
Contributor

pracucci commented Jun 5, 2020

I initially wrote that design doc but, on my side, interest has decreased since then. Loading cold blocks on demand would have a significant performance impact on queries hitting not-yet-loaded blocks, and meanwhile two changes have relaxed the need: the great work done by @bwplotka to reduce the in-memory index-header size, and the blocks sharding introduced in the Cortex store-gateway.

I still believe we need a faster way to "scan the bucket" to discover new/deleted blocks that doesn't require every single query/gateway instance to run a periodic full bucket scan, but that's a different topic. The recent support for metadata caching in memcached (including the bucket List operation) has relaxed this need too.

@stale stale bot removed the stale label Jun 5, 2020
@stale

stale bot commented Jul 5, 2020

Hello 👋 Looks like there has been no activity on this issue for the last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there is no activity for the next week, this issue will be closed (we can always reopen an issue if we need to!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

@stale stale bot added the stale label Jul 5, 2020
@stale

stale bot commented Jul 12, 2020

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Jul 12, 2020