Description
Hi 👋
A month ago @tomwilkie merged a PR that makes `query-frontend` capable of caching responses for queries against any Prometheus API. Details were presented at the Prometheus London Meetup:
- Slides: https://speakerdeck.com/grafana/blazin-fast-promql
- Watch the talk here: https://youtu.be/eyBbImSDOrI
Now, this is an amazing piece of work, as it allows simple and clear Cortex response caching (with splitting by day!) to be used against any Prometheus-based backend. Requests against metric backends are often expensive, have small result output, and tend to be simultaneous and repetitive, so it makes sense to treat such a caching component as a must-have, even for vanilla Prometheus. As Thanos maintainers, we have been looking for exactly something like this for some time. Overall, it definitely looks like Cortex and Thanos are trying to solve a very similar problem.
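To illustrate the core idea (a minimal sketch, NOT the actual Cortex `query-frontend` code), splitting by day means cutting a long range query into day-aligned sub-queries whose responses can be cached and reused independently:

```go
// Minimal sketch of the day-splitting idea, not the actual Cortex code:
// a long range query is cut into UTC-day-aligned sub-queries whose
// responses can be cached and reused independently.
package main

import (
	"fmt"
	"time"
)

// queryRange is a simplified stand-in for a Prometheus range-query request.
type queryRange struct {
	Query      string
	Start, End time.Time
}

// splitByDay returns one sub-range per UTC day touched by q.
// time.Truncate(24h) aligns to UTC midnights in Go's time model.
func splitByDay(q queryRange) []queryRange {
	var out []queryRange
	for start := q.Start; start.Before(q.End); {
		next := start.Truncate(24 * time.Hour).Add(24 * time.Hour)
		if next.After(q.End) {
			next = q.End
		}
		out = append(out, queryRange{Query: q.Query, Start: start, End: next})
		start = next
	}
	return out
}

func main() {
	q := queryRange{
		Query: `rate(http_requests_total[5m])`,
		Start: time.Date(2019, 10, 1, 18, 0, 0, 0, time.UTC),
		End:   time.Date(2019, 10, 3, 6, 0, 0, 0, time.UTC),
	}
	for _, s := range splitByDay(q) {
		fmt.Println(s.Start, "→", s.End) // prints three day-aligned pieces
	}
}
```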
From the Thanos side, we want to make it the default caching solution that we recommend, document, and maintain.
However, such caching is currently heavily bound to Cortex: it sits next to quite a complex queuing engine, which it has already been proposed to extract from the caching logic. I believe that splitting caching into a separate project (`promcache`?), in some common org like https://github.com/prometheus-community, could have many advantages around contributing, clarity, and adoption. I enumerate some benefits further down.
Proposal
- Move `query-frontend` caching logic to a separate Go module (plus a cmd to run it), e.g. https://github.com/prometheus-community/promcache. The name of the project is to be defined (:
- Add maintainers who want to help from both Cortex and Thanos as the project owners.
- Make it clear that this is a caching project for Prometheus API, Cortex, and Thanos backends.
  - Open questions:
    - What if other backends want something extra? VictoriaMetrics, M3DB?
    - Should we embed retries and limits as well? (IMO yes)
- Allow Cortex to use it either as a library in `query-frontend` or just point to `query-frontend` (without caching)
- Allow Thanos to use it as a library in Querier (potentially) or spin it up on top of Querier (must-have); a purely hypothetical sketch of such a library interface follows this list
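To make the library option above concrete, here is a purely hypothetical sketch (every name, including the `Cache` interface and the constructor, is invented for illustration; nothing here is an existing API) of how the cache could be embedded as ordinary `http.Handler` middleware in front of any downstream Prometheus-compatible API:

```go
// Purely hypothetical sketch of a promcache library surface (every name
// here is invented for illustration): the cache is ordinary http.Handler
// middleware, so Cortex could wrap its query-frontend with it and Thanos
// could wrap Querier, while "point to query-frontend without caching"
// just means skipping the wrapper.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// Cache is the (hypothetical) response cache for Prometheus query APIs.
type Cache interface {
	// Wrap returns a handler that serves cached day-split responses and
	// falls through to next for the missing pieces.
	Wrap(next http.Handler) http.Handler
}

func main() {
	// Downstream can be vanilla Prometheus, Cortex, or a Thanos Querier.
	downstream, err := url.Parse("http://querier:9090")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(downstream)

	var cache Cache // e.g. promcache.New(...) in this hypothetical API
	handler := http.Handler(proxy)
	if cache != nil {
		handler = cache.Wrap(handler) // embed caching as a library
	}
	log.Fatal(http.ListenAndServe(":9091", handler))
}
```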
If we agree on this, we (the Thanos team) are happy to spin this project up: prepare the repo, Go module, and initial docs, and extract the caching logic from `query-frontend`. Then we can focus on embedding caching in existing components like Querier or `query-frontend` and use `promcache` as a library if needed.
Benefits of moving the caching part of `query-frontend` into a separate project
- Share responsibility for maintaining `promcache` across both the Thanos and Cortex teams.
- A more focused project! (caching, retries, limits around the Prometheus query APIs)
- Easier to understand, easier collaboration, documentation, starting up
- Separate versioning
- Easier to use as a library (fewer deps)
- Easier to justify adjustments for Cortex & Thanos:
  - While some logic is common, there might be some separate changes required for Cortex and Thanos:
    - Cortex: QoS, queueing, multi-tenancy;
    - Thanos: splitting by ranges other than days when using downsampled data (see the sketch after this list), partial response logic, etc.
- The first step towards joining forces and collaboration between Cortex & Thanos!
  - Space to agree on a common queuing API inspired by Cortex that might be useful for Thanos or even vanilla Prometheus
  - Space to agree on multi-tenancy, QoS, retry, and limits mechanisms together ❤️
- Beneficial for Cortex itself:
  - Scaling the caching frontend separately from the queuing: Query Frontend scalability #1150
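To make the downsampling point above concrete: Thanos stores downsampled data at 5m and 1h resolutions, where sample density is much lower, so a fixed one-day split can produce needlessly tiny cache pieces. A minimal sketch of picking a split interval from the query resolution (the thresholds and values below are made-up examples, not existing Thanos logic):

```go
// Illustrative sketch (not existing Thanos code) of why Thanos may want
// split ranges other than a fixed day: with downsampled data the sample
// density drops, so wider cache-split intervals keep per-piece result
// sizes comparable. All threshold values below are made-up examples.
package main

import (
	"fmt"
	"time"
)

// splitInterval picks a cache-split interval from the query's maximum
// downsampling resolution (Thanos uses raw, 5m, and 1h resolutions).
func splitInterval(resolution time.Duration) time.Duration {
	switch {
	case resolution >= time.Hour: // 1h downsampled data
		return 7 * 24 * time.Hour // split by week (example value)
	case resolution >= 5*time.Minute: // 5m downsampled data
		return 2 * 24 * time.Hour // split by two days (example value)
	default: // raw data
		return 24 * time.Hour // the usual day splitting
	}
}

func main() {
	for _, res := range []time.Duration{0, 5 * time.Minute, time.Hour} {
		fmt.Printf("resolution %v → split by %v\n", res, splitInterval(res))
	}
}
```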
What could be missing in the current `query-frontend` caching layer?
- Client load balancing for the downstream API
  - E.g. in Kubernetes it's hard to load balance the Queriers equally (round-robin); see the sketch after this list
- Adjustments for Thanos as mentioned above.
- Caching other Prometheus APIs (label names/values, series)
- Other caching backends
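On the load-balancing point: with a regular ClusterIP Service plus HTTP keep-alive, a frontend tends to pin its long-lived connections to a few Queriers, so the client has to resolve the individual endpoints (e.g. via a headless Service) and rotate across them itself. A minimal sketch, assuming a hypothetical headless Service named `querier` in a `monitoring` namespace:

```go
// Minimal sketch of client-side round-robin over Querier endpoints.
// Assumption (hypothetical): a Kubernetes *headless* Service named
// "querier" in namespace "monitoring", whose DNS A records list one IP
// per ready Querier pod.
package main

import (
	"fmt"
	"net"
	"sync/atomic"
)

type roundRobin struct {
	addrs []string
	next  uint64
}

// pick returns the next backend address in round-robin order.
func (r *roundRobin) pick() string {
	n := atomic.AddUint64(&r.next, 1)
	return r.addrs[(n-1)%uint64(len(r.addrs))]
}

func main() {
	// Headless Services return one A record per ready pod.
	ips, err := net.LookupHost("querier.monitoring.svc.cluster.local")
	if err != nil {
		// Outside a cluster the lookup fails; fall back to example IPs.
		ips = []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}
	}
	rr := &roundRobin{addrs: ips}
	for i := 0; i < 4; i++ {
		// Each downstream request would go to the picked Querier.
		fmt.Println("sending query to", rr.pick()+":9090")
	}
}
```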
Thanks @gouthamve for the input so far!
cc @bboreham @tomwilkie and others (: What do you think?