proposal: Moving Caching part of query-frontend to separate project. #1672

Closed
@bwplotka

Description


Hi 👋

A month ago @tomwilkie merged a PR that makes query-frontend capable of caching responses for queries against any Prometheus API. Details were presented at the Prometheus London Meetup.

Now, this is an amazing piece of work, as it allows Cortex's simple and clear response caching (with day splitting!) to be used against any Prometheus-based backend. Requests against metric backends are often expensive, produce small result outputs, and are simultaneous and repetitive, so it makes sense to treat such a caching component as a must-have, even for vanilla Prometheus. As the Thanos maintainers, we have been looking for exactly this for some time. Overall, Cortex and Thanos are clearly trying to solve a very similar goal here.

From the Thanos side, we want to make it the default caching solution that we recommend, document, and maintain.

However, such caching is still heavily bound to Cortex. query-frontend has a quite complex queuing engine, which has already been proposed to be extracted from the caching. I believe that splitting the caching into a separate project (promcache?), in some common org like https://github.com/prometheus-community, could have many advantages around contributing, clarity, and adoption. I enumerated some benefits further down.

Proposal

  1. Move the query-frontend caching logic to a separate Go module (plus a cmd to run it), e.g. https://github.com/prometheus-community/promcache
    • The name of the project is still to be defined (:
  2. Add maintainers who want to help from both Cortex and Thanos as the project owners.
  3. Make it clear that this is a caching project for Prometheus API, Cortex, and Thanos backends.
    • Open questions:
      • What if other backends want something extra? VM, M3DB?
      • Should we embed retries and limits as well? (IMO yes)
  4. Allow Cortex to use it either as a library in query-frontend, or to just point to query-frontend (without caching)
  5. Allow Thanos to use it as a library in Querier (potentially) or to spin it up on top of Querier (must-have)

If we agree on this, we (the Thanos team) are happy to spin this project up: prepare the repo, Go module, and initial docs, and extract the caching logic from query-frontend. Then we can focus on embedding caching in existing components like Querier or query-frontend, using promcache as a library if needed.

What are the benefits of moving the caching part of query-frontend into a separate project?

  • Share responsibility for maintaining promcache across both Thanos and Cortex teams.
  • More focused project! (caching, retries, limits around Prometheus Query APIs)
    • Easier to understand, easier collaboration, documentation, starting up
    • Separate versioning
    • Easier to use as a library (fewer deps)
    • Easier to justify adjustments for Cortex & Thanos:
      • While some logic is common, there might be some separate changes required for each:
        • Cortex: QoS, queuing, multi-tenancy;
        • Thanos: splitting by ranges other than days when using downsampled data, partial-response logic, etc.
  • The first step to join forces and the collaboration between Cortex & Thanos!
    • Space to agree on common queuing API inspired by Cortex that might be useful for Thanos or even vanilla Prometheus
    • Space to agree on multi-tenancy, QoS, retry, limits mechanisms together ❤️
  • Beneficial for Cortex itself.

What could be missing in the current query-frontend caching layer?

  • Client load balancing for downstream API
    • E.g. in Kubernetes it’s hard to load balance the Queriers equally (round-robin)
  • Adjustments for Thanos as mentioned above.
  • Caching other Prometheus APIs (label/values, series)
  • Other caching backends
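On the load-balancing point: Kubernetes Services balance per connection, not per request, so long-lived connections pin a client to one Querier. Client-side round-robin over the individual endpoints is the usual fix; a minimal sketch of the picker (endpoint names are made up):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// rrBalancer picks downstream endpoints round-robin. This sketches the
// client-side load balancing the list above says is missing; it is not
// an existing query-frontend feature.
type rrBalancer struct {
	endpoints []string
	next      uint64 // monotonically increasing pick counter
}

// Pick returns the next endpoint, cycling through the list. Safe for
// concurrent use thanks to the atomic counter.
func (b *rrBalancer) Pick() string {
	n := atomic.AddUint64(&b.next, 1) - 1
	return b.endpoints[n%uint64(len(b.endpoints))]
}

func main() {
	b := &rrBalancer{endpoints: []string{
		"querier-0:9090", "querier-1:9090", "querier-2:9090",
	}}
	for i := 0; i < 4; i++ {
		fmt.Println(b.Pick()) // cycles 0, 1, 2, 0, ...
	}
}
```

In practice each request would then be sent to `Pick()`'s result instead of a single Service DNS name, spreading expensive queries evenly across Queriers.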

Initial Google Doc proposal.

Thanks @gouthamve for the input so far!

cc @bboreham @tomwilkie and others (: What do you think?
