
Trickster memory usage does not decrease after workload stops #901

@harshadparchandtheagilehub

Description

Hello team,
We are observing that resource utilization in the Trickster pods stays at an elevated level even when the Trickster service is not being used. The following details summarize the issue:

Version
Trickster v2.0.0-beta3
Commit: 2025/12/08 - 46c4ca5

Issue Description
We have a dashboard configured to auto-refresh every second. Each refresh request is routed through a Trickster ALB, which forwards the query to approximately 50 Mimir backends and returns around 108 time series (~70 kB of response data). Redis is also configured as the caching backend.

Under this workload, we observe a steady, sustained increase in CPU and memory usage in the Trickster pods.
When the dashboard auto-refresh is stopped (thereby stopping all queries to Trickster), both CPU and memory usage initially remain high.

After some time, CPU usage drops as expected. However, memory usage does not decrease. Even after leaving the Trickster pods idle for up to 3 days with no incoming queries, memory consumption remains at the same elevated level, which is unexpected given the lack of activity.
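For what it is worth, we cannot tell from the outside whether this is a genuine leak or the Go runtime holding on to freed heap pages, in which case the pod RSS would stay high even after the live heap shrinks. Below is a minimal sketch (plain Go standard library, not Trickster code; the ~512 MiB allocation is an arbitrary stand-in for query/response buffers) that illustrates the difference between heap in use and memory still held from the OS:

package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
)

// printStats reports how much heap is live versus how much memory the
// Go runtime still holds from the operating system.
func printStats(label string) {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("%-22s HeapInuse=%4d MiB  HeapIdle=%4d MiB  HeapReleased=%4d MiB  Sys=%4d MiB\n",
        label, m.HeapInuse>>20, m.HeapIdle>>20, m.HeapReleased>>20, m.Sys>>20)
}

func main() {
    // Allocate ~512 MiB to stand in for a burst of cached responses.
    bufs := make([][]byte, 0, 512)
    for i := 0; i < 512; i++ {
        bufs = append(bufs, make([]byte, 1<<20))
    }
    printStats("after allocation")

    // Drop all references and collect, mimicking the workload stopping.
    bufs = nil
    runtime.GC()
    printStats("after GC")

    // Explicitly return freed pages to the OS; without this the runtime's
    // background scavenger hands them back only gradually.
    debug.FreeOSMemory()
    printStats("after FreeOSMemory")
}

If the running pods expose a pprof or expvar endpoint (we have not confirmed whether this build does), the same counters could be read from a live pod instead.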

Impact

  • Persistently high memory usage in Trickster pods even when no requests are being processed
  • Gradual resource accumulation over time, increasing overall node memory pressure
  • Increased risk of pod instability, including OOM kills and restarts under sustained or repeated load

Below is how our configuration looks (abridged):

backends:
  backend_1:
    healthcheck:
      expected_codes:
        - 200
        - 401
      failure_threshold: 3
      headers:
        Authorization: Basic password
      host: somehost.com.:12345
      interval: 10000ms
      path: /prometheus/api/v1/status/buildinfo
      recovery_threshold: 3
      scheme: https
      timeout: 10000ms
      verb: GET
    origin_url: https://prometheusURL:12345/prometheus
    prometheus:
      labels:
        region: backend_1
    provider: prometheus
    req_rewriter_name: backend_1_authentication

  ... 50 such backends


  trickster-alb:
    alb:
      healthy_floor: -1
      mechanism: tsm
      pool:
        - backend_1
        - backend_2
        ....
        - backend_50
    provider: alb

caches:
  default:
    provider: redis
    redis:
      client_type: cluster
      endpoints:
        - endpoints_1.com:6379
        - endpoints_2.com:6379
        - endpoints_3.com:6379
        - endpoints_4.com:6379
        - endpoints_5.com:6379
        - endpoints_6.com:6379
        - endpoints_7.com:6379
        - endpoints_8.com:6379
        - endpoints_9.com:6379
      protocol: tcp
      use_tls: true

frontend:
  listen_port: 8480

logging:
  log_level: info

metrics:
  listen_port: 8481

mgmt:
  config_handler_path: /trickster/config
  health_handler_path: /trickster/health
  listen_address: ""
  listen_port: 8484
  ping_handler_path: /trickster/ping
  reload_handler_path: /trickster/config/reload

request_rewriters:
  backend_1_authentication:
    instructions:
      - - header
        - set
        - Authorization
        - Basic password

  ... 50 such backend authentication rewriters

Deployment details:

  • We are running 9 pods across 9 nodes, one pod per node. The underlying node pool has a total capacity of 4 CPUs and 32 GB of memory distributed across these nodes.
  • Each Trickster pod starts out configured with 100 mCPU and 128 MiB of memory.

Dashboard details:

  • The dashboard has just one panel; the time frame is the last one hour
  • Approximately 26,000 data points
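For scale, 26,000 data points across the ~108 series mentioned above works out to roughly 240 points per series over the one-hour window, i.e. about a 15-second step (assuming the points are distributed evenly across the series).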