Description
Hello team,
We are observing resource utilization in Trickster pods remaining elevated even when the Trickster service is not in use. The following details summarize the issue:
Version
Trickster v2.0.0-beta3
Commit: 2025/12/08 - 46c4ca5
Issue Description
We have a dashboard configured to auto-refresh every second. Each refresh request is routed through a Trickster ALB, which forwards the query to approximately 50 Mimir backends and returns around 108 time series (~70 kB of response data). Redis is also configured as the caching backend.
Under this workload, we observe a sustained and continuous increase in CPU and memory usage in the Trickster pods.
When the dashboard auto-refresh is stopped (thereby stopping all queries to Trickster), both CPU and memory usage initially remain high.
After some time, CPU usage drops as expected. However, memory usage does not decrease. Even after leaving the Trickster pods idle for up to 3 days with no incoming queries, memory consumption remains at the same elevated level, which is unexpected given the lack of activity.
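One thing we cannot tell from the outside is whether this is live data (a leak) or freed heap that the Go runtime is simply retaining rather than returning to the OS; RSS alone does not distinguish the two. Below is a minimal standalone Go sketch (not Trickster code; the sizes and labels are illustrative) showing how runtime.MemStats separates in-use, idle, and OS-released heap:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// printMemStats reports how much heap is live, how much has been freed
// but is still held by the runtime, and how much has gone back to the OS.
func printMemStats(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%-20s inuse=%3d MiB  idle=%3d MiB  released=%3d MiB\n",
		label, m.HeapInuse>>20, m.HeapIdle>>20, m.HeapReleased>>20)
}

func main() {
	// Simulate a query burst: allocate ~256 MiB of response buffers.
	bufs := make([][]byte, 256)
	for i := range bufs {
		bufs[i] = make([]byte, 1<<20) // 1 MiB each
	}
	printMemStats("after burst")

	bufs = nil   // drop all references
	runtime.GC() // heap moves from in-use to idle...
	printMemStats("after GC")

	debug.FreeOSMemory() // ...and only now is idle memory returned to the OS
	printMemStats("after FreeOSMemory")
}
```

If idle-but-unreleased heap dominates in a profile of an actual pod, the plateau is runtime retention; if in-use heap dominates, something is still holding references after the queries stop.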
Impact
- Persistently high memory usage in Trickster pods even when no requests are being processed
- Gradual resource accumulation over time, increasing overall node memory pressure
- Increased risk of pod instability, including OOM kills and restarts under sustained or repeated load
Below is our configuration:
backends:
  backend_1:
    healthcheck:
      expected_codes:
        - 200
        - 401
      failure_threshold: 3
      headers:
        Authorization: Basic password
      host: somehost.com.:12345
      interval: 10000ms
      path: /prometheus/api/v1/status/buildinfo
      recovery_threshold: 3
      scheme: https
      timeout: 10000ms
      verb: GET
    origin_url: https://prometheusURL:12345/prometheus
    prometheus:
      labels:
        region: backend_1
    provider: prometheus
    req_rewriter_name: backend_1_authentication
  # ... 50 such backends
  trickster-alb:
    alb:
      healthy_floor: -1
      mechanism: tsm
      pool:
        - backend_1
        - backend_2
        # ...
        - backend_50
    provider: alb
caches:
  default:
    provider: redis
    redis:
      client_type: cluster
      endpoints:
        - endpoints_1.com:6379
        - endpoints_2.com:6379
        - endpoints_3.com:6379
        - endpoints_4.com:6379
        - endpoints_5.com:6379
        - endpoints_6.com:6379
        - endpoints_7.com:6379
        - endpoints_8.com:6379
        - endpoints_9.com:6379
      protocol: tcp
      use_tls: true
frontend:
  listen_port: 8480
logging:
  log_level: info
metrics:
  listen_port: 8481
mgmt:
  config_handler_path: /trickster/config
  health_handler_path: /trickster/health
  listen_address: ""
  listen_port: 8484
  ping_handler_path: /trickster/ping
  reload_handler_path: /trickster/config/reload
request_rewriters:
  backend_1_authentication:
    instructions:
      - - header
        - set
        - Authorization
        - Basic password
  # ... 50 such backends authentications
Deployment details:
- We are running 9 pods across 9 nodes, with one pod per node. The underlying node pool has a total capacity of 4 CPUs and 32 GB of memory, distributed across these nodes.
- Each Trickster pod is initially configured with 100 mCPU and 128 MiB of memory (see the memory-limit sketch after this list).
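Given the tight 128 MiB allotment, one mitigation we are evaluating on our side (not a Trickster feature, just standard Go runtime behavior since Go 1.19) is a soft memory limit, settable via the GOMEMLIMIT environment variable or runtime/debug.SetMemoryLimit. A minimal sketch, where the 100 MiB cap and the MEM_LIMIT_BYTES override variable are hypothetical values for illustration:

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
)

func main() {
	// Cap the Go heap well under the pod's 128 MiB allotment so the GC
	// collects more aggressively instead of letting RSS drift upward.
	// 100 MiB and MEM_LIMIT_BYTES are hypothetical values for this sketch.
	limit := int64(100 << 20)
	if v, ok := os.LookupEnv("MEM_LIMIT_BYTES"); ok {
		if n, err := strconv.ParseInt(v, 10, 64); err == nil {
			limit = n
		}
	}
	prev := debug.SetMemoryLimit(limit) // Go 1.19+; returns the prior limit
	fmt.Printf("soft memory limit: %d bytes (previously %d)\n", limit, prev)

	// The service's normal work would run here.
}
```

Setting GOMEMLIMIT=100MiB in the pod spec achieves the same thing without a code change; it would not fix a genuine leak, but it should keep runtime-retained heap from parking above the pod's allotment.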
Dashboard details:
- The dashboard has a single panel; the time range is the last one hour
- Approximately 26,000 data points
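We are happy to capture and attach heap profiles from an idle pod if that would help narrow this down. Below is a hypothetical sketch of how a Go service exposes the standard net/http/pprof endpoints; we have not confirmed whether Trickster already serves these on one of its existing listeners, so treat the port and wiring as assumptions:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // side effect: registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// Serve the profiler on a dedicated port; from an idle pod the heap
	// profile can then be fetched and inspected with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	// (port 6060 is an arbitrary choice for this sketch)
	log.Fatal(http.ListenAndServe(":6060", nil))
}
```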
