
feat(silence): add dedicated silencer cache #4980

Draft

siavashs wants to merge 6 commits into prometheus:main from siavashs:feat/dedicated-silencer-cache

Conversation


@siavashs siavashs commented Feb 8, 2026

Add a dedicated cache to the silencer to drop the dependency on the global marker.
For now this should make GET API calls non-blocking for the silencer.
This will also enable us to eventually deprecate and remove the global marker.
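A minimal sketch of the idea, assuming a fingerprint-keyed map guarded by its own RWMutex; the names below (silenceCache, get, set) are illustrative and not taken from this PR:

```go
package silence

import "sync"

// silenceCache is a hypothetical stand-in for the dedicated cache: it maps an
// alert fingerprint to the IDs of the silences currently muting it, so lookups
// no longer have to go through the global marker.
type silenceCache struct {
	mtx sync.RWMutex
	ids map[uint64][]string // alert fingerprint -> active silence IDs
}

func (c *silenceCache) get(fp uint64) ([]string, bool) {
	c.mtx.RLock()
	defer c.mtx.RUnlock()
	ids, ok := c.ids[fp]
	return ids, ok
}

func (c *silenceCache) set(fp uint64, ids []string) {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	if c.ids == nil {
		c.ids = make(map[uint64][]string)
	}
	c.ids[fp] = ids
}
```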

Signed-off-by: Siavash Safi <siavash@cloudflare.com>

@siavashs siavashs force-pushed the feat/dedicated-silencer-cache branch 2 times, most recently from 4f36a26 to 0da0d55 on February 8, 2026 22:12
Guido Trotter and others added 6 commits on February 9, 2026 13:15
… the tree is always the top index

This allows us to have a count for each route, and to move the route
indexing from a pointer to the route object to its integer index.
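A rough sketch of what integer route indexing could look like, assuming a depth-first walk that assigns sequential indexes starting at the root; the Route fields shown are simplified stand-ins:

```go
package dispatch

// Route is a simplified stand-in for the routing tree node; only the fields
// relevant to indexing are shown.
type Route struct {
	Idx    int
	Routes []*Route
}

// indexRoutes walks the tree depth-first, assigning each route a sequential
// integer index (the root/tree is always index 0) and returning the total
// route count, which can then be used to preallocate per-route slices.
func indexRoutes(root *Route) int {
	next := 0
	var walk func(*Route)
	walk = func(r *Route) {
		r.Idx = next
		next++
		for _, child := range r.Routes {
			walk(child)
		}
	}
	walk(root)
	return next
}
```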

Signed-off-by: Guido Trotter <guido@hudson-trading.com>
…t is only used internally for the recursion)

Signed-off-by: Guido Trotter <guido@hudson-trading.com>
- Add a new stopped state
- Remove the mtx lock
- Change aggrGroupsPerRoute map to a new preallocated slice
- Change aggrGroupsNum to an atomic int
- Change done chan to a waitgroup
- Change state to atomic

Add a new type that holds the lock at the level of the per-route
fingerprint to aggrGroup map (TBD if we can make this a sync.Map).
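A minimal sketch of such a type, assuming an RWMutex-guarded map per route; the names routeGroups and fingerprint are placeholders (fingerprint stands in for model.Fingerprint):

```go
package dispatch

import "sync"

type fingerprint uint64 // stand-in for model.Fingerprint

type aggrGroup struct{ /* alerts, timers, ... */ }

// routeGroups holds the fingerprint -> aggrGroup map for a single route
// together with the lock protecting it, so contention is per route rather
// than dispatcher-wide. (Whether this could be a sync.Map instead is the
// open question mentioned above.)
type routeGroups struct {
	mtx    sync.RWMutex
	groups map[fingerprint]*aggrGroup
}
```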

Preallocate the route slice and its content objects

Remove the WaitForLoading call in Groups, which is redundant after
LoadingDone.

Stop copying the immutable slice in Groups; now only the inner maps are
copied, each while holding its individual lock. This also saves copying
when there is a route filter, which can be checked without holding a lock.
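Building on the routeGroups sketch above, the copy in Groups could then look roughly like this; the function name and filter signature are illustrative:

```go
// groupsSnapshot copies only the inner maps, each under its own route lock.
// The routes slice itself is preallocated and immutable, so iterating it
// needs no lock, and the route filter is checked before any lock is taken.
func groupsSnapshot(routes []*routeGroups, keep func(idx int) bool) []map[fingerprint]*aggrGroup {
	out := make([]map[fingerprint]*aggrGroup, 0, len(routes))
	for idx, rg := range routes {
		if keep != nil && !keep(idx) {
			continue
		}
		rg.mtx.RLock()
		cp := make(map[fingerprint]*aggrGroup, len(rg.groups))
		for fp, ag := range rg.groups {
			cp[fp] = ag
		}
		rg.mtx.RUnlock()
		out = append(out, cp)
	}
	return out
}
```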

Simplify Stop() so it does not need a lock, by storing the state
atomically and then calling cancel(). Stop is safe to call multiple times
since cancel() is safe to call multiple times, and waiting on the finished
channel can be repeated, unlike receiving on the done chan.
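A minimal sketch of that lock-free Stop, assuming an atomic state, a context cancel function, and a finished channel closed once Run's goroutines exit; the state constants and field names are placeholders:

```go
package dispatch

import (
	"context"
	"sync/atomic"
)

const (
	stateRunning int32 = iota
	stateStopped
)

// Dispatcher is trimmed down to the fields relevant to stopping.
type Dispatcher struct {
	state    atomic.Int32
	cancel   context.CancelFunc
	finished chan struct{} // closed once Run's goroutines have exited
}

// Stop needs no lock and is safe to call multiple times: the atomic store is
// idempotent, cancel() may be called repeatedly, and receiving from a closed
// channel always returns immediately.
func (d *Dispatcher) Stop() {
	d.state.Store(stateStopped)
	d.cancel()
	<-d.finished
}
```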

In groupAlert, only hold the lock for the current route's slice entry.
Note that the limit check is done while holding only this lock. This is
safe now as only one alert can be ingested at a time, but if we enable
parallel alert ingestion with N workers, we may overshoot the limit by up
to N. We deem this acceptable as the overshoot is bounded by the number of
workers, and the limit is a safety measure to avoid too many AGs, not
something that will substantially break between e.g. 1000 and 1016. We
will update the documentation about the limit when we make ingestion
parallel. We could use Add() and then check (and undershoot the limit), or
use CompareAndSwap and retry, but both make performance worse.
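A hedged sketch of that check-then-add pattern, with the aggrGroup count as an atomic counter; the limits type and function name are made up for illustration. With N ingestion workers calling this concurrently, the count can exceed the limit by at most N:

```go
package dispatch

import "sync/atomic"

type limits struct {
	MaxAggrGroups int // 0 means unlimited
}

// tryReserveAggrGroup checks the limit and then increments the counter. The
// two steps are not atomic together, which is what allows a small overshoot
// under parallel ingestion; Add-then-check or a CompareAndSwap retry loop
// would close that gap at a performance cost.
func tryReserveAggrGroup(num *atomic.Int64, lim limits) bool {
	if lim.MaxAggrGroups > 0 && num.Load() >= int64(lim.MaxAggrGroups) {
		return false
	}
	num.Add(1)
	return true
}
```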

Dispatch tests needed fixes to add the Idx in the manually created
Route, and to not pass a nil Route. The way the maintenance test
populates the aggrgroup also changes with the new system.

Signed-off-by: Guido Trotter <guido@hudson-trading.com>
This allows us to avoid taking the lock in most cases.

The exception is that we still need it when adding an aggrgroup, as
multiple goroutines might otherwise be trying to add theirs for the
same fp.
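Reusing the routeGroups type from the sketch in the dispatcher commit above, one possible shape of this is a read path under only the shared read lock, with the write lock taken just for insertion (getOrCreate is a made-up name):

```go
// getOrCreate serves the common case with only a read lock; the write lock is
// taken just for insertion, and the map is re-checked after acquiring it in
// case another goroutine added an aggrGroup for the same fingerprint first.
func (rg *routeGroups) getOrCreate(fp fingerprint, create func() *aggrGroup) *aggrGroup {
	rg.mtx.RLock()
	ag, ok := rg.groups[fp]
	rg.mtx.RUnlock()
	if ok {
		return ag
	}

	rg.mtx.Lock()
	defer rg.mtx.Unlock()
	if ag, ok := rg.groups[fp]; ok {
		return ag
	}
	ag = create()
	rg.groups[fp] = ag
	return ag
}
```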

Signed-off-by: Guido Trotter <guido@hudson-trading.com>
We create multiple goroutines on Run(): one for maintenance, one for the
start timer, and N for ingestion.
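A rough outline of that Run structure, assuming a WaitGroup tracks all of the goroutines; the worker-count field and method names are placeholders:

```go
package dispatch

import "sync"

// Dispatcher is trimmed to what Run needs in this sketch.
type Dispatcher struct {
	wg         sync.WaitGroup
	numWorkers int // N ingestion workers
}

func (d *Dispatcher) Run() {
	d.wg.Add(2 + d.numWorkers)
	go func() { defer d.wg.Done(); d.runMaintenance() }() // periodic maintenance
	go func() { defer d.wg.Done(); d.runStartTimer() }()  // start timer
	for i := 0; i < d.numWorkers; i++ {
		go func() { defer d.wg.Done(); d.runIngest() }() // alert ingestion
	}
}

func (d *Dispatcher) runMaintenance() { /* ... */ }
func (d *Dispatcher) runStartTimer()  { /* ... */ }
func (d *Dispatcher) runIngest()      { /* ... */ }
```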

Signed-off-by: Guido Trotter <guido@hudson-trading.com>
Add a dedicated cache to the silencer to drop the dependency on the global marker.
For now this should make GET API calls non-blocking for the silencer.
This will also enable us to eventually deprecate and remove the global marker.

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
@siavashs siavashs force-pushed the feat/dedicated-silencer-cache branch from 0da0d55 to 56adf7d on February 9, 2026 12:15