feat(silence): add dedicated silencer cache#4980
Draft
siavashs wants to merge 6 commits intoprometheus:mainfrom
Draft
feat(silence): add dedicated silencer cache#4980siavashs wants to merge 6 commits intoprometheus:mainfrom
siavashs wants to merge 6 commits intoprometheus:mainfrom
Conversation
4f36a26 to
0da0d55
Compare
… the tree is always the top index This allows us to have a count for each route, and to be able to move the route indexing from "pointer to the route object" to its integer index. Signed-off-by: Guido Trotter <guido@hudson-trading.com>
…t is only used internally for the recursion) Signed-off-by: Guido Trotter <guido@hudson-trading.com>
- Add a new stopped state - Remove the mtx lock - Change aggrGroupsPerRoute map to a new preallocated slice - Change aggrGroupsNum to an atomic int - Change done chan to a waitgroup - Change state to atomic Add a new type holding the lock at the map fingerprint to aggrgroup level (tbd if we can make this a sync.Map) Preallocate the route slice and its content objects Remove the WaitForLoading call in Groups, which is redundant after LoadingDone. Remove copying the immutable slice in Groups, now only copy the inside maps, holding their individual lock. This also saves copying if there is a route filter, which can be checked without holding a lock. Simplify Stop() to not need a lock by storing the state atomically then calling cancel(). Stop is safe to call more times since cancel() is safe to call more times, and waiting on the finished channel can be done more times, unlike listening on the done chan. In groupAlert only hold the lock for the current route slice. Note that the limit check is done holding only this lock. This is safe now as only one alert can be ingested at a time, but if we enable parallel alert ingestion with N workers, we may overshoot the limit by N. We deem this to be ok as N is smaller than the number of workers, and the limit is a safety to avoid too many AGs, not something that will substantially break between e.g. 1000 and 1016. We will update the documentation about the limit when we will make the ingestion parallel. We could use Add() then check, and then undershoot the limit, or CompareAndSwap and retry but that's making performance worse. Dispatch tests needed fixes to add the Idx in the manually created Route, and to not pass a nil Route. The way the maintenance test populates the aggrgroup also changes with the new system. Signed-off-by: Guido Trotter <guido@hudson-trading.com>
This allows us to avoid taking the lock in most cases. The exception is that we still need it when adding an aggrgroup, as multiple goroutines might be otherwise trying to add theirs, for the same fp. Signed-off-by: Guido Trotter <guido@hudson-trading.com>
We create multiple goroutines on Run(), one for maintenance, one for start timer, and N for ingestion. Signed-off-by: Guido Trotter <guido@hudson-trading.com>
Add a dedicated cache to silencer to drop the dependency on the global marker. For now this should make get API calls non-blocking for silencer. This will also enable us to eventually deprecate and remove the global marker. Signed-off-by: Siavash Safi <siavash@cloudflare.com>
0da0d55 to
56adf7d
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Add a dedicated cache to silencer to drop the dependency on the global marker.
For now this should make get API calls non-blocking for silencer.
This will also enable us to eventually deprecate and remove the global marker.
Signed-off-by: Siavash Safi siavash@cloudflare.com