Fix avoidable prometheus metrics cardinality #4080
@g00g1: There are no 'kind' label on this PR. You need a 'kind' label to generate the release automatically.
Details: I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.
@g00g1: There are no area labels on this PR. You can add as many areas as you see fit.
/kind fix
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4080      +/-   ##
==========================================
- Coverage   62.76%   62.74%   -0.02%
==========================================
  Files         459      459
  Lines       33067    33067
==========================================
- Hits        20755    20749       -6
- Misses      10192    10196       +4
- Partials     2120     2122       +2
2210033 to 0022dbd
Good catch, thanks!
I am running a decently sized crowdsec setup (more than 100k simultaneously active decisions), with a significant proportion of bans added using cscli. My setup also requires the full Prometheus metrics level in order to collect historical data on the popularity of some decisions.
All of the above contributes to an extremely large metrics endpoint response (tens of megabytes), as well as RAM overhead (see screenshot 1). After manual inspection, I noticed that the problematic metrics include `cs_lapi_machine_requests_total`, whose series carry the raw request URL.
I would like to question the usefulness of exposing the raw URL in metrics, including query parameters such as IP addresses.
Instead, I propose using `func (c *Context) FullPath() string` (available since gin-gonic/gin v1.5.0, so this change could be backported). It returns the full path of the matched route rather than the raw URL of the original request, which prevents unnecessary overhead and excessive label cardinality in setups configured for the full Prometheus metrics level without aggregation (aggregation in fact removes some metrics instead).
I expect this not to be a breaking change.
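To illustrate the cardinality difference, here is a minimal, stdlib-only sketch. It is not the code from this patch: `matchRoute` is a hypothetical stand-in for what gin's `FullPath()` returns, and the route patterns and IP addresses are made-up examples. It simply counts how many distinct label values (and therefore time series) each labelling strategy would produce.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// routePatterns is a hypothetical route table, used for illustration only.
var routePatterns = []string{"/v1/decisions", "/v1/alerts/:id"}

// matchRoute returns the registered pattern that matches path.
// It plays the role of gin's FullPath(), which likewise returns ""
// when no route matches. This is an assumption-level imitation, not gin's router.
func matchRoute(path string) string {
	segs := strings.Split(strings.Trim(path, "/"), "/")
	for _, p := range routePatterns {
		psegs := strings.Split(strings.Trim(p, "/"), "/")
		if len(psegs) != len(segs) {
			continue
		}
		ok := true
		for i := range psegs {
			// A ":name" segment matches any value; others must match exactly.
			if !strings.HasPrefix(psegs[i], ":") && psegs[i] != segs[i] {
				ok = false
				break
			}
		}
		if ok {
			return p
		}
	}
	return ""
}

// rawLabel labels a request by its raw URI, query string included.
func rawLabel(u *url.URL) string { return u.RequestURI() }

// routeLabel labels a request by its matched route pattern.
func routeLabel(u *url.URL) string { return matchRoute(u.Path) }

// distinctLabels counts the distinct label values produced for the given
// requests, i.e. the number of time series a counter would end up with.
func distinctLabels(requests []string, labelOf func(*url.URL) string) int {
	seen := map[string]struct{}{}
	for _, raw := range requests {
		u, err := url.ParseRequestURI(raw)
		if err != nil {
			continue // skip malformed request URIs
		}
		seen[labelOf(u)] = struct{}{}
	}
	return len(seen)
}

func main() {
	// Simulate lookups for many different IPs against one endpoint.
	var requests []string
	for i := 0; i < 200; i++ {
		requests = append(requests, fmt.Sprintf("/v1/decisions?ip=192.0.2.%d", i))
	}
	fmt.Println("series with raw URL label:  ", distinctLabels(requests, rawLabel))
	fmt.Println("series with route label:    ", distinctLabels(requests, routeLabel))
}
```

With a raw URL label, every distinct query string creates a new series, so the series count grows with the number of IPs queried. With a route label, it is bounded by the number of registered routes.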
Screenshot 1
Before / after this patch in my setup