[Access] Fixed REST API metrics #4288
Conversation
Any idea if this will address the memory leak: #3968?
Codecov Report
@@ Coverage Diff @@
## master #4288 +/- ##
==========================================
+ Coverage 53.74% 55.41% +1.66%
==========================================
Files 871 783 -88
Lines 80761 70019 -10742
==========================================
- Hits 43408 38801 -4607
+ Misses 33921 28323 -5598
+ Partials 3432 2895 -537
Flags with carried forward coverage won't be shown.
I have tested by making 500k REST requests using multiple clients over prolonged periods of time in different runs, and pprof has not reported memory leaks in either in-use or allocated objects, while metrics were reported correctly in Prometheus (checked using Grafana).
@@ -13,14 +13,17 @@ import (
	"github.com/gorilla/mux"
)

// we have to use single rest collector for all metrics since it's not allowed to register same
// collector multiple times.
var restCollector = metrics.NewRestCollector(metricsProm.Config{Prefix: "access_rest_api"})
There should only ever be a single REST server running on a node. Is MetricsMiddleware getting called multiple times in production?
Rather than using a global, how about we build the collector externally and pass it into MetricsMiddleware(), or pass in the prefix value?
You are correct that in production there will only ever be a single REST server running on a node, but the fix we implemented was primarily for tests. During each test, we allocate a new router without a server, which creates a new middleware with a REST collector. However, I discovered that the collector instances were not being unregistered when a test finished, resulting in multiple instances and the error message "duplicate metrics collector registration attempted." Using restCollector as a global variable was the simplest solution to this issue without making significant code changes.
I understand it will be a bit more refactoring, but I think we should align this with how metrics are used elsewhere and pass the collector in rather than create a special case.
What this may look like is creating the collector object in the AN's node builder and passing it through into the middleware, similar to how it works with the logger. In tests that are not explicitly testing metrics, we can use the NoopCollector, which is already set up with stubs for the RestCollector:
flow-go/module/metrics/noop.go
Lines 242 to 247 in 16c25ea
func (nc *NoopCollector) ObserveHTTPRequestDuration(context.Context, httpmetrics.HTTPReqProperties, time.Duration) {
}
func (nc *NoopCollector) ObserveHTTPResponseSize(context.Context, httpmetrics.HTTPReqProperties, int64) {
}
func (nc *NoopCollector) AddInflightRequests(context.Context, httpmetrics.HTTPProperties, int) {}
func (nc *NoopCollector) AddTotalRequests(context.Context, string, string) {}
A pattern we use in the network layer is to embed sub-collectors in the top-level struct and interface:
flow-go/module/metrics/network.go
Lines 23 to 29 in 16c25ea
type NetworkCollector struct {
	*UnicastManagerMetrics
	*LibP2PResourceManagerMetrics
	*GossipSubMetrics
	*GossipSubScoreMetrics
	*GossipSubLocalMeshMetrics
	*AlspMetrics
that way the single top level collector can be passed through.
The RestCollector could be embedded in the AccessCollector interface and implementation in a similar way.
Ah, I was just going to debug this on the emulator side. The metrics were not being collected, right?
You are correct. The fix was added because metrics were not being collected.
module/metrics/rest_api.go
Outdated
}

-type recorder struct {
+type RecorderCollector struct {
Let's call this RestCollector. The previous "recorder" terminology is a bit confusing.
Added one naming comment, but otherwise this looks great. Thanks @Guitarheroua!
I think this one should also be reviewed together with onflow/flow-emulator#375, as they depend on each other.
return func(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		// This is a custom metric being called on every http request
-		r.AddTotalRequests(req.Context(), req.Method, req.URL.Path)
+		restCollector.AddTotalRequests(req.Context(), req.Method, req.URL.Path)
@Guitarheroua is taking a look, thanks for catching this
4452: [Metrics] Fix issue with collecting REST Metrics r=peterargue a=Guitarheroua

This fix is implemented in response to a comment made in PR 4288 (#4288 (comment)). After enabling metrics for REST, the system now generates metrics for each HTTP request based on the request method and URL path. However, when a path includes an ID, such as '/v1/collection/12345/' or '/v1/collection/67890/', a unique label value is created for each path, producing a new time series per ID. To address this problem, I have modified the reporting logic to generate metrics based on the service ID and request method. Currently, the service ID is empty, but there is potential for improvement in future iterations.

Co-authored-by: Andrii Slisarchuk <andriyslisarchuk@gmail.com>
Co-authored-by: Andrii Slisarchuk <Guitarheroua@users.noreply.github.com>
Co-authored-by: Peter Argue <89119817+peterargue@users.noreply.github.com>
#3971
Context
This PR addresses the issue of metrics on the Access node's REST interface not being scraped correctly by Prometheus, which was suspected of causing a memory leak.
The metrics have been fixed and now show up in Prometheus as expected. However, the memory leak described in the original task was not reproducible, so this PR focuses solely on fixing the metrics issue.