Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Access] Fixed REST API metrics #4288

Merged

Conversation

Guitarheroua
Copy link
Collaborator

#3971

Context

This PR addresses the issues regarding the metrics on the Access node's REST interface not being scraped correctly by Prometheus, which caused a memory leak.

The metrics have been fixed and now show up in the Prometheus as expected. However, the memory leak described in the original task was not reproducible. Therefore, this PR solely focuses on fixing the metrics issue.

@Guitarheroua Guitarheroua changed the title [Access] Fixed metrics for Access node Rest interface. [Access] Fixed REST API metrics Apr 27, 2023
@Guitarheroua Guitarheroua marked this pull request as ready for review April 27, 2023 19:09
@peterargue
Copy link
Contributor

peterargue commented Apr 28, 2023

any idea if this will address the memory leak: #3968

@codecov-commenter
Copy link

codecov-commenter commented Apr 28, 2023

Codecov Report

Merging #4288 (83b4abf) into master (68f607a) will increase coverage by 1.66%.
The diff coverage is 20.00%.

@@            Coverage Diff             @@
##           master    #4288      +/-   ##
==========================================
+ Coverage   53.74%   55.41%   +1.66%     
==========================================
  Files         871      783      -88     
  Lines       80761    70019   -10742     
==========================================
- Hits        43408    38801    -4607     
+ Misses      33921    28323    -5598     
+ Partials     3432     2895     -537     
Flag Coverage Δ
unittests 55.41% <20.00%> (+1.66%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
engine/access/rest/middleware/metrics.go 0.00% <0.00%> (ø)
engine/access/rest/server.go 0.00% <0.00%> (ø)
module/metrics/access.go 0.00% <0.00%> (ø)
module/metrics/rest_api.go 0.00% <0.00%> (ø)
engine/access/rpc/engine.go 59.07% <50.00%> (+0.15%) ⬆️
engine/access/rest/router.go 100.00% <100.00%> (ø)
engine/access/rest/test_helpers.go 87.50% <100.00%> (+0.54%) ⬆️

... and 99 files with indirect coverage changes

@Guitarheroua
Copy link
Collaborator Author

any idea if this will address the memory leak: #3968

I have tested making 500k REST requests using multiple clients over prolonged periods of time in different runs and prof hasn't reported memory leaks both in used or claimed objects even though metrics were reported correctly in Prometheus(checked using Grafana)

@@ -13,14 +13,17 @@ import (
"github.com/gorilla/mux"
)

// we have to use single rest collector for all metrics since it's not allowed to register same
// collector multiple times.
var restCollector = metrics.NewRestCollector(metricsProm.Config{Prefix: "access_rest_api"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should only ever be a single REST server running on a node. Is MetricsMiddleware getting called multiple times in production?

Rather than using a global, how about we build the collector externally and pass it into MetricsMiddleware(), or pass in the prefix value?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct that in production there only be a single REST server running on a node, but the fix we implemented was primarily for tests. During each test, we allocate a new router without a server, which creates a new middleware with a REST controller. However, I discovered that the controller instances were not being unregistered when the test finished, resulting in multiple instances and the error message "duplicate metrics collector registration attempted." Using restCollector as a global variable was the simplest solution to this issue without making significant code changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand it will be a bit more refactoring, but I think we should align this with how metrics are used elsewhere and pass the collector in rather than create a special case.

what this may look like is creating the collector object in the AN's node builder, and passing it through into the middleware, similar to how it works with the logger. In tests that are not explicitly testing metrics, we can use the NoopCollector which is already setup with stubs for the RestCollector:

func (nc *NoopCollector) ObserveHTTPRequestDuration(context.Context, httpmetrics.HTTPReqProperties, time.Duration) {
}
func (nc *NoopCollector) ObserveHTTPResponseSize(context.Context, httpmetrics.HTTPReqProperties, int64) {
}
func (nc *NoopCollector) AddInflightRequests(context.Context, httpmetrics.HTTPProperties, int) {}
func (nc *NoopCollector) AddTotalRequests(context.Context, string, string) {}

A pattern we use in the network layer, is to embed sub-collectors in the top level struct and interface:

type NetworkCollector struct {
*UnicastManagerMetrics
*LibP2PResourceManagerMetrics
*GossipSubMetrics
*GossipSubScoreMetrics
*GossipSubLocalMeshMetrics
*AlspMetrics

that way the single top level collector can be passed through.

The RestCollector could be embedded in the AccessCollector interface and implementation in a similar way.

@bluesign
Copy link
Contributor

bluesign commented May 3, 2023

ah I was just gonna debug this on emulator side, metrics were not collected right?

@Guitarheroua
Copy link
Collaborator Author

Ah I was just gonna debug this on the emulator side, metrics were not collected, right?

You are correct. The fix was added because metrics were not being collected.

}

type recorder struct {
type RecorderCollector struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call this RestCollector. the previous "recorder" terminology is a bit confusing

Copy link
Contributor

@peterargue peterargue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one naming comment, but otherwise this looks great. Thanks @Guitarheroua!

@Guitarheroua
Copy link
Collaborator Author

Guitarheroua commented May 15, 2023

Added one naming comment, but otherwise, this looks great. Thanks @Guitarheroua!

I think. this one also should be reviewed onflow/flow-emulator#375, as they depends on each other

@peterargue peterargue merged commit b5eee78 into onflow:master May 16, 2023
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
// This is a custom metric being called on every http request
r.AddTotalRequests(req.Context(), req.Method, req.URL.Path)
restCollector.AddTotalRequests(req.Context(), req.Method, req.URL.Path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to create a unique value of handler for every single collection, causing too. much metrics to be collected

image

@franklywatson
Copy link
Contributor

@Guitarheroua is taking a look, thanks for catching this

@Guitarheroua Guitarheroua deleted the guitarheroua/3971-fix-rest-api-metrics branch June 12, 2023 18:27
bors bot added a commit that referenced this pull request Jun 29, 2023
4452: [Metrics] Fix issue with collecting REST Metrics r=peterargue a=Guitarheroua

This fix is implemented in response to [a comment made in PR 4288 ](#4288 (comment)).

After enabling metrics for REST, the system now generates metrics for each HTTP request based on the request method and URL path. However, when a path includes an ID, such as '**/v1/collection/12345/**' or '**/v1/collection/67890/**', a unique value is created for each path, leading to an issue.

To address this problem, I have modified the reporting logic to generate metrics based on the service ID and request method. Currently, the service ID is empty, but there is potential for improvement in future iterations.



Co-authored-by: Andrii Slisarchuk <andriyslisarchuk@gmail.com>
Co-authored-by: Andrii Slisarchuk <Guitarheroua@users.noreply.github.com>
Co-authored-by: Peter Argue <89119817+peterargue@users.noreply.github.com>
peterargue added a commit that referenced this pull request Jul 18, 2023
4452: [Metrics] Fix issue with collecting REST Metrics r=peterargue a=Guitarheroua

This fix is implemented in response to [a comment made in PR 4288 ](#4288 (comment)).

After enabling metrics for REST, the system now generates metrics for each HTTP request based on the request method and URL path. However, when a path includes an ID, such as '**/v1/collection/12345/**' or '**/v1/collection/67890/**', a unique value is created for each path, leading to an issue.

To address this problem, I have modified the reporting logic to generate metrics based on the service ID and request method. Currently, the service ID is empty, but there is potential for improvement in future iterations.

Co-authored-by: Andrii Slisarchuk <andriyslisarchuk@gmail.com>
Co-authored-by: Andrii Slisarchuk <Guitarheroua@users.noreply.github.com>
Co-authored-by: Peter Argue <89119817+peterargue@users.noreply.github.com>
peterargue added a commit that referenced this pull request Jul 18, 2023
4452: [Metrics] Fix issue with collecting REST Metrics r=peterargue a=Guitarheroua

This fix is implemented in response to [a comment made in PR 4288 ](#4288 (comment)).

After enabling metrics for REST, the system now generates metrics for each HTTP request based on the request method and URL path. However, when a path includes an ID, such as '**/v1/collection/12345/**' or '**/v1/collection/67890/**', a unique value is created for each path, leading to an issue.

To address this problem, I have modified the reporting logic to generate metrics based on the service ID and request method. Currently, the service ID is empty, but there is potential for improvement in future iterations.

Co-authored-by: Andrii Slisarchuk <andriyslisarchuk@gmail.com>
Co-authored-by: Andrii Slisarchuk <Guitarheroua@users.noreply.github.com>
Co-authored-by: Peter Argue <89119817+peterargue@users.noreply.github.com>
peterargue added a commit that referenced this pull request Jul 18, 2023
4452: [Metrics] Fix issue with collecting REST Metrics r=peterargue a=Guitarheroua

This fix is implemented in response to [a comment made in PR 4288 ](#4288 (comment)).

After enabling metrics for REST, the system now generates metrics for each HTTP request based on the request method and URL path. However, when a path includes an ID, such as '**/v1/collection/12345/**' or '**/v1/collection/67890/**', a unique value is created for each path, leading to an issue.

To address this problem, I have modified the reporting logic to generate metrics based on the service ID and request method. Currently, the service ID is empty, but there is potential for improvement in future iterations.

Co-authored-by: Andrii Slisarchuk <andriyslisarchuk@gmail.com>
Co-authored-by: Andrii Slisarchuk <Guitarheroua@users.noreply.github.com>
Co-authored-by: Peter Argue <89119817+peterargue@users.noreply.github.com>
peterargue added a commit that referenced this pull request Jul 18, 2023
4452: [Metrics] Fix issue with collecting REST Metrics r=peterargue a=Guitarheroua

This fix is implemented in response to [a comment made in PR 4288 ](#4288 (comment)).

After enabling metrics for REST, the system now generates metrics for each HTTP request based on the request method and URL path. However, when a path includes an ID, such as '**/v1/collection/12345/**' or '**/v1/collection/67890/**', a unique value is created for each path, leading to an issue.

To address this problem, I have modified the reporting logic to generate metrics based on the service ID and request method. Currently, the service ID is empty, but there is potential for improvement in future iterations.

Co-authored-by: Andrii Slisarchuk <andriyslisarchuk@gmail.com>
Co-authored-by: Andrii Slisarchuk <Guitarheroua@users.noreply.github.com>
Co-authored-by: Peter Argue <89119817+peterargue@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants