-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add instrumentation handler to collector endpoints #2664
Add instrumentation handler to collector endpoints #2664
Conversation
Currently we do not expose any HTTP metrics on any of the collector endpoints. This makes it harder to detect when the collector is having issues ingesting bad spans for example. This change adds a metric that reports the duration of each request and adds HTTP status and path labels to it. Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Codecov Report
@@ Coverage Diff @@
## master #2664 +/- ##
==========================================
+ Coverage 95.07% 95.54% +0.46%
==========================================
Files 213 214 +1
Lines 9491 9535 +44
==========================================
+ Hits 9024 9110 +86
+ Misses 389 346 -43
- Partials 78 79 +1
Continue to review full report at Codecov.
|
Should the codecov/patch check be passing? I can't see exactly the reason for it failing, but I'm guessing it's because of the code added in init functions, which aren't tested. Is this a problem? Also, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please try to make sure codecov checks are passing, because we've already fallen to a lever barely above the threshold. I am not sure what you mean by init functions, we tend to write all code as testable (i.e. not using things like package-level init).
TestReload is indeed flaky, we have a ticket open for it.
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
We weren't testing the starting of collector's HTTP and HTTP zipkin servers. Added sanity tests for that. Also those two handlers exited with log.Fatal, which made them hard to test; this has been changed as well Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Only the prometheus metrics.Factory implementaiton caches the instantiated Timers, Gauges, etc. The other implementations allocate new objects each time. We want to prevent that because this is in the hot path. This adds caching on the request duration timers. This shouldn't lead to noticeable increase in memory usage. Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
It was possible that we get concurrent map read-write panics. Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall lgtm, some minor improvements suggested
It's more robust if we use the next available port instead of hardcoding a large port. Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
We can use directly struct keys instead of building strings out of each struct we get. Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice. This doesn't (yet) add metrics to any existing endpoint, right?
) | ||
|
||
// test wrong port number | ||
func TestFailToListenHttp(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func TestFailToListenHttp(t *testing.T) { | |
func TestFailToListenHTTP(t *testing.T) { |
assert.EqualError(t, err, "listen tcp: address -1: invalid port") | ||
} | ||
|
||
func TestSpanCollectorHttp(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func TestSpanCollectorHttp(t *testing.T) { | |
func TestSpanCollectorHTTP(t *testing.T) { |
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Which problem is this PR solving?
Resolves #2264
Currently, we do not expose any HTTP metrics on any of the collector endpoints. This makes it harder to detect when the collector is having issues ingesting bad spans for example.
Short description of the changes
This change adds a metric that reports the duration of each request and adds HTTP status and path labels to it.
Those metrics do not include the collector's gRPC endpoints.