Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add instrumentation handler to collector endpoints #2664

Conversation

dimitarvdimitrov
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov commented Nov 28, 2020

Which problem is this PR solving?

Resolves #2264

Currently, we do not expose any HTTP metrics on any of the collector endpoints. This makes it harder to detect when the collector is having issues ingesting bad spans for example.

Short description of the changes

This change adds a metric that reports the duration of each request and adds HTTP status and path labels to it.

Those metrics do not include the collector's gRPC endpoints.

Currently we do not expose any HTTP metrics on any of the collector
endpoints. This makes it harder to detect when the collector
is having issues ingesting bad spans for example. This change adds a
metric that reports the duration of each request and adds
HTTP status and path labels to it.

Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
@dimitarvdimitrov dimitarvdimitrov requested a review from a team as a code owner November 28, 2020 17:38
@mergify mergify bot requested a review from jpkrohling November 28, 2020 17:39
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
@dimitarvdimitrov dimitarvdimitrov marked this pull request as draft November 28, 2020 18:24
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
@codecov
Copy link

codecov bot commented Nov 28, 2020

Codecov Report

Merging #2664 (9fbafb7) into master (221ce20) will increase coverage by 0.46%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2664      +/-   ##
==========================================
+ Coverage   95.07%   95.54%   +0.46%     
==========================================
  Files         213      214       +1     
  Lines        9491     9535      +44     
==========================================
+ Hits         9024     9110      +86     
+ Misses        389      346      -43     
- Partials       78       79       +1     
Impacted Files Coverage Δ
cmd/collector/app/collector.go 74.64% <100.00%> (+0.36%) ⬆️
cmd/collector/app/server/http.go 88.00% <100.00%> (+88.00%) ⬆️
cmd/collector/app/server/zipkin.go 76.92% <100.00%> (+76.92%) ⬆️
pkg/httpmetrics/metrics.go 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 221ce20...9fbafb7. Read the comment docs.

@dimitarvdimitrov
Copy link
Contributor Author

Should the codecov/patch check be passing? I can't see exactly the reason for it failing, but I'm guessing it's because of the code added in init functions, which aren't tested. Is this a problem?

Also, TestReload looks like a flaky one: for 274c17c it passed on my fork, but failed on this PR,

@dimitarvdimitrov dimitarvdimitrov marked this pull request as ready for review November 28, 2020 19:16
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please try to make sure codecov checks are passing, because we've already fallen to a lever barely above the threshold. I am not sure what you mean by init functions, we tend to write all code as testable (i.e. not using things like package-level init).

TestReload is indeed flaky, we have a ticket open for it.

pkg/instrumentedhandler/metrics.go Outdated Show resolved Hide resolved
pkg/instrumentedhandler/metrics.go Outdated Show resolved Hide resolved
pkg/instrumentedhandler/metrics.go Outdated Show resolved Hide resolved
pkg/instrumentedhandler/metrics_test.go Outdated Show resolved Hide resolved
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
We weren't testing the starting of collector's HTTP and HTTP zipkin
servers. Added sanity tests for that. Also those two handlers
exited with log.Fatal, which made them hard to test; this has been
changed as well

Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Only the prometheus metrics.Factory implementaiton caches the
instantiated Timers, Gauges, etc. The other implementations
allocate new objects each time. We want to prevent that because
this is in the hot path. This adds caching on the request
duration timers. This shouldn't lead to noticeable increase in
memory usage.

Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
It was possible that we get concurrent map read-write panics.

Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm, some minor improvements suggested

pkg/httpmetrics/metrics.go Outdated Show resolved Hide resolved
pkg/httpmetrics/metrics_test.go Outdated Show resolved Hide resolved
pkg/httpmetrics/metrics.go Show resolved Hide resolved
cmd/collector/app/server/zipkin_test.go Outdated Show resolved Hide resolved
dimitarvdimitrov and others added 4 commits November 30, 2020 07:35
It's more robust if we use the next available port instead of
hardcoding a large port.

Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
We can use directly struct keys instead of building strings out of each
struct we get.

Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
dimitarvdimitrov and others added 3 commits November 30, 2020 09:28
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Signed-off-by: Dimitar Dimitrov <hi@dimitardimitrov.dev>
Copy link
Contributor

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice. This doesn't (yet) add metrics to any existing endpoint, right?

@dimitarvdimitrov
Copy link
Contributor Author

dimitarvdimitrov commented Nov 30, 2020

Looks nice. This doesn't (yet) add metrics to any existing endpoint, right?

It does - to collector's /api/v1/spans, /api/v2/spans (here) and /api/traces (here) endpoints

)

// test wrong port number
func TestFailToListenHttp(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func TestFailToListenHttp(t *testing.T) {
func TestFailToListenHTTP(t *testing.T) {

assert.EqualError(t, err, "listen tcp: address -1: invalid port")
}

func TestSpanCollectorHttp(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func TestSpanCollectorHttp(t *testing.T) {
func TestSpanCollectorHTTP(t *testing.T) {

cmd/collector/app/server/http_test.go Show resolved Hide resolved
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@yurishkuro yurishkuro merged commit 5c3b53b into jaegertracing:master Nov 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose metric for zipkin spans that could not be parsed
3 participants