Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sinatra] tag span with request id if it is available in response headers #427

Merged
merged 8 commits into from
Jun 5, 2018

Conversation

pawelchcki
Copy link
Contributor

@pawelchcki pawelchcki commented May 18, 2018

Add header tagging support to Sinatra in order to support tagging request id.

This PR refactors out the implementation of header tagging from Rack and makes it more generic - able to be used in other situations that require tagging based on a list of potential tags.

Related to: #375

@pawelchcki pawelchcki force-pushed the feature/sinatra_add_request_id branch from 8c62952 to 63e6316 Compare May 18, 2018 09:38
@pawelchcki pawelchcki added the feature Involves a product feature label May 18, 2018
@pawelchcki pawelchcki changed the title WIP: Sinatra add request [sinatra] tag span with request id if it is available in response headers May 18, 2018
@pawelchcki pawelchcki added the integrations Involves tracing integrations label May 18, 2018
@pawelchcki pawelchcki requested a review from delner May 18, 2018 13:16
Copy link
Contributor

@delner delner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the spirit of what's being done here: extract header tagging to a level where both Rack and Sinatra can use it. This will add a lot of value to our integrations moving forward.

However, I'm not sure to what modules these new responsibilities are assigned, or what they're named is quite right yet.

  • The main library probably shouldn't own Rack specific responsibilities.
  • HTTP probably shouldn't be a kind of tagging, and the act of tagging itself probably shouldn't know anything about HTTP. HTTP should probably implement tagging behavior.

Maybe we need some set of layered responsibilities here, like: Tagging < HTTPTagger < RackTagger? Just a very rough idea there.

I think I'd like to see a refactored, alternative design that incorporates the above points, while keeping the general strategy implemented here.

Overall, this is a good start!

@@ -30,6 +30,10 @@ class TracerTestApp < Sinatra::Application
get '/literal-template' do
erb '<%= msg %>', locals: { msg: 'hello' }
end

before do
response.headers['X-Request-ID'] = 'request-id'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given these are integration tests, is it better to force headers here, or to pass them in from rack-test's #get method? This kind of change makes some assumptions about how the Sinatra API works. For test resiliency, it might be beneficial to take a more black-box approach here.


assert_equal(200, last_response.status)

spans = @writer.spans()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could drop the () here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -237,4 +244,28 @@ def test_literal_template
assert_equal(0, root.status)
assert_nil(root.parent)
end

def test_tagging_request_headers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test covering all the header options? I think it could use some more assertions around both request and response tags. I might suggest looking at Rack's header tagging test, which might not be perfect, but has some identical behavior its testing for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added tests for both cases. Which showed that Sinatra's after hook doesn't have all response headers yet, so I had to introduce simple rack Middleware to correctly handle that case.

module Datadog
module Utils
# Helper class to used to tag configured headers
module MassTagger
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Datadog::Utils::MassTagger is a bit ambiguous. What kind of object is it tagging? What kind of data is it doing en masse?

I'd suggest making the name more specific, or splitting the behavior into smaller modules/files that are more specific. Perhaps something like: Datadog::Utils::Tagging::Rack::Request or Datadog::Utils::Tagging::Rack::Response?

Furthermore... is this the best place for this behavior? I have mixed feelings that something named "Rack" exists in the main library. Rack is a very specific integration, and I think our main library probably should reference a minimal number of external integrations, even if its only doing so by name.

In this sense, my suggestion here would to move this behavior under ddtrace/contrib/rack/utils namespace. Any other integration that requires this behavior, e.g. Rails or Sinatra which specifies Rack as a dependency anyways, could then require 'ddtrace/contrib/rack/utils/tagging/request or require 'ddtrace/contrib/rack/utils/tagging/response.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed MassTagger to simply Tagger. As its quite generic and could be used outside of Rack (i.e. GRPC).

Rack specific code now lives under contrib/rack/tagging. Thanks for suggesting it, I wasn't sure if code should be shared among 'contribs'.

module Datadog
module Utils
# Helper class to used to tag configured headers
module MassTagger
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add unit tests for this module?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed this to Tagger. And it now has tests 👍

@@ -32,18 +36,19 @@ module Tracer
option :tracer, default: Datadog.tracer
option :resource_script_names, default: false
option :distributed_tracing, default: false
option :headers, default: Utils::MassTagger::DEFAULT_HEADERS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that both integrations will have the same set of default options. DRY is good!

However, something about the name of the constant it sources from doesn't sit quite right: why does a MassTagger own a list of default headers? Does tagging implicitly include HTTP? Or does HTTP instead implement tagging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved that to RequestSpanMiddleware which I think might be more appropriate

end
end
def set_header_tags!(span, env, headers)
Datadog::Utils::MassTagger.tag(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks cleaner!

@@ -169,6 +159,10 @@ def set_request_tags!(request_span, env, status, headers, response, original_env
end
end

def configuration
Datadog.configuration[:rack]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -980,6 +980,7 @@ Where `options` is an optional `Hash` that accepts the following parameters:
| ``resource_script_names`` | Prepend resource names with script name | ``false`` |
| ``distributed_tracing`` | Enables [distributed tracing](#distributed-tracing) so that this service trace is connected with a trace of another service if tracing headers are received | `false` |
| ``tracer`` | A ``Datadog::Tracer`` instance used to instrument the application. Usually you don't need to set that. | ``Datadog.tracer`` |
| ``headers`` | Hash of HTTP request or response headers to add as tags to the `rack.request`. Accepts `request` and `response` keys with Array values e.g. `['Last-Modified']`. Adds `http.request.headers.*` and `http.response.headers.*` tags respectively. | ``{ response: ['Content-Type', 'X-Request-ID'] }`` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy/paste error here: it doesn't add to the rack.request, should be Sinatra.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@pawelchcki
Copy link
Contributor Author

Thanks for review @delner. I think the code looks better now, and is ready for next review.

@delner delner added this to the 0.13.0 milestone May 22, 2018
@delner
Copy link
Contributor

delner commented May 22, 2018

We chatted a little more about this via Slack. Overall, I think the general direction of refactoring some generic Rack behavior out of the tracer and into modules is a great way to go. This will help make our design a bit easier to use and integrate into other Rack-based contribs.

For now, in the interest of simplifying things, I'd suggest extracting out some of these broader Rack changes into its own refactor PR, so its clearer what is necessary for this Sinatra feature, and what's a general improvement upon other parts of the code. That would allow us to focus on getting the Sinatra portion merged, so we could afterwards turn our attention to improving code sharing between the contribs.

@pawelchcki pawelchcki force-pushed the feature/sinatra_add_request_id branch from f429673 to 5b21ab9 Compare May 23, 2018 11:41
@pawelchcki
Copy link
Contributor Author

I've extracted previous changes to #432. While I've kept changes in this PR to minimum

@pawelchcki pawelchcki force-pushed the feature/sinatra_add_request_id branch from 888a118 to 7ae275e Compare May 23, 2018 13:36
@pawelchcki pawelchcki assigned delner and unassigned pawelchcki May 23, 2018
Copy link
Contributor

@delner delner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for extracting those changes, this is now much simpler and easier to evaluate!

I have a big question regarding the use of middleware, and then some smaller questions regarding implementation/feature details. See below.

@@ -71,6 +76,8 @@ def render(engine, data, *)
end
end

app.use RequestSpanMiddleware
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of me likes the idea of adding this middleware, but part of me is a little confused on the boundaries here.

  • What parts of tracing is Datadog::Contrib::Sinatra::Tracer responsible for? And this new middleware?
  • Are these co-dependent?
  • Are they doing things the other can't?

I think I need some clarification on these points so I can understand how this design improves the Sinatra tracer by some measure of coupling, simplicity, and extensibility. I'm concerned we're introducing two co-dependent modules here which increases complexity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good questions.
I've introduced the RequestSpanMiddleware because its actually impossible to fetch all response headers from app.after hook, as it this filter happens before the response is fully rendered.

As such I think its still worth to have Sinatra specific instrumentation in Tracer as it provides access to Sinatra's specific code and information, which might not be possible from within Rack Middleware.

One thing that could be improved here though is coupling via calling order. I.e. Sinatra::Tracer relies on the span to be created with RequestSpanMiddleware. While in general this is not a problem because Middleware is always executed before. However its still unnecessary coupling I'll address that in next commit

span = tracer.trace('sinatra.request',
service: Datadog.configuration[:sinatra][:service_name],
span_type: Datadog::Ext::HTTP::TYPE)
span = RequestSpanMiddleware.fetch_sinatra_trace(request.env)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below theres a @datadog_request_span. Is that obsolete? Should it just be using this middleware to drive span storage/retrieval?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems I missed your changes when backporting some of these concepts. I've reallign it now to use RequestSpan module to process creation, storage and retrieval of spans.

@@ -4,6 +4,7 @@
class TracerTest < TracerTestBase
class TracerTestApp < Sinatra::Application
get '/request' do
headers['X-Request-Id'] = request.env['HTTP_X_REQUEST_ID']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since current implementation only observes existing headers. So to actually test request_id handling I need to somehow inject the header into response. I'll clarify that in the comments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see; so this is to emulate a Sinatra app that adds a request ID header to its response? Seems okay to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly.

def call(env)
span = request_span!(env)
_, headers, = @app.call(env)
ensure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this ensure here, and set_error below in tracer.rb. Are we appropriately setting failed requests with errors? And where should this be done?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good question. Tracer uses Sinatra facilities to process error, which I think is appropriate in this context.

@pawelchcki pawelchcki force-pushed the feature/sinatra_add_request_id branch from 286e59e to 98af086 Compare May 25, 2018 14:00
@delner delner assigned pawelchcki and unassigned delner May 29, 2018
Copy link
Contributor

@delner delner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a bug in this design, but I proposed an idea of how we could fix it, while improving on the design here a little further.

Let me know what you think about the proposal; if that sounds good to you, I think it might get us to "approved" status. (Very optimistic we're close on this one!)

env[SINATRA_REQUEST_SPAN] ||= build_span(env)
end

def fetch_span(env)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a functional difference between this and #span!? Is there some place in our code you'd want to fetch a span and get nothing back? And that would be okay?

If not, couldn't you just use #span!?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pawelchcki thoughts here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added fetch_span to support the existing case in Tracer response filter, where an error is logged if span doesn't exist. (Not sure how valid that approach is, but It wasn't something I wanted to change at this point).

span! will populate the env if the span does not exist hence ! to indicate that. fetch_span is only meant to hide implementation detail of env[SINATRA_REQUEST_SPAN]

span = RequestSpan.span!(env)

# Request headers
parse_request_headers(env).each do |name, value|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I understand correctly, you mentioned that the reason for doing this header tagging in the middleware, instead of in the existing tracer extension, was because we can only access some data we need from this middleware layer. Is that right?

If that is correct, what data can we exclusively access from this middleware, that we can't access in the tracer?

status, headers, response_body = @app.call(env)

# Response headers
parse_response_headers(headers).each do |name, value|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm if this is the call order?

Middleware.call(env)
  RequestSpan.span!(env)
    Datadog::Tracer.tracer.trace('sinatra.request')
  parse_request_headers(env)
    span.set_tag
  @app.call(env)
    app.before
    app.after
      span.finish
  parse_response_headers(env)
    span.set_tag

If so, I think there's a problem with having a span.set_tag call after we do span.finish. We shouldn't modify spans that have already been finished, because they might be immediately flushed by the writer. All modifications to the span should happen before its closed.

My proposal would be to change this HeaderTaggerMiddleware into TracerMiddleware that is responsible for both starting and finishing the span. Then we should refactor as much of the behavior as we can from the Tracer into this Middleware instead. For whatever can't move into this middleware (e.g. setting the resource as the route, tags, etc), the existing Tracer extension can instead access & modify the span via env, like you've had it doing already.

Beyond solving the existing span start/finish problem, what I like about this is that the Tracer extension effectively becomes an extension to this middleware, and the middleware is no longer co-dependent on the Tracer extension to be in place. It also means Sinatra app users could, in theory, extend or otherwise modify the Tracer module to introduce some custom modifications to the Sinatra tracing.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it looks like TracerMiddleware is better name when it handles whole lifecycle of the span.

As for old Tracer its still there to fill in Sinatra provided data that is not otherwise available from Rack context. Currently both do not depend on each other. I.e. they will work independently but to get good span information both should run.

I don't think there is an alternative approach other than not using Rack Middleware and not being able to tag every header in the response.

span.set_tag(name, value) if span.get_tag(name).nil?
end

span.finish
Copy link
Contributor

@delner delner May 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this'll need to be in an ensure block, and only call finish for the span if it isn't finished. Otherwise if the application raises an error, we could have an unclosed span which would cause the trace to disappear. We should also add a test to verify this doesn't happen (if we don't already have one.)

+ ensure span is finished
@@ -71,52 +76,30 @@ def render(engine, data, *)
end
end

app.use HeaderTaggerMiddleware
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You renamed the file but didn't rename the constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for that, some brain meltdown happened

module Contrib
module Sinatra
# Middleware used for automatically tagging configured headers and handle request span
class HeaderTaggerMiddleware
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constant needs to be renamed here too?

@pawelchcki
Copy link
Contributor Author

@delner I've addressed the comments - its ready for next review round. Feel free to merge this if you find it ok.

Copy link
Contributor

@delner delner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks for your patience and hard work on this one @pawelchcki !

@delner delner merged commit 7af142e into 0.13-dev Jun 5, 2018
@pawelchcki
Copy link
Contributor Author

💃 Thanks for help and reviews! First PRs are often a bit involved. Happy to see this one merged.

@delner delner deleted the feature/sinatra_add_request_id branch June 20, 2018 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Involves a product feature integrations Involves tracing integrations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants