Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use full URL in http.url tag #2265

Merged
merged 9 commits into from
Sep 23, 2022
Merged
27 changes: 19 additions & 8 deletions docs/GettingStarted.md
Original file line number Diff line number Diff line change
Expand Up @@ -1522,39 +1522,50 @@ run app
| `headers` | Hash of HTTP request or response headers to add as tags to the `rack.request`. Accepts `request` and `response` keys with Array values e.g. `['Last-Modified']`. Adds `http.request.headers.*` and `http.response.headers.*` tags respectively. | `{ response: ['Content-Type', 'X-Request-ID'] }` |
| `middleware_names` | Enable this if you want to use the last executed middleware class as the resource name for the `rack` span. If enabled alongside the `rails` instrumention, `rails` takes precedence by setting the `rack` resource name to the active `rails` controller when applicable. Requires `application` option to use. | `false` |
| `quantize` | Hash containing options for quantization. May include `:query` or `:fragment`. | `{}` |
| `quantize.base` | Defines behavior for URL base (scheme, host, port). Removes URL base from `http.url` tag by default, leaving a path, and sets `http.base_url`. May be `:show` to keep URL base in `http.url` tag and not set `http.base_url` tag. Option must be nested inside the `quantize` option. | `nil` |
| `quantize.query` | Hash containing options for query portion of URL quantization. May include `:show` or `:exclude`. See options below. Option must be nested inside the `quantize` option. | `{}` |
| `quantize.query.show` | Defines which values should always be shown. Shows no values by default. May be an Array of strings, or `:all` to show all values. Option must be nested inside the `query` option. | `nil` |
| `quantize.query.exclude` | Defines which values should be removed entirely. Excludes nothing by default. May be an Array of strings, or `:all` to remove the query string entirely. Option must be nested inside the `query` option. | `nil` |
| `quantize.fragment` | Defines behavior for URL fragments. Removes fragments by default. May be `:show` to show URL fragments. Option must be nested inside the `quantize` option. | `nil` |
| `request_queuing` | Track HTTP request time spent in the queue of the frontend server. See [HTTP request queuing](#http-request-queuing) for setup details. Set to `true` to enable. | `false` |
| `web_service_name` | Service name for frontend server request queuing spans. (e.g. `'nginx'`) | `'web-server'` |

Deprecation notice: `quantize.base` will change its default from `:exclude` to `:show` in a future version. Voluntarily moving to `:show` is recommended.

**Configuring URL quantization behavior**

```ruby
Datadog.configure do |c|
# Default behavior: all values are quantized, fragment is removed.
# http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id&sort_by
# http://example.com/path?categories[]=1&categories[]=2 --> http://example.com/path?categories[]
Copy link
Contributor Author

@lloeki lloeki Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ironically, the doc here was consistent with the internal Quantization::HTTP.url but not with the user-facing Rack integration http.url tagging behaviour!

# Default behavior: all values are quantized, base is removed, fragment is removed.
# http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id&sort_by
# http://example.com:8080/path?categories[]=1&categories[]=2 --> /path?categories[]

# Remove URL base (scheme, host, port)
# http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id&sort_by#featured
c.tracing.instrument :rack, quantize: { base: :exclude }

# Show URL base
# http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id&sort_by#featured
c.tracing.instrument :rack, quantize: { base: :show }

# Show values for any query string parameter matching 'category_id' exactly
# http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id=1&sort_by
# http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id=1&sort_by
c.tracing.instrument :rack, quantize: { query: { show: ['category_id'] } }
marcotc marked this conversation as resolved.
Show resolved Hide resolved

# Show all values for all query string parameters
# http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id=1&sort_by=asc
# http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id=1&sort_by=asc
c.tracing.instrument :rack, quantize: { query: { show: :all } }

# Totally exclude any query string parameter matching 'sort_by' exactly
# http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id
# http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id
c.tracing.instrument :rack, quantize: { query: { exclude: ['sort_by'] } }

# Remove the query string entirely
# http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path
# http://example.com/path?category_id=1&sort_by=asc#featured --> /path
c.tracing.instrument :rack, quantize: { query: { exclude: :all } }

# Show URL fragments
# http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id&sort_by#featured
# http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id&sort_by#featured
c.tracing.instrument :rack, quantize: { fragment: :show }
end
```
Expand Down
91 changes: 65 additions & 26 deletions lib/datadog/tracing/contrib/rack/middlewares.rb
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,8 @@ module Rack
# in the Rack environment so that it can be retrieved by the underlying
# application. If request tags are not set by the app, they will be set using
# information available at the Rack level.
#
# rubocop:disable Metrics/ClassLength
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can get the argument for method length (with exceptions), but I'm wondering if there's really such a thing as a class with "too much lines".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I've ever decided to break down a class when Metrics/ClassLength is triggered, only ever added the cop exclusion tag.

If you want to remove the cop, I'd ✅ it.

Copy link
Contributor Author

@lloeki lloeki Sep 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like there are not too many of them, so it can reasonably fit into this PR. Will do.

class TraceMiddleware
def initialize(app)
@app = app
Expand Down Expand Up @@ -123,18 +125,6 @@ def call(env)
# rubocop:disable Metrics/PerceivedComplexity
# rubocop:disable Metrics/MethodLength
def set_request_tags!(trace, request_span, env, status, headers, response, original_env)
# http://www.rubydoc.info/github/rack/rack/file/SPEC
# The source of truth in Rack is the PATH_INFO key that holds the
# URL for the current request; but some frameworks may override that
# value, especially during exception handling.
#
# Because of this, we prefer to use REQUEST_URI, if available, which is the
# relative path + query string, and doesn't mutate.
#
# REQUEST_URI is only available depending on what web server is running though.
# So when its not available, we want the original, unmutated PATH_INFO, which
# is just the relative path without query strings.
url = env['REQUEST_URI'] || original_env['PATH_INFO']
request_header_collection = Header::RequestHeaderCollection.new(env)
request_headers_tags = parse_request_headers(request_header_collection)
response_headers_tags = parse_response_headers(headers || {})
Expand Down Expand Up @@ -176,14 +166,35 @@ def set_request_tags!(trace, request_span, env, status, headers, response, origi
request_span.set_tag(Tracing::Metadata::Ext::HTTP::TAG_METHOD, env['REQUEST_METHOD'])
end

url = parse_url(env, original_env)

if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_URL).nil?
options = configuration[:quantize]
options = configuration[:quantize] || {}

# Quantization::HTTP.url base defaults to :show, but we are transitioning
options[:base] ||= :exclude
Comment on lines +170 to +173
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to be done here, as doing it as a config default would be silently overridden when :quantize is manually set, which would be most surprising to the user.

Probably a general problem with such nested configuration keys.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it correctly, this will be removed in the next major release, is that right?

If so, I suggest writing them down in the comment, so we can find it later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, can you create a new file named docs/NextReleaseUpgradeGuide.md (name suggestion only), just add one section:

# Breaking changes

To aid with the next release process, please tag future breaking changes the inline code comment `# DEV-2.0:` prefix. 

## List of breaking changes

* `http.url` will change its default...

We've been meaning to do it, and it seems like your PR wants to have a breaking change in the next release.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this, I suggest adding the comment in this line with the prefix # DEV-2.0:.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points! Will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll do that in another PR, along with this suggestion: #2283 (comment)


request_span.set_tag(
Tracing::Metadata::Ext::HTTP::TAG_URL,
Contrib::Utils::Quantization::HTTP.url(url, options)
)
end

if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_BASE_URL).nil?
options = configuration[:quantize]

unless options[:base] == :show
base_url = Contrib::Utils::Quantization::HTTP.base_url(url)

unless base_url.empty?
request_span.set_tag(
Tracing::Metadata::Ext::HTTP::TAG_BASE_URL,
base_url
)
end
end
end

if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_CLIENT_IP).nil?
Tracing::ClientIp.set_client_ip_tag(
request_span,
Expand All @@ -192,19 +203,6 @@ def set_request_tags!(trace, request_span, env, status, headers, response, origi
)
end

if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_BASE_URL).nil?
request_obj = ::Rack::Request.new(env)

base_url = if request_obj.respond_to?(:base_url)
request_obj.base_url
else
# Compatibility for older Rack versions
request_obj.url.chomp(request_obj.fullpath)
end

request_span.set_tag(Tracing::Metadata::Ext::HTTP::TAG_BASE_URL, base_url)
end

if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_STATUS_CODE).nil? && status
request_span.set_tag(Tracing::Metadata::Ext::HTTP::TAG_STATUS_CODE, status)
end
Expand Down Expand Up @@ -238,6 +236,46 @@ def configuration
Datadog.configuration.tracing[:rack]
end

def parse_url(env, original_env)
request_obj = ::Rack::Request.new(env)

# scheme, host, and port
base_url = if request_obj.respond_to?(:base_url)
request_obj.base_url
else
# Compatibility for older Rack versions
request_obj.url.chomp(request_obj.fullpath)
end

# https://github.com/rack/rack/blob/main/SPEC.rdoc
#
# The source of truth in Rack is the PATH_INFO key that holds the
# URL for the current request; but some frameworks may override that
# value, especially during exception handling.
#
# Because of this, we prefer to use REQUEST_URI, if available, which is the
# relative path + query string, and doesn't mutate.
#
# REQUEST_URI is only available depending on what web server is running though.
# So when its not available, we want the original, unmutated PATH_INFO, which
# is just the relative path without query strings.
#
# SCRIPT_NAME is the first part of the request URL path, so that
# the application can know its virtual location. It should be
# prepended to PATH_INFO to reflect the correct user visible path.
request_uri = env['REQUEST_URI'].to_s
fullpath = if request_uri.empty?
query_string = original_env['QUERY_STRING'].to_s
path = original_env['SCRIPT_NAME'].to_s + original_env['PATH_INFO'].to_s

query_string.empty? ? path : "#{path}?#{query_string}"
else
request_uri
end

base_url + fullpath
end

def parse_user_agent_header(headers)
headers.get(Tracing::Metadata::Ext::HTTP::HEADER_USER_AGENT)
end
Expand Down Expand Up @@ -271,6 +309,7 @@ def parse_response_headers(headers)
end
end
end
# rubocop:enable Metrics/ClassLength
end
end
end
Expand Down
16 changes: 15 additions & 1 deletion lib/datadog/tracing/contrib/utils/quantization/http.rb
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,14 @@ def url(url, options = {})
options[:placeholder] || PLACEHOLDER
lloeki marked this conversation as resolved.
Show resolved Hide resolved
end

def base_url(url, options = {})
URI.parse(url).tap do |uri|
uri.path = ''
uri.query = nil
uri.fragment = nil
end.to_s
end

def url!(url, options = {})
options ||= {}

Expand All @@ -32,8 +40,14 @@ def url!(url, options = {})
uri.query = (!query.nil? && query.empty? ? nil : query)
end

# Remove any URI framents
# Remove any URI fragments
uri.fragment = nil unless options[:fragment] == :show

if options[:base] == :exclude
uri.host = nil
uri.port = nil
uri.scheme = nil
end
end.to_s
end

Expand Down
56 changes: 43 additions & 13 deletions spec/datadog/tracing/contrib/rack/integration_test_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,8 @@
end
end

after { Datadog.registry[:rack].reset_configuration! }

context 'for an application' do
let(:app) do
app_routes = routes
Expand Down Expand Up @@ -101,10 +103,24 @@

it_behaves_like 'a rack GET 200 span'

it do
expect(span.get_tag('http.url')).to eq('/success/')
expect(span.get_tag('http.base_url')).to eq('http://example.org')
expect(span).to be_root_span
context 'and default quantization' do
let(:rack_options) { { quantize: {} } }

it do
expect(span.get_tag('http.url')).to eq('/success/')
expect(span.get_tag('http.base_url')).to eq('http://example.org')
expect(span).to be_root_span
end
end

context 'and quantization activated for URL base' do
let(:rack_options) { { quantize: { base: :show } } }

it do
expect(span.get_tag('http.url')).to eq('http://example.org/success/')
expect(span.get_tag('http.base_url')).to be_nil
expect(span).to be_root_span
end
end

it { expect(trace.resource).to eq('GET 200') }
Expand All @@ -113,22 +129,36 @@
context 'with query string parameters' do
let(:route) { '/success?foo=bar' }

it_behaves_like 'a rack GET 200 span'
context 'and default quantization' do
let(:rack_options) { { quantize: {} } }

it do
# Since REQUEST_URI isn't available in Rack::Test by default (comes from WEBrick/Puma)
# it reverts to PATH_INFO, which doesn't have query string parameters.
expect(span.get_tag('http.url')).to eq('/success')
expect(span.get_tag('http.base_url')).to eq('http://example.org')
expect(span).to be_root_span
it_behaves_like 'a rack GET 200 span'

it do
expect(span.get_tag('http.url')).to eq('/success?foo')
expect(span.get_tag('http.base_url')).to eq('http://example.org')
expect(span).to be_root_span
end
end

context 'and quantization activated for the query' do
let(:rack_options) { { quantize: { query: { show: ['foo'] } } } }

it_behaves_like 'a rack GET 200 span'

it do
expect(span.get_tag('http.url')).to eq('/success?foo=bar')
expect(span.get_tag('http.base_url')).to eq('http://example.org')
expect(span).to be_root_span
end
end
end

context 'with REQUEST_URI' do
subject(:response) { get '/success?foo=bar', {}, 'REQUEST_URI' => '/success?foo=bar' }

context 'and default quantization' do
let(:rack_options) { super().merge(quantize: {}) }
let(:rack_options) { { quantize: {} } }

it_behaves_like 'a rack GET 200 span'

Expand All @@ -143,7 +173,7 @@
end

context 'and quantization activated for the query' do
let(:rack_options) { super().merge(quantize: { query: { show: ['foo'] } }) }
let(:rack_options) { { quantize: { query: { show: ['foo'] } } } }

it_behaves_like 'a rack GET 200 span'

Expand Down
8 changes: 4 additions & 4 deletions spec/datadog/tracing/contrib/sinatra/multi_app_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,7 @@
spans.each do |span|
if span.name == Datadog::Tracing::Contrib::Rack::Ext::SPAN_REQUEST
expect(span.resource).to eq('GET /endpoint')
expect(span.get_tag(Datadog::Tracing::Metadata::Ext::HTTP::TAG_URL)).to eq('/endpoint')
expect(span.get_tag(Datadog::Tracing::Metadata::Ext::HTTP::TAG_URL)).to eq('/one/endpoint')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why these Sinatra tests have changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tags actually come from Rack, not Sinatra. I believe they're here because at some point Sinatra may have been backfilling this value by itself. In any case they act as an integration test checking for side effects.

Sadly this test would previously pass but not represent what happens when real-world REQUEST_URI is set, indeed Rack::Test does not set REQUEST_URI.

The change here is mainly due to a bug that this PR addresses, that previously made the fallback to PATH_INFO be inconsistent with REQUEST_URI (i.e it did not include SCRIPT_NAME nor QUERY_STRING), which created all kinds of special cases from the lack of query string to adding base_url tag + url tag not resulting in the same value depending on whether REQUEST_URI or PATH_INFO was used, not the least in the latter case base_url tag + url tag is entirely nonsensical for virtualhosted or nested apps since it misses the intermediate SCRIPT_NAME.


next
end
Expand All @@ -95,7 +95,7 @@
spans.each do |span|
if span.name == Datadog::Tracing::Contrib::Rack::Ext::SPAN_REQUEST
expect(span.resource).to eq('GET /endpoint')
expect(span.get_tag(Datadog::Tracing::Metadata::Ext::HTTP::TAG_URL)).to eq('/endpoint')
expect(span.get_tag(Datadog::Tracing::Metadata::Ext::HTTP::TAG_URL)).to eq('/two/endpoint')

next
end
Expand All @@ -120,7 +120,7 @@
spans.each do |span|
if span.name == Datadog::Tracing::Contrib::Rack::Ext::SPAN_REQUEST
expect(span.resource).to eq('GET /one/endpoint')
expect(span.get_tag(Datadog::Tracing::Metadata::Ext::HTTP::TAG_URL)).to eq('/endpoint')
expect(span.get_tag(Datadog::Tracing::Metadata::Ext::HTTP::TAG_URL)).to eq('/one/endpoint')

next
end
Expand All @@ -141,7 +141,7 @@
spans.each do |span|
if span.name == Datadog::Tracing::Contrib::Rack::Ext::SPAN_REQUEST
expect(span.resource).to eq('GET /two/endpoint')
expect(span.get_tag(Datadog::Tracing::Metadata::Ext::HTTP::TAG_URL)).to eq('/endpoint')
expect(span.get_tag(Datadog::Tracing::Metadata::Ext::HTTP::TAG_URL)).to eq('/two/endpoint')

next
end
Expand Down
14 changes: 13 additions & 1 deletion spec/datadog/tracing/contrib/utils/quantization/http_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -47,12 +47,24 @@
it { is_expected.to eq('http://example.com/path') }
end

context 'with show: :all' do
context 'with fragment: :show' do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seemed to be a typo/eager copy-pasting.

let(:options) { { fragment: :show } }

it { is_expected.to eq('http://example.com/path?category_id&sort_by#featured') }
end

context 'with base: :show' do
let(:options) { { base: :show } }

it { is_expected.to eq('http://example.com/path?category_id&sort_by') }
end

context 'with base: :exclude' do
let(:options) { { base: :exclude } }

it { is_expected.to eq('/path?category_id&sort_by') }
end

context 'with Unicode characters' do
# URLs do not permit unencoded non-ASCII characters in the URL.
let(:url) { 'http://example.com/path?繋がってて' }
Expand Down