Adjust HTTP URL tags to comply with span tag unification RFC
- add a { base: :show } quantization option to enable the behaviour
- align http.url with tag naming RFC (and Rack's #url) to include the
  complete request URL
- drop http.base_url when not needed
- preserve REQUEST_URI logic that differs from Rack's logic
- add SCRIPT_NAME which was previously unhandled for PATH_INFO
- append QUERY_STRING when using PATH_INFO to match REQUEST_URI
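
The fallback described in the last three bullets can be sketched in plain Ruby. This is a minimal illustration rather than the middleware's actual code: `fullpath_from_env` is a hypothetical helper name, the env hashes are made-up examples, and a single hash is used for simplicity where the middleware distinguishes between the current and the original env.

```ruby
# Hypothetical helper (not part of the library): rebuilds the path and query
# portion of a request URL from a Rack env hash.
def fullpath_from_env(env)
  # Prefer REQUEST_URI when the web server provides it.
  request_uri = env['REQUEST_URI'].to_s
  return request_uri unless request_uri.empty?

  # Otherwise prepend SCRIPT_NAME to PATH_INFO and append QUERY_STRING,
  # so the result matches what REQUEST_URI would have contained.
  path = env['SCRIPT_NAME'].to_s + env['PATH_INFO'].to_s
  query_string = env['QUERY_STRING'].to_s

  query_string.empty? ? path : "#{path}?#{query_string}"
end

# Assumed example envs: one from a server that sets REQUEST_URI, one without it.
with_request_uri = { 'REQUEST_URI' => '/app/success?foo=bar', 'PATH_INFO' => '/error' }
without_request_uri = { 'SCRIPT_NAME' => '/app', 'PATH_INFO' => '/success', 'QUERY_STRING' => 'foo=bar' }

puts fullpath_from_env(with_request_uri)    # prints "/app/success?foo=bar"
puts fullpath_from_env(without_request_uri) # prints "/app/success?foo=bar"
```

Both branches yield the same user-visible path, which is the point of handling SCRIPT_NAME and QUERY_STRING in the PATH_INFO case.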
lloeki committed Sep 13, 2022
1 parent 64b0526 commit 3b2c316
Showing 4 changed files with 93 additions and 35 deletions.
27 changes: 19 additions & 8 deletions docs/GettingStarted.md
@@ -1522,39 +1522,50 @@ run app
| `headers` | Hash of HTTP request or response headers to add as tags to the `rack.request`. Accepts `request` and `response` keys with Array values e.g. `['Last-Modified']`. Adds `http.request.headers.*` and `http.response.headers.*` tags respectively. | `{ response: ['Content-Type', 'X-Request-ID'] }` |
| `middleware_names` | Enable this if you want to use the last executed middleware class as the resource name for the `rack` span. If enabled alongside the `rails` instrumentation, `rails` takes precedence by setting the `rack` resource name to the active `rails` controller when applicable. Requires `application` option to use. | `false` |
| `quantize` | Hash containing options for quantization. May include `:base`, `:query`, or `:fragment`. | `{}` |
| `quantize.base` | Defines behavior for URL base (scheme, host, port). Removes URL base from `http.url` tag by default, leaving a path, and sets `http.base_url`. May be `:show` to keep URL base in `http.url` tag and not set `http.base_url` tag. Option must be nested inside the `quantize` option. | `nil` |
| `quantize.query` | Hash containing options for query portion of URL quantization. May include `:show` or `:exclude`. See options below. Option must be nested inside the `quantize` option. | `{}` |
| `quantize.query.show` | Defines which values should always be shown. Shows no values by default. May be an Array of strings, or `:all` to show all values. Option must be nested inside the `query` option. | `nil` |
| `quantize.query.exclude` | Defines which values should be removed entirely. Excludes nothing by default. May be an Array of strings, or `:all` to remove the query string entirely. Option must be nested inside the `query` option. | `nil` |
| `quantize.fragment` | Defines behavior for URL fragments. Removes fragments by default. May be `:show` to show URL fragments. Option must be nested inside the `quantize` option. | `nil` |
| `request_queuing` | Track HTTP request time spent in the queue of the frontend server. See [HTTP request queuing](#http-request-queuing) for setup details. Set to `true` to enable. | `false` |
| `web_service_name` | Service name for frontend server request queuing spans. (e.g. `'nginx'`) | `'web-server'` |

Deprecation notice: `quantize.base` will change its default from `:exclude` to `:show` in a future version. Voluntarily moving to `:show` is recommended.

**Configuring URL quantization behavior**

```ruby
Datadog.configure do |c|
# Default behavior: all values are quantized, fragment is removed.
# http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id&sort_by
# http://example.com/path?categories[]=1&categories[]=2 --> http://example.com/path?categories[]
# Default behavior: all values are quantized, base is removed, fragment is removed.
# http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id&sort_by
# http://example.com:8080/path?categories[]=1&categories[]=2 --> /path?categories[]
# Remove URL base (scheme, host, port)
# http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id&sort_by
c.tracing.instrument :rack, quantize: { base: :exclude }
# Show URL base
# http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id&sort_by
c.tracing.instrument :rack, quantize: { base: :show }
# Show values for any query string parameter matching 'category_id' exactly
# http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id=1&sort_by
# http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id=1&sort_by
c.tracing.instrument :rack, quantize: { query: { show: ['category_id'] } }
# Show all values for all query string parameters
# http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id=1&sort_by=asc
# http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id=1&sort_by=asc
c.tracing.instrument :rack, quantize: { query: { show: :all } }
# Totally exclude any query string parameter matching 'sort_by' exactly
# http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id
# http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id
c.tracing.instrument :rack, quantize: { query: { exclude: ['sort_by'] } }
# Remove the query string entirely
# http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path
# http://example.com/path?category_id=1&sort_by=asc#featured --> /path
c.tracing.instrument :rack, quantize: { query: { exclude: :all } }
# Show URL fragments
# http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id&sort_by#featured
# http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id&sort_by#featured
c.tracing.instrument :rack, quantize: { fragment: :show }
end
```
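
To see what removing or keeping the URL base amounts to, here is a minimal sketch using only Ruby's standard `URI` class. The helper names `strip_base` and `base_only` are hypothetical and not part of the library's API, and query-string and fragment quantization are deliberately left out.

```ruby
require 'uri'

# Hypothetical helper: drops scheme, host, and port, leaving path, query, and fragment.
def strip_base(url)
  uri = URI.parse(url)
  uri.host = nil
  uri.port = nil
  uri.scheme = nil
  uri.to_s
end

# Hypothetical helper: keeps only scheme, host, and port.
def base_only(url)
  uri = URI.parse(url)
  uri.path = ''
  uri.query = nil
  uri.fragment = nil
  uri.to_s
end

url = 'http://example.com:8080/path?category_id=1#featured'
puts strip_base(url) # prints "/path?category_id=1#featured"
puts base_only(url)  # prints "http://example.com:8080"
```

The two results are complementary: concatenating `base_only(url)` and `strip_base(url)` gives back the original URL, which is roughly how `http.base_url` and a base-less `http.url` relate.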
83 changes: 58 additions & 25 deletions lib/datadog/tracing/contrib/rack/middlewares.rb
@@ -123,18 +123,6 @@ def call(env)
# rubocop:disable Metrics/PerceivedComplexity
# rubocop:disable Metrics/MethodLength
def set_request_tags!(trace, request_span, env, status, headers, response, original_env)
# http://www.rubydoc.info/github/rack/rack/file/SPEC
# The source of truth in Rack is the PATH_INFO key that holds the
# URL for the current request; but some frameworks may override that
# value, especially during exception handling.
#
# Because of this, we prefer to use REQUEST_URI, if available, which is the
# relative path + query string, and doesn't mutate.
#
# REQUEST_URI is only available depending on what web server is running though.
# So when it's not available, we want the original, unmutated PATH_INFO, which
# is just the relative path without query strings.
url = env['REQUEST_URI'] || original_env['PATH_INFO']
request_header_collection = Header::RequestHeaderCollection.new(env)
request_headers_tags = parse_request_headers(request_header_collection)
response_headers_tags = parse_response_headers(headers || {})
@@ -176,14 +164,32 @@ def set_request_tags!(trace, request_span, env, status, headers, response, origi
request_span.set_tag(Tracing::Metadata::Ext::HTTP::TAG_METHOD, env['REQUEST_METHOD'])
end

url = parse_url(env, original_env)

if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_URL).nil?
options = configuration[:quantize]

request_span.set_tag(
Tracing::Metadata::Ext::HTTP::TAG_URL,
Contrib::Utils::Quantization::HTTP.url(url, options)
)
end

if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_BASE_URL).nil?
options = configuration[:quantize]

unless options[:base] == :show
base_url = Contrib::Utils::Quantization::HTTP.base_url(url)

unless base_url.empty?
request_span.set_tag(
Tracing::Metadata::Ext::HTTP::TAG_BASE_URL,
base_url
)
end
end
end

if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_CLIENT_IP).nil?
Tracing::ClientIp.set_client_ip_tag(
request_span,
@@ -192,19 +198,6 @@ def set_request_tags!(trace, request_span, env, status, headers, response, origi
)
end

if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_BASE_URL).nil?
request_obj = ::Rack::Request.new(env)

base_url = if request_obj.respond_to?(:base_url)
request_obj.base_url
else
# Compatibility for older Rack versions
request_obj.url.chomp(request_obj.fullpath)
end

request_span.set_tag(Tracing::Metadata::Ext::HTTP::TAG_BASE_URL, base_url)
end

if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_STATUS_CODE).nil? && status
request_span.set_tag(Tracing::Metadata::Ext::HTTP::TAG_STATUS_CODE, status)
end
@@ -238,6 +231,46 @@ def configuration
Datadog.configuration.tracing[:rack]
end

def parse_url(env, original_env)
request_obj = ::Rack::Request.new(env)

# scheme, host, and port
base_url = if request_obj.respond_to?(:base_url)
request_obj.base_url
else
# Compatibility for older Rack versions
request_obj.url.chomp(request_obj.fullpath)
end

# https://github.com/rack/rack/blob/main/SPEC.rdoc
#
# The source of truth in Rack is the PATH_INFO key that holds the
# URL for the current request; but some frameworks may override that
# value, especially during exception handling.
#
# Because of this, we prefer to use REQUEST_URI, if available, which is the
# relative path + query string, and doesn't mutate.
#
# REQUEST_URI is only available depending on what web server is running though.
# So when it's not available, we want the original, unmutated PATH_INFO, which
# is just the relative path without query strings.
#
# SCRIPT_NAME is the first part of the request URL path, so that
# the application can know its virtual location. It should be
# prepended to PATH_INFO to reflect the correct user visible path.
request_uri = env['REQUEST_URI'].to_s
fullpath = if request_uri.empty?
query_string = original_env['QUERY_STRING'].to_s
path = original_env['SCRIPT_NAME'].to_s + original_env['PATH_INFO'].to_s

query_string.empty? ? path : "#{path}?#{query_string}"
else
request_uri
end

base_url + fullpath
end

def parse_user_agent_header(headers)
headers.get(Tracing::Metadata::Ext::HTTP::HEADER_USER_AGENT)
end
16 changes: 15 additions & 1 deletion lib/datadog/tracing/contrib/utils/quantization/http.rb
@@ -22,6 +22,14 @@ def url(url, options = {})
options[:placeholder] || PLACEHOLDER
end

def base_url(url, options = {})
URI.parse(url).tap do |uri|
uri.path = ''
uri.query = nil
uri.fragment = nil
end.to_s
end

def url!(url, options = {})
options ||= {}

@@ -32,8 +40,14 @@ def url!(url, options = {})
uri.query = (!query.nil? && query.empty? ? nil : query)
end

# Remove any URI framents
# Remove any URI fragments
uri.fragment = nil unless options[:fragment] == :show

unless options[:base] == :show
uri.host = nil
uri.port = nil
uri.scheme = nil
end
end.to_s
end

2 changes: 1 addition & 1 deletion spec/datadog/tracing/contrib/rack/integration_test_spec.rb
@@ -118,7 +118,7 @@
it do
# Since REQUEST_URI isn't available in Rack::Test by default (comes from WEBrick/Puma)
# it reverts to PATH_INFO, which doesn't have query string parameters.
expect(span.get_tag('http.url')).to eq('/success')
expect(span.get_tag('http.url')).to eq('/success?foo')
expect(span.get_tag('http.base_url')).to eq('http://example.org')
expect(span).to be_root_span
end
