diff --git a/docs/GettingStarted.md b/docs/GettingStarted.md
index 398a51b707..8fd903f676 100644
--- a/docs/GettingStarted.md
+++ b/docs/GettingStarted.md
@@ -1522,6 +1522,7 @@ run app
 | `headers` | Hash of HTTP request or response headers to add as tags to the `rack.request`. Accepts `request` and `response` keys with Array values e.g. `['Last-Modified']`. Adds `http.request.headers.*` and `http.response.headers.*` tags respectively. | `{ response: ['Content-Type', 'X-Request-ID'] }` |
 | `middleware_names` | Enable this if you want to use the last executed middleware class as the resource name for the `rack` span. If enabled alongside the `rails` instrumention, `rails` takes precedence by setting the `rack` resource name to the active `rails` controller when applicable. Requires `application` option to use. | `false` |
 | `quantize` | Hash containing options for quantization. May include `:query` or `:fragment`. | `{}` |
+| `quantize.base` | Defines behavior for URL base (scheme, host, port). Removes URL base from `http.url` tag by default, leaving a path, and sets `http.base_url`. May be `:show` to keep URL base in `http.url` tag and not set `http.base_url` tag. Option must be nested inside the `quantize` option. | `nil` |
 | `quantize.query` | Hash containing options for query portion of URL quantization. May include `:show` or `:exclude`. See options below. Option must be nested inside the `quantize` option. | `{}` |
 | `quantize.query.show` | Defines which values should always be shown. Shows no values by default. May be an Array of strings, or `:all` to show all values. Option must be nested inside the `query` option. | `nil` |
 | `quantize.query.exclude` | Defines which values should be removed entirely. Excludes nothing by default. May be an Array of strings, or `:all` to remove the query string entirely. Option must be nested inside the `query` option. | `nil` |
@@ -1529,32 +1530,42 @@ run app
 | `request_queuing` | Track HTTP request time spent in the queue of the frontend server. See [HTTP request queuing](#http-request-queuing) for setup details. Set to `true` to enable. | `false` |
 | `web_service_name` | Service name for frontend server request queuing spans. (e.g. `'nginx'`) | `'web-server'` |
 
+Deprecation notice: `quantize.base` will change its default from `:exclude` to `:show` in a future version. Voluntarily moving to `:show` is recommended.
+
 **Configuring URL quantization behavior**
 
 ```ruby
 Datadog.configure do |c|
-  # Default behavior: all values are quantized, fragment is removed.
-  # http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id&sort_by
-  # http://example.com/path?categories[]=1&categories[]=2 --> http://example.com/path?categories[]
+  # Default behavior: all values are quantized, base is removed, fragment is removed.
+  # http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id&sort_by
+  # http://example.com:8080/path?categories[]=1&categories[]=2 --> /path?categories[]
+
+  # Remove URL base (scheme, host, port)
+  # http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id&sort_by#featured
+  c.tracing.instrument :rack, quantize: { base: :exclude }
+
+  # Show URL base
+  # http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id&sort_by#featured
+  c.tracing.instrument :rack, quantize: { base: :show }
 
   # Show values for any query string parameter matching 'category_id' exactly
-  # http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id=1&sort_by
+  # http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id=1&sort_by
   c.tracing.instrument :rack, quantize: { query: { show: ['category_id'] } }
 
   # Show all values for all query string parameters
-  # http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id=1&sort_by=asc
+  # http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id=1&sort_by=asc
   c.tracing.instrument :rack, quantize: { query: { show: :all } }
 
   # Totally exclude any query string parameter matching 'sort_by' exactly
-  # http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id
+  # http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id
   c.tracing.instrument :rack, quantize: { query: { exclude: ['sort_by'] } }
 
   # Remove the query string entirely
-  # http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path
+  # http://example.com/path?category_id=1&sort_by=asc#featured --> /path
   c.tracing.instrument :rack, quantize: { query: { exclude: :all } }
 
   # Show URL fragments
-  # http://example.com/path?category_id=1&sort_by=asc#featured --> http://example.com/path?category_id&sort_by#featured
+  # http://example.com/path?category_id=1&sort_by=asc#featured --> /path?category_id&sort_by#featured
   c.tracing.instrument :rack, quantize: { fragment: :show }
 end
 ```
diff --git a/lib/datadog/tracing/contrib/rack/middlewares.rb b/lib/datadog/tracing/contrib/rack/middlewares.rb
index 9c4e2cf9c1..710f2ccd08 100644
--- a/lib/datadog/tracing/contrib/rack/middlewares.rb
+++ b/lib/datadog/tracing/contrib/rack/middlewares.rb
@@ -123,18 +123,6 @@ def call(env)
           # rubocop:disable Metrics/PerceivedComplexity
           # rubocop:disable Metrics/MethodLength
           def set_request_tags!(trace, request_span, env, status, headers, response, original_env)
-            # http://www.rubydoc.info/github/rack/rack/file/SPEC
-            # The source of truth in Rack is the PATH_INFO key that holds the
-            # URL for the current request; but some frameworks may override that
-            # value, especially during exception handling.
-            #
-            # Because of this, we prefer to use REQUEST_URI, if available, which is the
-            # relative path + query string, and doesn't mutate.
-            #
-            # REQUEST_URI is only available depending on what web server is running though.
-            # So when its not available, we want the original, unmutated PATH_INFO, which
-            # is just the relative path without query strings.
-            url = env['REQUEST_URI'] || original_env['PATH_INFO']
             request_header_collection = Header::RequestHeaderCollection.new(env)
             request_headers_tags = parse_request_headers(request_header_collection)
             response_headers_tags = parse_response_headers(headers || {})
@@ -176,14 +164,32 @@ def set_request_tags!(trace, request_span, env, status, headers, response, origi
               request_span.set_tag(Tracing::Metadata::Ext::HTTP::TAG_METHOD, env['REQUEST_METHOD'])
             end
 
+            url = parse_url(env, original_env)
+
             if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_URL).nil?
               options = configuration[:quantize]
+
               request_span.set_tag(
                 Tracing::Metadata::Ext::HTTP::TAG_URL,
                 Contrib::Utils::Quantization::HTTP.url(url, options)
               )
             end
 
+            if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_BASE_URL).nil?
+              options = configuration[:quantize]
+
+              unless options[:base] == :show
+                base_url = Contrib::Utils::Quantization::HTTP.base_url(url)
+
+                unless base_url.empty?
+                  request_span.set_tag(
+                    Tracing::Metadata::Ext::HTTP::TAG_BASE_URL,
+                    base_url
+                  )
+                end
+              end
+            end
+
             if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_CLIENT_IP).nil?
               Tracing::ClientIp.set_client_ip_tag(
                 request_span,
@@ -192,19 +198,6 @@ def set_request_tags!(trace, request_span, env, status, headers, response, origi
               )
             end
 
-            if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_BASE_URL).nil?
-              request_obj = ::Rack::Request.new(env)
-
-              base_url = if request_obj.respond_to?(:base_url)
-                           request_obj.base_url
-                         else
-                           # Compatibility for older Rack versions
-                           request_obj.url.chomp(request_obj.fullpath)
-                         end
-
-              request_span.set_tag(Tracing::Metadata::Ext::HTTP::TAG_BASE_URL, base_url)
-            end
-
             if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_STATUS_CODE).nil? && status
               request_span.set_tag(Tracing::Metadata::Ext::HTTP::TAG_STATUS_CODE, status)
             end
@@ -238,6 +231,46 @@ def configuration
             Datadog.configuration.tracing[:rack]
           end
 
+          def parse_url(env, original_env)
+            request_obj = ::Rack::Request.new(env)
+
+            # scheme, host, and port
+            base_url = if request_obj.respond_to?(:base_url)
+                         request_obj.base_url
+                       else
+                         # Compatibility for older Rack versions
+                         request_obj.url.chomp(request_obj.fullpath)
+                       end
+
+            # https://github.com/rack/rack/blob/main/SPEC.rdoc
+            #
+            # The source of truth in Rack is the PATH_INFO key that holds the
+            # URL for the current request; but some frameworks may override that
+            # value, especially during exception handling.
+            #
+            # Because of this, we prefer to use REQUEST_URI, if available, which is the
+            # relative path + query string, and doesn't mutate.
+            #
+            # REQUEST_URI is only available depending on what web server is running though.
+            # So when its not available, we want the original, unmutated PATH_INFO, which
+            # is just the relative path without query strings.
+            #
+            # SCRIPT_NAME is the first part of the request URL path, so that
+            # the application can know its virtual location. It should be
+            # prepended to PATH_INFO to reflect the correct user visible path.
+            request_uri = env['REQUEST_URI'].to_s
+            fullpath = if request_uri.empty?
+                         query_string = original_env['QUERY_STRING'].to_s
+                         path = original_env['SCRIPT_NAME'].to_s + original_env['PATH_INFO'].to_s
+
+                         query_string.empty? ? path : "#{path}?#{query_string}"
+                       else
+                         request_uri
+                       end
+
+            base_url + fullpath
+          end
+
           def parse_user_agent_header(headers)
             headers.get(Tracing::Metadata::Ext::HTTP::HEADER_USER_AGENT)
           end
diff --git a/lib/datadog/tracing/contrib/utils/quantization/http.rb b/lib/datadog/tracing/contrib/utils/quantization/http.rb
index 30d6d8efc8..7d84d1d0b3 100644
--- a/lib/datadog/tracing/contrib/utils/quantization/http.rb
+++ b/lib/datadog/tracing/contrib/utils/quantization/http.rb
@@ -22,6 +22,14 @@ def url(url, options = {})
               options[:placeholder] || PLACEHOLDER
             end
 
+            def base_url(url, options = {})
+              URI.parse(url).tap do |uri|
+                uri.path = ''
+                uri.query = nil
+                uri.fragment = nil
+              end.to_s
+            end
+
             def url!(url, options = {})
               options ||= {}
 
@@ -32,8 +40,14 @@ def url!(url, options = {})
                   uri.query = (!query.nil? && query.empty? ? nil : query)
                 end
 
-                # Remove any URI framents
+                # Remove any URI fragments
                 uri.fragment = nil unless options[:fragment] == :show
+
+                unless options[:base] == :show
+                  uri.host = nil
+                  uri.port = nil
+                  uri.scheme = nil
+                end
               end.to_s
             end
 
diff --git a/spec/datadog/tracing/contrib/rack/integration_test_spec.rb b/spec/datadog/tracing/contrib/rack/integration_test_spec.rb
index 490d18a497..feb18ffaf9 100644
--- a/spec/datadog/tracing/contrib/rack/integration_test_spec.rb
+++ b/spec/datadog/tracing/contrib/rack/integration_test_spec.rb
@@ -118,7 +118,7 @@
           it do
             # Since REQUEST_URI isn't available in Rack::Test by default (comes from WEBrick/Puma)
             # it reverts to PATH_INFO, which doesn't have query string parameters.
-            expect(span.get_tag('http.url')).to eq('/success')
+            expect(span.get_tag('http.url')).to eq('/success?foo')
             expect(span.get_tag('http.base_url')).to eq('http://example.org')
             expect(span).to be_root_span
           end
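
Note on the URL handling in this patch: the following is a minimal standalone sketch, using only Ruby's standard `uri` library, of the three steps the patch combines (rebuilding the full path when `REQUEST_URI` is missing, extracting the base, and stripping the base). The `sketch_*` method names are hypothetical and are not part of the library. For brevity the sketch reads every key from a single `env` hash, whereas the patch reads `REQUEST_URI` from `env` and the other keys from `original_env`.

```ruby
require 'uri'

# Hypothetical helper: rebuild "path?query" the way parse_url falls back
# when the web server does not provide REQUEST_URI.
def sketch_fullpath(env)
  request_uri = env['REQUEST_URI'].to_s
  return request_uri unless request_uri.empty?

  path = env['SCRIPT_NAME'].to_s + env['PATH_INFO'].to_s
  query = env['QUERY_STRING'].to_s
  query.empty? ? path : "#{path}?#{query}"
end

# Hypothetical helper: keep only scheme, host, and port.
def sketch_base_url(url)
  URI.parse(url).tap do |uri|
    uri.path = ''
    uri.query = nil
    uri.fragment = nil
  end.to_s
end

# Hypothetical helper: drop scheme, host, and port, leaving the rest.
def sketch_strip_base(url)
  URI.parse(url).tap do |uri|
    uri.host = nil
    uri.port = nil
    uri.scheme = nil
  end.to_s
end

env = { 'SCRIPT_NAME' => '/app', 'PATH_INFO' => '/success', 'QUERY_STRING' => 'foo' }
url = 'http://example.com:8080' + sketch_fullpath(env)

puts sketch_base_url(url)   # base only
puts sketch_strip_base(url) # path and query only
```

Because a non-default port is part of the base, it appears in the output of `sketch_base_url` and is absent from the output of `sketch_strip_base`.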