Missing resource names for DataDog tracing #1208

delner · 2017-12-26T17:11:26Z

First off, wanted to say thanks for adding DataDog tracing support! Really cool to see support for this in GraphQL.

I was trying to setup and test out graphql-ruby with DataDog tracing enabled, but ran into some issues with the traces, which were being generated incorrectly, causing them not to display in DataDog.

When the GraphQL::Tracing::DataDogTracing receives an execute_multiplex operation to trace, and none of its queries have selected_operation_names, it sets the resource name to an empty string. Although this doesn't raise an error in the Ruby application, when this trace gets sent through the DataDog agent, the agent prunes out 'bad looking' traces, including ones without resource names. Thus, the trace never makes it to DataDog.

The issue should be reproducible with default boilerplate code, and the following:

dd_rails_test_schema.rb

DdRailsTestSchema = GraphQL::Schema.define do
  query(Types::QueryType)
  use(GraphQL::Tracing::DataDogTracing)
end

types/query_type.rb

Types::QueryType = GraphQL::ObjectType.define do
  name "Query"

  field :foo do
    type Types::FooType
    argument :id, !types.ID
    description "Find a Foo by ID"
    resolve ->(obj, args, ctx) { Foo.find(args['id']) }
  end
end

types/foo_type.rb

Types::FooType = GraphQL::ObjectType.define do
  name "Foo"
  field :id, !types.ID
  field :name, types.String
  field :created_at, !types.String
  field :updated_at, !types.String
end

Then assuming a Foo record with ID 1 exists, running a query with:

curl \
  -X POST \
  -H "Content-Type: application/json" \
  --data '{ "query": "{ foo(id: 1) { name } }" }' \
  localhost:3000/graphql

The workaround for this issue, without the changes included in this PR, involve adding the following to the datadog.rb initializer, which leverages the pipeline feature in the DataDog tracing library to modify the incorrectly generated GraphQL traces:

# Temporary fix for GraphQL.
# If spans are generated without resource names,
# then the agent will automatically filter the trace out.
# Remove this when the issue has been fixed in GraphQL.
Datadog::Pipeline.before_flush(
  Datadog::Pipeline::SpanProcessor.new do |span|
    if span.name == 'execute.graphql'
      span.resource = 'execute.graphql' if span.resource.to_s.length == 0
      span.span_type = Datadog::Ext::HTTP::TYPE if span.span_type.to_s.length == 0
    end
  end
)

This pull request sets a span type (for better presentation in DataDog), and sets a custom resource name for execute_multiplex operations only if the sub queries have selected_operation_names. Otherwise it uses the span default, which is the span name (e.g. execute.graphql in this case.) Open to suggestions if we think there might be a better default name to use in this particular case!

This issue is reproducible on master at least still as of 2782cc6343280fb68b1184b05aa18951e1d820e6, with the latest versions of Rails, Ruby, and the ddtrace gem.

Let me know if there's anything else I can do on my end to help with this issue!

rmosolgo · 2018-01-02T21:30:00Z

Thanks for the great writeup on this issue! It's been ... interesting ... to learn the ins and outs of the different platforms :P

👍 from me, @esjee do you have any thoughts on this change?

Willianvdv · 2018-01-03T08:14:59Z

As @esjee is enjoying a well deserved holiday, I'll answer this question. For us, the most important feature is to see the performance of one query over time. Something that was surprisingly hard to achieve and required a lot of trial-and-error-driven development. I think the main cause is that APM isn't created for the use case that one controller receives various types of data requests (graphql queries). Anyway, this is what we got now:

class DatadogTracing < GraphQL::Tracing::PlatformTracing
  # NOTE: do not modify these constants otherwise we'll lose history in Datadog
  SERVICE_NAME = 'core-graphql'
  PARENT_SPAN_NAME = 'execute.graphql'

  self.platform_keys = {
    'lex' => 'lex.graphql',
    'parse' => 'parse.graphql',
    'validate' => 'validate.graphql',
    'analyze_query' => 'analyze.graphql',
    'analyze_multiplex' => 'analyze_multiplex.graphql',
    'execute_multiplex' => PARENT_SPAN_NAME,
    'execute_query' => 'execute_query.graphql',
    'execute_query_lazy' => 'execute_query_lazy.graphql',
    'execute_field' => 'execute.field',
    'execute_field_lazy' => 'execute.field.lazy',
  }

  def platform_trace(platform_key, key, data)
    translated_key_name = @platform_keys.fetch(key)

    Datadog.tracer.trace(translated_key_name, service: SERVICE_NAME) do |span|
      query = extract_query(key, data)
      query_string = query.present? && query.query_string || data.fetch(:query_string)
      span.span_type = 'graphql'
      span.set_tag('platform_key', platform_key)
      span.set_tag('graphql.query.query_string', query_string)

      if query.nil? || query.document.nil? || query.selected_operation.nil?
      elsif updated_untagged_spans?(query)
        update_span(span, query)
      else
        parent_span = parent_span_for_span(span)
        update_untagged_spans(parent_span, query)
        query_context(query)[:updated_untagged_spans] = true
      end

      yield
    end
  end

  def update_span(span, query)
    operation_name = query.selected_operation.name || '<anonymous>'
    operation_type = query.selected_operation.operation_type

    current_user_id = query.context.fetch(:current_user)&.id
    cloudflare_id = query.context.fetch(:request).cookies[CLOUDFLARE_ID]
    resource = operation_name

    span.resource = resource
    span.set_tag('graphql.query.name', resource)
    span.set_tag('graphql.query.operation_type', operation_type)
    span.set_tag('cfduid', cloudflare_id)
    span.set_tag('current_user_id', current_user_id)
  end

  def parent_span_for_span(span)
    return span if span.name == PARENT_SPAN_NAME && span.service == SERVICE_NAME
    return nil if span.parent.nil?
    parent_span(span.parent)
  end

  def query_context(query)
    query.context.namespace(DatadogTracing)
  end

  def update_untagged_spans(span, query)
    return unless span.get_tag('graphql.query.name').nil?
    update_span(span, query)

    span.children.each do |child_span|
      update_untagged_spans(child_span, query)
    end
  end

  def updated_untagged_spans?(query)
    query_context(query)[:updated_untagged_spans]
  end

  def platform_field_key(type, field)
    "#{type.name}.#{field.name}"
  end

  private

  def instrument_scalar_field(type, field)
    instrument_nonscalar_field(type, field)
  end

  def extract_query(key, data)
    # TODO: how to handle multiple queries, with multiplex?
    if key.in?(['execute_multiplex', 'analyze_multiplex'])
      data.fetch(:multiplex).queries.first
    elsif key.in?(['execute_field', 'execute_field_lazy'])
      data.fetch(:context).query
    elsif key.in?(['lex', 'parse'])
      nil
    else
      data.fetch(:query)
    end
  end
end

In Datadog this gives us a nice overview of GraphQL queries:

An important part of this setup is that the root span is being used as the total run time of the GraphQL query, and not some other random span.

^ Note the highlighted span, that's the span that shows up in the overview.

Unfortunately, we don't have the scalar names on the spans. We just couldn't get reliable tracing information and have fancy names, so we opted out for the names for now. I think it's a limitation of DD and it would be nice if a span would accept a display name to render the human-readable name. But that's up to DD to implement.

^ The names are available, you just have to click more :(

@delner I'm not sure if this contains any useful information for you. Feel free to ignore or copy the implementation in your PR.

delner · 2018-01-04T20:29:22Z

Hey @Willianvdv, I took a quick look over at this sample, and it looks really promising! (Love the screenshots!) I think it could improve on a number of things; the resource names are a bit more specific, and the query string is present... both things this tracing module could benefit from. Is it possible to put these changes into another PR? I'd like to give it a proper review in its own right, since there's a lot going on there worth looking at.

I think I'd like to focus this PR just on the bug for now though, if possible, and the minimal set of changes to get things working again. Then we could come back and work these other improvements in?

Willianvdv · 2018-01-04T21:12:45Z

@delner totally! You're right about the overload of data on the spans and that the logging is very specific to our app. I'll create a new PR add you as a reviewer, would be valuable to have multiple parties working on DataDog tracing!

delner · 2018-01-04T21:24:10Z

@Willianvdv Awesome! Looking foward to it! 👍

delner · 2018-01-08T18:16:15Z

Hey @rmosolgo @esjee @Willianvdv , I'm feeling pretty good about how these changes are looking. Have any 👀 to spare for this one?

esjee

Code seems good to me, but I didn't verify the behavior.

esjee · 2018-01-10T10:59:39Z

lib/graphql/tracing/data_dog_tracing.rb

@@ -15,28 +15,32 @@ class DataDogTracing < PlatformTracing
      }

      def platform_trace(platform_key, key, data)
-        service = options.fetch(:service, 'ruby-graphql')
+        tracer.trace(platform_key, service: service_name) do |span|
+          span.span_type = Datadog::Ext::HTTP::TYPE


Is this the best fit? 🤔

Looking at dd-trace-rb, they seem to use it for:

HTTP requests

controllers

grape

Do you think grape is roughly on the same level as GraphQL? If so, then the Ext::HTTP::TYPE could be a good fit.

I think that's a good point. GraphQL isn't explicitly HTTP, I suppose. So maybe it isn't a best fit.

How would you categorize this set of GraphQL operations, if you were to choose your own word for it? My thoughts are that it kind of falls in a "generic Ruby operation" category, but I don't think there's a suitable span type constant defined in dd-trace-rb right now for this.

It should be okay to just give it a value of custom in lieu of a better alternative right now.

Yeah, I agree with you -- can't really find a good fit for it.

esjee · 2018-01-10T11:04:14Z

lib/graphql/tracing/data_dog_tracing.rb

          if key == 'execute_multiplex'
-            span.resource = data[:multiplex].queries.map(&:selected_operation_name).join(', ')
+            operations = data[:multiplex].queries.map(&:selected_operation_name).join(', ')
+            span.resource = operations if operations.length > 0


Are you sure you want to do this .length operation after the join? I realise it doesn't really matter either way, but just be aware that you're counting characters here 😄

Also, consider using the (perhaps more idiomatic?) .empty? method instead of .length > 0.

I like empty? better!

delner · 2018-01-10T16:43:57Z

@esjee Made some changes to address the feedback!

esjee · 2018-01-18T09:24:43Z

Except for Travis, this PR looks good to me. @rmosolgo @Willianvdv any thoughts?

esjee · 2018-01-18T09:25:29Z

Except for Travis, this PR looks good to me. @rmosolgo @Willianvdv any thoughts?

delner · 2018-01-18T15:29:16Z

There was some kind of CI problem with Postgres (new version 1.0.0 that released?) I updated this PR against the latest master, which seemed to fix the build.

rmosolgo

Thanks for the fix here, sorry I've been slow to respond, I just have one more question!

rmosolgo · 2018-01-18T19:30:34Z

lib/graphql/tracing/data_dog_tracing.rb

          if key == 'execute_multiplex'
-            span.resource = data[:multiplex].queries.map(&:selected_operation_name).join(', ')
+            operations = data[:multiplex].queries.map(&:selected_operation_name).join(', ')
+            span.resource = operations if operations.length.empty?


What is operations.length.empty?? Should it be .zero??

# $ irb irb(main):001:0> 0.empty? # NoMethodError: undefined method `empty?' for 0:Fixnum # from (irb):1 # from /usr/bin/irb:12:in `<main>' irb(main):002:0> 0.zero? # => true

Oh man... I must have fat fingered that one. I meant operations.empty? because operations should be a String. I'll update this.

delner · 2018-01-18T20:24:21Z

Good catch, sorry about that @rmosolgo! Should be fixed now.

delner · 2018-01-24T15:51:41Z

Hey @rmosolgo , just wanted to follow up with you on this one. I think it's ready to merge!

rmosolgo · 2018-01-25T13:48:06Z

Great, thanks for your work on it! 👏 !

delner mentioned this pull request Jan 4, 2018

Add GraphQL configuration DataDog/dd-trace-rb#295

Merged

esjee approved these changes Jan 10, 2018

View reviewed changes

rmosolgo reviewed Jan 18, 2018

View reviewed changes

delner and others added 2 commits January 18, 2018 15:21

Fixed: Empty resource for DataDog tracing.

268141c

Changed: DataDogTracing to use configuration instead of pin.

67a3172

rmosolgo merged commit 44e6e15 into rmosolgo:master Jan 25, 2018

rmosolgo added this to the 1.7.9 milestone Jan 25, 2018

rmosolgo added this to the 1.7.9 milestone Feb 1, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing resource names for DataDog tracing #1208

Missing resource names for DataDog tracing #1208

delner commented Dec 26, 2017

rmosolgo commented Jan 2, 2018

Willianvdv commented Jan 3, 2018 •

edited

Loading

delner commented Jan 4, 2018 •

edited

Loading

Willianvdv commented Jan 4, 2018

delner commented Jan 4, 2018 •

edited

Loading

delner commented Jan 8, 2018

esjee left a comment

esjee Jan 10, 2018

delner Jan 10, 2018

esjee Jan 18, 2018

esjee Jan 10, 2018

delner Jan 10, 2018

delner commented Jan 10, 2018

esjee commented Jan 18, 2018

esjee commented Jan 18, 2018

delner commented Jan 18, 2018 •

edited

Loading

rmosolgo left a comment

rmosolgo Jan 18, 2018

delner Jan 18, 2018

delner commented Jan 18, 2018

delner commented Jan 24, 2018

rmosolgo commented Jan 25, 2018

Missing resource names for DataDog tracing #1208

Missing resource names for DataDog tracing #1208

Conversation

delner commented Dec 26, 2017

rmosolgo commented Jan 2, 2018

Willianvdv commented Jan 3, 2018 • edited Loading

delner commented Jan 4, 2018 • edited Loading

Willianvdv commented Jan 4, 2018

delner commented Jan 4, 2018 • edited Loading

delner commented Jan 8, 2018

esjee left a comment

Choose a reason for hiding this comment

esjee Jan 10, 2018

Choose a reason for hiding this comment

delner Jan 10, 2018

Choose a reason for hiding this comment

esjee Jan 18, 2018

Choose a reason for hiding this comment

esjee Jan 10, 2018

Choose a reason for hiding this comment

delner Jan 10, 2018

Choose a reason for hiding this comment

delner commented Jan 10, 2018

esjee commented Jan 18, 2018

esjee commented Jan 18, 2018

delner commented Jan 18, 2018 • edited Loading

rmosolgo left a comment

Choose a reason for hiding this comment

rmosolgo Jan 18, 2018

Choose a reason for hiding this comment

delner Jan 18, 2018

Choose a reason for hiding this comment

delner commented Jan 18, 2018

delner commented Jan 24, 2018

rmosolgo commented Jan 25, 2018

Willianvdv commented Jan 3, 2018 •

edited

Loading

delner commented Jan 4, 2018 •

edited

Loading

delner commented Jan 4, 2018 •

edited

Loading

delner commented Jan 18, 2018 •

edited

Loading