Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing resource names for DataDog tracing #1208

Merged
merged 2 commits into from
Jan 25, 2018
Merged

Missing resource names for DataDog tracing #1208

merged 2 commits into from
Jan 25, 2018

Conversation

delner
Copy link
Contributor

@delner delner commented Dec 26, 2017

First off, wanted to say thanks for adding DataDog tracing support! Really cool to see support for this in GraphQL.

I was trying to setup and test out graphql-ruby with DataDog tracing enabled, but ran into some issues with the traces, which were being generated incorrectly, causing them not to display in DataDog.

When the GraphQL::Tracing::DataDogTracing receives an execute_multiplex operation to trace, and none of its queries have selected_operation_names, it sets the resource name to an empty string. Although this doesn't raise an error in the Ruby application, when this trace gets sent through the DataDog agent, the agent prunes out 'bad looking' traces, including ones without resource names. Thus, the trace never makes it to DataDog.

The issue should be reproducible with default boilerplate code, and the following:

dd_rails_test_schema.rb

DdRailsTestSchema = GraphQL::Schema.define do
  query(Types::QueryType)
  use(GraphQL::Tracing::DataDogTracing)
end

types/query_type.rb

Types::QueryType = GraphQL::ObjectType.define do
  name "Query"

  field :foo do
    type Types::FooType
    argument :id, !types.ID
    description "Find a Foo by ID"
    resolve ->(obj, args, ctx) { Foo.find(args['id']) }
  end
end

types/foo_type.rb

Types::FooType = GraphQL::ObjectType.define do
  name "Foo"
  field :id, !types.ID
  field :name, types.String
  field :created_at, !types.String
  field :updated_at, !types.String
end

Then assuming a Foo record with ID 1 exists, running a query with:

curl \
  -X POST \
  -H "Content-Type: application/json" \
  --data '{ "query": "{ foo(id: 1) { name } }" }' \
  localhost:3000/graphql

The workaround for this issue, without the changes included in this PR, involve adding the following to the datadog.rb initializer, which leverages the pipeline feature in the DataDog tracing library to modify the incorrectly generated GraphQL traces:

# Temporary fix for GraphQL.
# If spans are generated without resource names,
# then the agent will automatically filter the trace out.
# Remove this when the issue has been fixed in GraphQL.
Datadog::Pipeline.before_flush(
  Datadog::Pipeline::SpanProcessor.new do |span|
    if span.name == 'execute.graphql'
      span.resource = 'execute.graphql' if span.resource.to_s.length == 0
      span.span_type = Datadog::Ext::HTTP::TYPE if span.span_type.to_s.length == 0
    end
  end
)

This pull request sets a span type (for better presentation in DataDog), and sets a custom resource name for execute_multiplex operations only if the sub queries have selected_operation_names. Otherwise it uses the span default, which is the span name (e.g. execute.graphql in this case.) Open to suggestions if we think there might be a better default name to use in this particular case!

This issue is reproducible on master at least still as of 2782cc6343280fb68b1184b05aa18951e1d820e6, with the latest versions of Rails, Ruby, and the ddtrace gem.

Let me know if there's anything else I can do on my end to help with this issue!

@rmosolgo
Copy link
Owner

rmosolgo commented Jan 2, 2018

Thanks for the great writeup on this issue! It's been ... interesting ... to learn the ins and outs of the different platforms :P

👍 from me, @esjee do you have any thoughts on this change?

@Willianvdv
Copy link
Contributor

Willianvdv commented Jan 3, 2018

As @esjee is enjoying a well deserved holiday, I'll answer this question. For us, the most important feature is to see the performance of one query over time. Something that was surprisingly hard to achieve and required a lot of trial-and-error-driven development. I think the main cause is that APM isn't created for the use case that one controller receives various types of data requests (graphql queries). Anyway, this is what we got now:

class DatadogTracing < GraphQL::Tracing::PlatformTracing
  # NOTE: do not modify these constants otherwise we'll lose history in Datadog
  SERVICE_NAME = 'core-graphql'
  PARENT_SPAN_NAME = 'execute.graphql'

  self.platform_keys = {
    'lex' => 'lex.graphql',
    'parse' => 'parse.graphql',
    'validate' => 'validate.graphql',
    'analyze_query' => 'analyze.graphql',
    'analyze_multiplex' => 'analyze_multiplex.graphql',
    'execute_multiplex' => PARENT_SPAN_NAME,
    'execute_query' => 'execute_query.graphql',
    'execute_query_lazy' => 'execute_query_lazy.graphql',
    'execute_field' => 'execute.field',
    'execute_field_lazy' => 'execute.field.lazy',
  }

  def platform_trace(platform_key, key, data)
    translated_key_name = @platform_keys.fetch(key)

    Datadog.tracer.trace(translated_key_name, service: SERVICE_NAME) do |span|
      query = extract_query(key, data)
      query_string = query.present? && query.query_string || data.fetch(:query_string)
      span.span_type = 'graphql'
      span.set_tag('platform_key', platform_key)
      span.set_tag('graphql.query.query_string', query_string)

      if query.nil? || query.document.nil? || query.selected_operation.nil?
      elsif updated_untagged_spans?(query)
        update_span(span, query)
      else
        parent_span = parent_span_for_span(span)
        update_untagged_spans(parent_span, query)
        query_context(query)[:updated_untagged_spans] = true
      end

      yield
    end
  end

  def update_span(span, query)
    operation_name = query.selected_operation.name || '<anonymous>'
    operation_type = query.selected_operation.operation_type

    current_user_id = query.context.fetch(:current_user)&.id
    cloudflare_id = query.context.fetch(:request).cookies[CLOUDFLARE_ID]
    resource = operation_name

    span.resource = resource
    span.set_tag('graphql.query.name', resource)
    span.set_tag('graphql.query.operation_type', operation_type)
    span.set_tag('cfduid', cloudflare_id)
    span.set_tag('current_user_id', current_user_id)
  end

  def parent_span_for_span(span)
    return span if span.name == PARENT_SPAN_NAME && span.service == SERVICE_NAME
    return nil if span.parent.nil?
    parent_span(span.parent)
  end

  def query_context(query)
    query.context.namespace(DatadogTracing)
  end

  def update_untagged_spans(span, query)
    return unless span.get_tag('graphql.query.name').nil?
    update_span(span, query)

    span.children.each do |child_span|
      update_untagged_spans(child_span, query)
    end
  end

  def updated_untagged_spans?(query)
    query_context(query)[:updated_untagged_spans]
  end

  def platform_field_key(type, field)
    "#{type.name}.#{field.name}"
  end

  private

  def instrument_scalar_field(type, field)
    instrument_nonscalar_field(type, field)
  end

  def extract_query(key, data)
    # TODO: how to handle multiple queries, with multiplex?
    if key.in?(['execute_multiplex', 'analyze_multiplex'])
      data.fetch(:multiplex).queries.first
    elsif key.in?(['execute_field', 'execute_field_lazy'])
      data.fetch(:context).query
    elsif key.in?(['lex', 'parse'])
      nil
    else
      data.fetch(:query)
    end
  end
end

In Datadog this gives us a nice overview of GraphQL queries:
core-grap __env_production____datadog

An important part of this setup is that the root span is being used as the total run time of the GraphQL query, and not some other random span.

user_mini __env_production____datadog
^ Note the highlighted span, that's the span that shows up in the overview.

Unfortunately, we don't have the scalar names on the spans. We just couldn't get reliable tracing information and have fancy names, so we opted out for the names for now. I think it's a limitation of DD and it would be nice if a span would accept a display name to render the human-readable name. But that's up to DD to implement.

user_mini __env_production____datadog
^ The names are available, you just have to click more :(

@delner I'm not sure if this contains any useful information for you. Feel free to ignore or copy the implementation in your PR.

@delner
Copy link
Contributor Author

delner commented Jan 4, 2018

Hey @Willianvdv, I took a quick look over at this sample, and it looks really promising! (Love the screenshots!) I think it could improve on a number of things; the resource names are a bit more specific, and the query string is present... both things this tracing module could benefit from. Is it possible to put these changes into another PR? I'd like to give it a proper review in its own right, since there's a lot going on there worth looking at.

I think I'd like to focus this PR just on the bug for now though, if possible, and the minimal set of changes to get things working again. Then we could come back and work these other improvements in?

@Willianvdv
Copy link
Contributor

@delner totally! You're right about the overload of data on the spans and that the logging is very specific to our app. I'll create a new PR add you as a reviewer, would be valuable to have multiple parties working on DataDog tracing!

@delner
Copy link
Contributor Author

delner commented Jan 4, 2018

@Willianvdv Awesome! Looking foward to it! 👍

@delner
Copy link
Contributor Author

delner commented Jan 8, 2018

Hey @rmosolgo @esjee @Willianvdv , I'm feeling pretty good about how these changes are looking. Have any 👀 to spare for this one?

Copy link
Contributor

@esjee esjee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code seems good to me, but I didn't verify the behavior.

@@ -15,28 +15,32 @@ class DataDogTracing < PlatformTracing
}

def platform_trace(platform_key, key, data)
service = options.fetch(:service, 'ruby-graphql')
tracer.trace(platform_key, service: service_name) do |span|
span.span_type = Datadog::Ext::HTTP::TYPE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the best fit? 🤔

Looking at dd-trace-rb, they seem to use it for:

  • HTTP requests
  • controllers
  • grape

Do you think grape is roughly on the same level as GraphQL? If so, then the Ext::HTTP::TYPE could be a good fit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good point. GraphQL isn't explicitly HTTP, I suppose. So maybe it isn't a best fit.

How would you categorize this set of GraphQL operations, if you were to choose your own word for it? My thoughts are that it kind of falls in a "generic Ruby operation" category, but I don't think there's a suitable span type constant defined in dd-trace-rb right now for this.

It should be okay to just give it a value of custom in lieu of a better alternative right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree with you -- can't really find a good fit for it.

if key == 'execute_multiplex'
span.resource = data[:multiplex].queries.map(&:selected_operation_name).join(', ')
operations = data[:multiplex].queries.map(&:selected_operation_name).join(', ')
span.resource = operations if operations.length > 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you want to do this .length operation after the join? I realise it doesn't really matter either way, but just be aware that you're counting characters here 😄

Also, consider using the (perhaps more idiomatic?) .empty? method instead of .length > 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like empty? better!

@delner
Copy link
Contributor Author

delner commented Jan 10, 2018

@esjee Made some changes to address the feedback!

@esjee
Copy link
Contributor

esjee commented Jan 18, 2018

Except for Travis, this PR looks good to me. @rmosolgo @Willianvdv any thoughts?

1 similar comment
@esjee
Copy link
Contributor

esjee commented Jan 18, 2018

Except for Travis, this PR looks good to me. @rmosolgo @Willianvdv any thoughts?

@delner
Copy link
Contributor Author

delner commented Jan 18, 2018

There was some kind of CI problem with Postgres (new version 1.0.0 that released?) I updated this PR against the latest master, which seemed to fix the build.

Copy link
Owner

@rmosolgo rmosolgo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix here, sorry I've been slow to respond, I just have one more question!

if key == 'execute_multiplex'
span.resource = data[:multiplex].queries.map(&:selected_operation_name).join(', ')
operations = data[:multiplex].queries.map(&:selected_operation_name).join(', ')
span.resource = operations if operations.length.empty?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is operations.length.empty?? Should it be .zero??

# $ irb
irb(main):001:0> 0.empty?
# NoMethodError: undefined method `empty?' for 0:Fixnum
# 	from (irb):1
# 	from /usr/bin/irb:12:in `<main>'
irb(main):002:0> 0.zero?
# => true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh man... I must have fat fingered that one. I meant operations.empty? because operations should be a String. I'll update this.

@delner
Copy link
Contributor Author

delner commented Jan 18, 2018

Good catch, sorry about that @rmosolgo! Should be fixed now.

@delner
Copy link
Contributor Author

delner commented Jan 24, 2018

Hey @rmosolgo , just wanted to follow up with you on this one. I think it's ready to merge!

@rmosolgo rmosolgo merged commit 44e6e15 into rmosolgo:master Jan 25, 2018
@rmosolgo
Copy link
Owner

Great, thanks for your work on it! 👏 !

@rmosolgo rmosolgo added this to the 1.7.9 milestone Jan 25, 2018
@rmosolgo rmosolgo added this to the 1.7.9 milestone Feb 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants