-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missing resource names for DataDog tracing #1208
Missing resource names for DataDog tracing #1208
Conversation
Thanks for the great writeup on this issue! It's been ... interesting ... to learn the ins and outs of the different platforms :P 👍 from me, @esjee do you have any thoughts on this change? |
As @esjee is enjoying a well deserved holiday, I'll answer this question. For us, the most important feature is to see the performance of one query over time. Something that was surprisingly hard to achieve and required a lot of trial-and-error-driven development. I think the main cause is that APM isn't created for the use case that one controller receives various types of data requests (graphql queries). Anyway, this is what we got now: class DatadogTracing < GraphQL::Tracing::PlatformTracing
# NOTE: do not modify these constants otherwise we'll lose history in Datadog
SERVICE_NAME = 'core-graphql'
PARENT_SPAN_NAME = 'execute.graphql'
self.platform_keys = {
'lex' => 'lex.graphql',
'parse' => 'parse.graphql',
'validate' => 'validate.graphql',
'analyze_query' => 'analyze.graphql',
'analyze_multiplex' => 'analyze_multiplex.graphql',
'execute_multiplex' => PARENT_SPAN_NAME,
'execute_query' => 'execute_query.graphql',
'execute_query_lazy' => 'execute_query_lazy.graphql',
'execute_field' => 'execute.field',
'execute_field_lazy' => 'execute.field.lazy',
}
def platform_trace(platform_key, key, data)
translated_key_name = @platform_keys.fetch(key)
Datadog.tracer.trace(translated_key_name, service: SERVICE_NAME) do |span|
query = extract_query(key, data)
query_string = query.present? && query.query_string || data.fetch(:query_string)
span.span_type = 'graphql'
span.set_tag('platform_key', platform_key)
span.set_tag('graphql.query.query_string', query_string)
if query.nil? || query.document.nil? || query.selected_operation.nil?
elsif updated_untagged_spans?(query)
update_span(span, query)
else
parent_span = parent_span_for_span(span)
update_untagged_spans(parent_span, query)
query_context(query)[:updated_untagged_spans] = true
end
yield
end
end
def update_span(span, query)
operation_name = query.selected_operation.name || '<anonymous>'
operation_type = query.selected_operation.operation_type
current_user_id = query.context.fetch(:current_user)&.id
cloudflare_id = query.context.fetch(:request).cookies[CLOUDFLARE_ID]
resource = operation_name
span.resource = resource
span.set_tag('graphql.query.name', resource)
span.set_tag('graphql.query.operation_type', operation_type)
span.set_tag('cfduid', cloudflare_id)
span.set_tag('current_user_id', current_user_id)
end
def parent_span_for_span(span)
return span if span.name == PARENT_SPAN_NAME && span.service == SERVICE_NAME
return nil if span.parent.nil?
parent_span(span.parent)
end
def query_context(query)
query.context.namespace(DatadogTracing)
end
def update_untagged_spans(span, query)
return unless span.get_tag('graphql.query.name').nil?
update_span(span, query)
span.children.each do |child_span|
update_untagged_spans(child_span, query)
end
end
def updated_untagged_spans?(query)
query_context(query)[:updated_untagged_spans]
end
def platform_field_key(type, field)
"#{type.name}.#{field.name}"
end
private
def instrument_scalar_field(type, field)
instrument_nonscalar_field(type, field)
end
def extract_query(key, data)
# TODO: how to handle multiple queries, with multiplex?
if key.in?(['execute_multiplex', 'analyze_multiplex'])
data.fetch(:multiplex).queries.first
elsif key.in?(['execute_field', 'execute_field_lazy'])
data.fetch(:context).query
elsif key.in?(['lex', 'parse'])
nil
else
data.fetch(:query)
end
end
end In Datadog this gives us a nice overview of GraphQL queries: An important part of this setup is that the root span is being used as the total run time of the GraphQL query, and not some other random span.
Unfortunately, we don't have the scalar names on the spans. We just couldn't get reliable tracing information and have fancy names, so we opted out for the names for now. I think it's a limitation of DD and it would be nice if a span would accept a display name to render the human-readable name. But that's up to DD to implement.
@delner I'm not sure if this contains any useful information for you. Feel free to ignore or copy the implementation in your PR. |
Hey @Willianvdv, I took a quick look over at this sample, and it looks really promising! (Love the screenshots!) I think it could improve on a number of things; the resource names are a bit more specific, and the query string is present... both things this tracing module could benefit from. Is it possible to put these changes into another PR? I'd like to give it a proper review in its own right, since there's a lot going on there worth looking at. I think I'd like to focus this PR just on the bug for now though, if possible, and the minimal set of changes to get things working again. Then we could come back and work these other improvements in? |
@delner totally! You're right about the overload of data on the spans and that the logging is very specific to our app. I'll create a new PR add you as a reviewer, would be valuable to have multiple parties working on DataDog tracing! |
@Willianvdv Awesome! Looking foward to it! 👍 |
Hey @rmosolgo @esjee @Willianvdv , I'm feeling pretty good about how these changes are looking. Have any 👀 to spare for this one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code seems good to me, but I didn't verify the behavior.
@@ -15,28 +15,32 @@ class DataDogTracing < PlatformTracing | |||
} | |||
|
|||
def platform_trace(platform_key, key, data) | |||
service = options.fetch(:service, 'ruby-graphql') | |||
tracer.trace(platform_key, service: service_name) do |span| | |||
span.span_type = Datadog::Ext::HTTP::TYPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the best fit? 🤔
Looking at dd-trace-rb, they seem to use it for:
- HTTP requests
- controllers
- grape
Do you think grape is roughly on the same level as GraphQL? If so, then the Ext::HTTP::TYPE could be a good fit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a good point. GraphQL isn't explicitly HTTP, I suppose. So maybe it isn't a best fit.
How would you categorize this set of GraphQL operations, if you were to choose your own word for it? My thoughts are that it kind of falls in a "generic Ruby operation" category, but I don't think there's a suitable span type constant defined in dd-trace-rb
right now for this.
It should be okay to just give it a value of custom
in lieu of a better alternative right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree with you -- can't really find a good fit for it.
if key == 'execute_multiplex' | ||
span.resource = data[:multiplex].queries.map(&:selected_operation_name).join(', ') | ||
operations = data[:multiplex].queries.map(&:selected_operation_name).join(', ') | ||
span.resource = operations if operations.length > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you want to do this .length
operation after the join
? I realise it doesn't really matter either way, but just be aware that you're counting characters here 😄
Also, consider using the (perhaps more idiomatic?) .empty?
method instead of .length > 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like empty?
better!
@esjee Made some changes to address the feedback! |
Except for Travis, this PR looks good to me. @rmosolgo @Willianvdv any thoughts? |
1 similar comment
Except for Travis, this PR looks good to me. @rmosolgo @Willianvdv any thoughts? |
There was some kind of CI problem with Postgres (new version 1.0.0 that released?) I updated this PR against the latest master, which seemed to fix the build. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix here, sorry I've been slow to respond, I just have one more question!
if key == 'execute_multiplex' | ||
span.resource = data[:multiplex].queries.map(&:selected_operation_name).join(', ') | ||
operations = data[:multiplex].queries.map(&:selected_operation_name).join(', ') | ||
span.resource = operations if operations.length.empty? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is operations.length.empty?
? Should it be .zero?
?
# $ irb
irb(main):001:0> 0.empty?
# NoMethodError: undefined method `empty?' for 0:Fixnum
# from (irb):1
# from /usr/bin/irb:12:in `<main>'
irb(main):002:0> 0.zero?
# => true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh man... I must have fat fingered that one. I meant operations.empty?
because operations
should be a String. I'll update this.
Good catch, sorry about that @rmosolgo! Should be fixed now. |
Hey @rmosolgo , just wanted to follow up with you on this one. I think it's ready to merge! |
Great, thanks for your work on it! 👏 ! |
First off, wanted to say thanks for adding DataDog tracing support! Really cool to see support for this in GraphQL.
I was trying to setup and test out
graphql-ruby
with DataDog tracing enabled, but ran into some issues with the traces, which were being generated incorrectly, causing them not to display in DataDog.When the
GraphQL::Tracing::DataDogTracing
receives anexecute_multiplex
operation to trace, and none of its queries haveselected_operation_name
s, it sets the resource name to an empty string. Although this doesn't raise an error in the Ruby application, when this trace gets sent through the DataDog agent, the agent prunes out 'bad looking' traces, including ones without resource names. Thus, the trace never makes it to DataDog.The issue should be reproducible with default boilerplate code, and the following:
dd_rails_test_schema.rb
types/query_type.rb
types/foo_type.rb
Then assuming a
Foo
record with ID 1 exists, running a query with:The workaround for this issue, without the changes included in this PR, involve adding the following to the
datadog.rb
initializer, which leverages the pipeline feature in the DataDog tracing library to modify the incorrectly generated GraphQL traces:This pull request sets a span type (for better presentation in DataDog), and sets a custom resource name for
execute_multiplex
operations only if the sub queries haveselected_operation_name
s. Otherwise it uses the span default, which is the span name (e.g.execute.graphql
in this case.) Open to suggestions if we think there might be a better default name to use in this particular case!This issue is reproducible on
master
at least still as of2782cc6343280fb68b1184b05aa18951e1d820e6
, with the latest versions of Rails, Ruby, and theddtrace
gem.Let me know if there's anything else I can do on my end to help with this issue!