-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to specify service names by domain #882
Conversation
ccf8fa5
to
deb8a69
Compare
Thanks for the PR @stormsilver! I've been wanting to do something like this for a while, so I'm really excited to see this. Couple of quick thoughts:
I think is definitely something we'd like to merge though those things considered. |
So I took a crack at implementing this via resolver pattern instead, and came up with this. You would use it like: Datadog.configure do |c|
c.use :faraday, describe: /example\.com/ do |faraday|
faraday.service_name = 'bar'
faraday.split_by_domain = false # Only necessary if split_by_domain is true by default
end
end Tests pass so it seems to work. If we would want to incorporate this into the PR, some other things we'd have to do is:
I didn't have permissions to push to your forked repo, so I'm sharing the diff here: Feedback welcome! |
I applied you patch locally and spent some time spelunking the code to see how the resolvers work. This makes sense to me.
Implementing this on all 4 of the HTTP libraries will require some solid refactoring of ethon and http. It makes the most sense to me to begin extracting stuff out into an HTTP tracer helper module.
I will work on this in my spare time. It’ll be a bit before it’s done.
…-- Eric
On Dec 11, 2019, at 19:29, David Elner ***@***.***> wrote:
So I took a crack at implementing this via resolver pattern instead, and came up with this. You would use it like:
Datadog.configure do |c|
c.use :faraday, describe: /example\.com/ do |faraday|
faraday.service_name = 'bar'
faraday.split_by_domain = false # Only necessary if split_by_domain is true by default
end
end
Tests pass so it seems to work. If we would want to incorporate this into the PR, some other things we'd have to do is:
Consider changing how the RegexpResolver has keys to it. Right now if you call it ever with an Regexp, it will cache that Regexp (e.g. Datadog.configuration[:faraday, /some exp/]). There might be a better way to be explicit about adding vs accessing that's safer.
Add unit tests for it.
Implement it for the other HTTP libraries.
Update the documentation.
I didn't have permissions to push to your forked repo, so I'm sharing the diff here:
regexp_resolver.txt
Feedback welcome!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
No worries, let me know if there's anything I can be of help with, particularly as it pertains to the resolver/configuration pattern. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, this has been updated with your comments. Please let me know if I forgot something. In particular I dunno if the specs are up to snuff.
lib/ddtrace/contrib/configurable.rb
Outdated
def configure(key, options = {}, &block) | ||
key = resolver.resolve(key || :default) | ||
def configure(key = :default, options = {}, &block) | ||
resolver.add_key(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how I ended up solving the awkward dance that RegexpResolver
was doing to remember patterns.
@@ -23,14 +27,13 @@ def http_request(url, action_name, options = {}) | |||
end | |||
|
|||
def headers=(headers) | |||
return super unless tracer_enabled? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to remove the tracer_enabled?
call here and in reset
because, since we added the resolver pattern, you now have to have a hostname in order to determine the configuration (to decide if the tracer is enabled. theoretically you can now disable the tracer for requests matching /do-not-want.io/
)... but you can set headers before supplying a URL to net/http.
This will have the side effect of creating a few DD instance variables on the net/http object even if the tracer is disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand entirely; would it make sense to derive enabled?
from the default config if the hostname isn't available? It might mean that not all settings are strictly respected per host, but maybe that's okay? Thoughts are most welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought was that it's better to respect all settings per host and leave a few extra ivars on the request object rather than have a surprising situation where you disabled the tracer for certain hosts but they are still getting traced (which would appear to the casual observer to be a bug).
I can certainly derive enabled?
from the default config if there's no hostname if you feel like the tradeoff isn't worth it.
service = Datadog.configuration[:http][:service_name] | ||
tracer = Datadog.configuration[:http][:tracer] | ||
service = config[:service_name] | ||
tracer = config[:tracer] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a subtle bug here in that someone could call datadog_pin
on this object with no args before running a request. It would then use the :default
configuration to supply args to the pin. If the request is then run for a hostname which matches a regex configuration, we'll use the wrong configuration since it's been memoized.
I didn't have enough context to know if this is actually a problem, or what people use custom service names on pins for, or if the pin should just be removed since they look deprecated, or ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think this could be a problem, given that someone might build an HTTP object, set something on its datadog_pin
then call it with a request resulting in this scenario.
I think that's one of the tricky things about this "by hostname" configuration: we have 1) default, 2) "by host" and 3) by object configuration which all has to be merged together to apply the settings that make the most sense for usability. Inevitably I think this resolution has to happen at request time (less than ideal, but unavoidable.)
8723354
to
e389f23
Compare
Well, the CI tests failed which led me to realize that I had put the ethon tests in the wrong spot and they weren't even running except on CI (I thought I had run them) which led me to realize that I hadn't even implemented the service_name change for ethon, which is really embarrassing. So these failures are quite legit. |
Haha, that's quite already @stormsilver. I'll try to take a look at your changes see if I can give any helpful feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far looking pretty good. Just left a few comments to consider, but I'm happy with the direction this is heading in.
@@ -3,6 +3,10 @@ module Contrib | |||
module Configuration | |||
# Resolves a value to a configuration key | |||
class Resolver | |||
def add_key(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a second method here was inevitable, so I'm on board with this.
# Matches strings against Regexps. | ||
class RegexpResolver < Datadog::Contrib::Configuration::Resolver | ||
def add_key(pattern) | ||
patterns << pattern.is_a?(Regexp) ? pattern : /#{pattern}/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does pattern
in /#{pattern}/
need to be made regex safe? I thought there was some Regexp parsing you could apply, but maybe that's only relevant to literals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes. Regexp.quote
is what you're talking about, I think: https://ruby-doc.org/core-2.7.0/Regexp.html#method-c-quote
@@ -23,14 +27,13 @@ def http_request(url, action_name, options = {}) | |||
end | |||
|
|||
def headers=(headers) | |||
return super unless tracer_enabled? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand entirely; would it make sense to derive enabled?
from the default config if the hostname isn't available? It might mean that not all settings are strictly respected per host, but maybe that's okay? Thoughts are most welcome.
@@ -26,6 +27,10 @@ def default_configuration | |||
def patcher | |||
Patcher | |||
end | |||
|
|||
def resolver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not important right now, but maybe we could extract this behavior to some kind of HTTP
module that can be recomposed into each integration that uses it. I don't think we have to do that right now though, as we probably should wait to get more behavior worth extracting first.
service = Datadog.configuration[:http][:service_name] | ||
tracer = Datadog.configuration[:http][:tracer] | ||
service = config[:service_name] | ||
tracer = config[:tracer] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think this could be a problem, given that someone might build an HTTP object, set something on its datadog_pin
then call it with a request resulting in this scenario.
I think that's one of the tricky things about this "by hostname" configuration: we have 1) default, 2) "by host" and 3) by object configuration which all has to be merged together to apply the settings that make the most sense for usability. Inevitably I think this resolution has to happen at request time (less than ideal, but unavoidable.)
pin = datadog_pin | ||
host, = host_and_port(req) | ||
request_options = datadog_configuration(host) | ||
pin = datadog_pin(request_options) | ||
return super(req, body, &block) unless pin && pin.tracer | ||
|
||
if Datadog::Contrib::HTTP.should_skip_tracing?(req, @address, @port, pin.tracer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something that's really cool about this resolver is we might be able to use it to remove this circuit breaker (which exists to prevent our tracer from tracing itself when it writes traces over HTTP to the agent.) Alternatively, we could probably use enabled
as well to effect the same result.
Probably should be addressed in another PR though.
514d14c
to
997d651
Compare
This allows all integrations that trace HTTP requests to: - use the domain name of the request as the service name - specify a service name for requests matching a regex
997d651
to
4cc04a0
Compare
Okay @delner I think this is ready for another review. |
Awesome, I'll give it another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shaping up really nicely now @stormsilver! I think we're really close now. Just a few small suggestions for some tweaks.
Documentations changes are good; are there any other tests we should add to cover the new behavior?
# Matches strings against Regexps. | ||
class RegexpResolver < Datadog::Contrib::Configuration::Resolver | ||
def add_key(pattern) | ||
patterns << pattern.is_a?(Regexp) ? pattern : /#{Regexp.quote(pattern)}/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although its kind of nice to convert everything to a Regex for consistency I think there's a couple of problems with interpolating strings to Regexp
:
- String values might match inadvertently, e.g. passing
'a'
would match'a'
and'aa'
. - There's an added expense to evaluating
Regexp
when doing==
with a string might suffice.
I would recommend accepting any object, then in #resolve
do a case statement such that if the pattern is a Regexp
, do =~
, otherwise coerce the pattern to a string and do ==
.
return :default if name == :default | ||
|
||
name = name.to_s | ||
matching_pattern = patterns.find { |p| p =~ name } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this isn't a big deal, but should we change name
to name.to_s
? Coercion might prevent some type incompatibility issues such as with URI
, etc.
module Datadog | ||
module Contrib | ||
# Contains methods helpful for tracing/annotating HTTP request libraries | ||
module HttpAnnotationHelper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we extract this?
Although it does reduce some repetition, it might be a bit premature given its this single line function. I might favor the repetition over having the additional constant in this case; the use of options specific to the integration (such as service_name
and split_by_domain
I think reinforces this right now, too. Eventually if the behavior grows in scope, an extraction like this might be appropriate.
To support a similar "multiplexing" scenario where we would benefit from the resolver changes, I've extracted those changes to #940 and merged them to |
Hey @stormsilver , looks like this PR has grown stale, but we still wanted to see these changes incorporated. Given your repo doesn't grant push permissions to upstream repo members, I opened #953 which replaces this PR to rebase your commits and apply a few changes of my own. We'll continue working on this feature in that PR; please feel free to review/contribute to that as you wish. Thanks again for your contributions so far! |
What?
When using the
faraday
andexcon
contribs with thesplit_by_domain
feature turned on, allow users to specify an additional hash that can override the service name by domain.Why?
We use an external service that makes a per-customer domain name. When we create HTTP requests to this service for our customers, it looks like
user-123.foo.com
anduser-abc.foo.com
(and so on). When we split by domain (which we want to do for all the other HTTP services we use), it ends up creating hundreds of separate "services" on the service map - all of which are really the same external service. We'd like to have these all show up as one service.How?
By providing a hash in the form of
{/regex/ => 'desired_service_name'}
, we can match the desired domain names (/user-[^.]+\.foo\.com/
) and provide the service name we want to see instead (user.foo.com
)