
Make racecar traces use the default service name #2776

Open · dasch wants to merge 3 commits into base: master
Conversation

@dasch (Contributor) commented Apr 12, 2023

Right now, if no service_name is specified for a racecar trace, the default will not come from the DD_SERVICE env var, but will rather be the static string "racecar". This seems like a poor default, assuming an organization runs more than one consumer.

What does this PR do?
It removes the override for the service_name option, allowing the default behavior to take place.

Motivation
We want to configure our Datadog service names with Kubernetes annotations, but some integrations do not respect those, and we have to force the use of the env var on a case by case basis.
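
For illustration, the case-by-case override we currently maintain looks roughly like this (a sketch against the ddtrace 1.x configuration API; it is not part of this PR's diff):

```ruby
# Today's workaround: explicitly forward DD_SERVICE to the racecar integration,
# because the integration's static default ("racecar") otherwise takes precedence.
require "ddtrace"

Datadog.configure do |c|
  c.tracing.instrument :racecar, service_name: ENV.fetch("DD_SERVICE", "racecar")
end
```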

How to test the change?
Test that it uses the env var DD_SERVICE rather than just "racecar".
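
A rough verification sketch, assuming the ddtrace 1.x configuration API (the service name below is illustrative, and the exact settings accessors may differ):

```ruby
require "ddtrace"

ENV["DD_SERVICE"] = "orders-consumer"

Datadog.configure do |c|
  c.tracing.instrument :racecar # note: no service_name override
end

# With the static default removed, racecar spans should pick up the global
# default service (sourced from DD_SERVICE) rather than the literal "racecar".
puts Datadog.configuration.service                           # => "orders-consumer"
puts Datadog.configuration.tracing[:racecar][:service_name]  # => nil, so spans fall back to the global service
```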

@dasch dasch requested a review from a team April 12, 2023 11:32
@github-actions bot added the integrations and tracing labels Apr 12, 2023
@delner (Contributor) commented Apr 13, 2023

Hey @dasch ! Happy to see you again in the repo. :)

I think this is a great change; it's moving in the general direction of where we want to go with service naming: where instrumentation reflects DD_SERVICE when measuring operations internal to the Ruby application.

The only consideration is that it could be considered a breaking change. If users depend upon the default (for some reason), then changing it may affect their services/service map/monitors unexpectedly. We generally reserve breaking changes for major version releases.

Is there some kind of intermediate step we can take here?

@dasch (Contributor, Author) commented Apr 13, 2023

Good to be back! 😄

I'm not sure if there's an intermediate, non-breaking change – you could push this to the next breaking change if you'd like. Perhaps do a search for all cases where service_name gets defaulted on an integration level and do all of them in one go?

@delner (Contributor) commented Apr 18, 2023

> Perhaps do a search for all cases where service_name gets defaulted on an integration level and do all of them in one go?

@dasch that's exactly where we're leaning. We'd like all spans in an application to have service=DD_SERVICE, and add a peer.service tag to any spans that do "external" operations (e.g. DB queries, HTTP calls).
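
To make that concrete, here's a minimal sketch of the span shape we're aiming for, using the public tracing API (the operation names and the "users-db" value are purely illustrative):

```ruby
require "ddtrace"

# Work internal to the application keeps service = DD_SERVICE ...
Datadog::Tracing.trace("orders.process") do |span|
  # ... while spans that call out to another system additionally carry a
  # peer.service tag identifying the remote dependency.
  Datadog::Tracing.trace("mysql2.query") do |db_span|
    db_span.set_tag("peer.service", "users-db")
    # run the query here
  end
end
```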

The next major version would be 2.0. We don't have plans to release this yet, but it's possible it'd be late 2023.

This is something we'll have to lay out a migration path for. If you have any ideas for helpful intermediate steps we can add now (deprecation warnings, opt-in flags, etc) that can help us introduce this behavior, we'd be happy to consider those. Ideally, a user could utilize such features to fully migrate their code/services in 1.x before making the upgrade to 2.0.
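
Purely as a sketch of what an opt-in could feel like from an application's side (the env var name below is invented for illustration, not an existing setting):

```ruby
# Hypothetical 1.x opt-in: when enabled, integrations would stop hard-coding
# their service_name defaults and fall back to DD_SERVICE instead.
ENV["DD_TRACE_REMOVE_INTEGRATION_SERVICE_NAMES_ENABLED"] = "true"

Datadog.configure do |c|
  c.tracing.instrument :racecar # with the flag on, service would come from DD_SERVICE
end
```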

@delner (Contributor) commented Apr 18, 2023

Also, slightly off topic, but just to pick your brain @dasch...

Users often (mis)use the service_name on these integrations to make distinct layers (middleware, controllers, presentation) visible within their application. This way they get a coloration on the trace graph, and a time spent % table on their traces.
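
For concreteness, the pattern I mean looks roughly like this (a sketch; the layer names are invented):

```ruby
# One application split into pseudo-"services" purely to make its layers visible:
Datadog.configure do |c|
  c.tracing.instrument :rack,          service_name: "shop-middleware"
  c.tracing.instrument :rails,         service_name: "shop-controllers"
  c.tracing.instrument :active_record, service_name: "shop-db-layer"
end
```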

Do you employ this practice at all?

The downside here is that it mixes up applications with layers of applications, which can make a mess of the service map. We'd like users to have both a nice service map and have introspection into the layers of their application. The latter is the tricky part. If you could envision what it would be like to label and visualize these application layers in Datadog, what would that process look like?

We're looking for some inspiration from user cases here, if you have any to offer :)

@dasch (Contributor, Author) commented Apr 18, 2023

Sounds great! I'm happy to wait until late 2023.

> Also, slightly off topic, but just to pick your brain @dasch...
>
> Users often (mis)use the service_name on these integrations to make distinct layers (middleware, controllers, presentation) visible within their application. This way they get a coloration on the trace graph, and a time spent % table on their traces.
>
> Do you employ this practice at all?

Guilty as charged 😆 part of this is path dependence: we adopted APM when it was brand new, and a lot of the later features, like the dependency map, didn't exist back then. Crucially, I don't think there was any UI for selecting which operation to use for the resource breakdown, and we wanted exactly what you mention: distinct views for different layers/aspects of our application, such as errors rates and latency by controller/view/middleware/etc.

> The downside here is that it mixes up applications with layers of applications, which can make a mess of the service map. We'd like users to have both a nice service map and have introspection into the layers of their application. The latter is the tricky part. If you could envision what it would be like to label and visualize these application layers in Datadog, what would that process look like?
>
> We're looking for some inspiration from user cases here, if you have any to offer :)

I think improved UX around switching between operations in the APM views would help here, e.g. a way to "zoom out" of a specific operation and view an overview of operations, then zoom back in to a specific operation. Still keep the "primary operation" concept, but make multi-operation services easier to navigate.

It would also help to have a multi-tiered visualization in trace flamegraphs – one primary color per service, but some variation per operation name.

@delner (Contributor) commented Apr 18, 2023

> Guilty as charged 😆

@dasch haha, you're not the only one! This is fairly common, so I figured there were decent odds. It's okay for now, especially as we don't have a great alternative to offer. Which is why I'm asking... I want to get a sense from some of our users of what an alternative could look like.

Operation name is an interesting thought. It could work; I do want to point out operation names can often be distinct in a trace. If the colorations/groupings reflected such uniqueness, would that hurt trace graph visualization?

My gut says that with too few groups there isn't enough introspection, and with too many there's too much noise and no proper "roll up" in stats... some kind of balance needs to be struck. One of my thoughts was to do this grouping around component, a tag we use that roughly means "name of the library in use". Would this be useful? Or is there a different "key" that would be a more effective GROUP BY?

@dasch (Contributor, Author) commented Apr 18, 2023

Yeah, I think component would make sense. Is that something that's being leveraged today in the UI?

@delner (Contributor) commented Apr 19, 2023

It's available on most of our integrations, but not actively used to create coloration/groupings. Seems like a good candidate. Are there any other such tags that you think would also be particularly useful to group on?

@dasch (Contributor, Author) commented Apr 19, 2023

Not that I can think of. What's component used for currently, btw?

@dasch (Contributor, Author) commented Apr 19, 2023

As in, how is it used in the UI?

@delner (Contributor) commented Apr 20, 2023

It's currently not really "in use" beyond being populated with the library name of the integration that produced it. The value is a constant (e.g. active_record). It's a tag we introduced in our out-of-the-box Ruby integrations; I wouldn't expect to find it on all instrumentation from all languages.

I suppose the question is more: "Is this conceptual descriptor useful to group on? Are there other conceptually useful descriptors that may not exist that are also useful to group on?"

If that seems like the most obvious one, and no others, that's okay too.

@nightpool (Contributor) commented

> The downside here is that it mixes up applications with layers of applications, which can make a mess of the service map. We'd like users to have both a nice service map and have introspection into the layers of their application. The latter is the tricky part. If you could envision what it would be like to label and visualize these application layers in Datadog, what would that process look like?

This is the biggest thing preventing me from enabling this today for our current integrations. The UI just doesn't support surfacing "sub-service" values in a very good way, if at all. Ideally, everything you can do with a "service" in the UI today (group by, sort by, get a table for) you should be able to do with a "component" or similar.

@dasch (Contributor, Author) commented Apr 26, 2023

@delner OK, I've thought of another use case for "components" – we have a handful of extremely large monoliths, and those have split ownership, on-call rotations, etc. We use GitHub CODEOWNERS, for instance. It would make a lot of sense for the Service Catalog to have separate, nested entries for components within a "service". Basically, allow multiple entries per dd-service, but require (dd-service, component) to be unique. Then we could map ownership within services in a flexible way. If component is already tied to e.g. specific framework/library boundaries, then some other tag could be used. We're moving towards centering ownership and policy around "features", for example.

@delner (Contributor) commented May 16, 2023

@nightpool Agreed on providing this additional visibility; it needs more granularity without relying on service. Thanks for raising this point!

@dasch

> OK, I've thought of another use case for "components" – we have a handful of extremely large monoliths, and those have split ownership, on-call rotations, etc. We use GitHub CODEOWNERS, for instance. It would make a lot of sense for the Service Catalog to have separate, nested entries for components within a "service".

Your timing on this example is great; I've been working with the rest of our team to figure out how to best address monoliths. The context you provided here is very helpful for that conversation.

Currently, one of the lines of thought was focused around Rails engines, and modeling these as their own "service" even if mounted within another Rails application. When you describe "features" however, this sounds a bit "softer"; less of a sub-application, more of a module. Maybe these shouldn't be "services" themselves but as you said, still require some visibility/grouping to distinguish them from the rest of the service to which they belong.

As far as tag naming goes, my gut says component should be renamed to package to cement it very specifically as "the name of the package from which it was sourced." Then we should reintroduce component as a module within a service. The convention for this field is a big question mark... should it contain something prescriptive like a namespace? Or should it be entirely user-defined like a service? (I lean towards the latter.)
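
As a sketch of the user-defined flavor, I'm imagining something as lightweight as tagging spans (the tag placement and the "billing" value below are illustrative, not an existing convention):

```ruby
# A monolith labelling one of its internal modules so the UI could group by it
# within the same service (the service itself still comes from DD_SERVICE).
Datadog::Tracing.trace("billing.invoice.generate") do |span|
  span.set_tag("component", "billing")
  # ... do the work ...
end
```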

Your thoughts are welcome.

@dasch (Contributor, Author) commented Jun 6, 2023

I would prefer flexibility for "component". It's less important for there to be a standard about what it means than it is for it to support the use cases that are valuable – specifically, being able to associate parts of applications with specific owners.
