Skip to content

runtime metrics config gets lost when writer is rebuilt #967

Closed

Description

I've noticed a small runtime metrics issue. I am including a simple reproduction below and also a draft PR with a patch for this, just to demonstrate one way to resolve it.

The issue: In the Datadog.configure block, when configuring runtime_metrics, if a call to configure the tracer host or port (c.tracer port: ...) occurs after the call to c.runtime_metrics, any runtime metrics configuration will get lost and the trace writer will use the default statsd client and configuration. What it appears is happening is that all that runtime metrics configuration is getting attached to the writer itself but, for reasons i'm not entirely clear on but appear to be pretty complex and an active area of development right now, that writer gets dropped and a new one gets built/rebuilt when c.tracer is called with certain configuration options.

There's a few ways users can be impacted by this, because it's not documented anywhere that the ordering of c.runtime_metrics within the configure block matters. One way is that if a user passes in their own statsd client with their own namespace, that namespace will get lost when the writer is dropped from c.tracer kicking off a rebuild. Another is that if a user is passing in their own statsd client c.runtime_metrics statsd: <statsd_client_used_elsewhere_in_app> , then if that statsd client needs to point to a port different than 8125 or non-local host, then that config will get lost and runtime metrics as a feature won't work. Or, it'll technically work just fine, but just fire a bunch of udp packets off into space to a host/port that isn't listening for them. Needless to say since it's UDP it was quite difficult to debug this behavior.

In the PR I'm making i've included an, admittedly, pretty hacky patch. It just shoves the existing runtime metrics config into the writer_options when rebuilding the writer, if that writer's options don't contain any runtime metrics specific options already. I think it works fine though it could use a unit test, but I also realize that y'all are already working on areas that touch this, so didn't want to go too far down the rabbit hole here if this will just sort've solve itself from the internal refactoring/simplification of the trace writer. It's also possible there's a cleaner way to patch this, what I attached was just the easiest thing I could get working.

An alternative (or short term remediation step) to this would be to simply document the constraints of runtime_metrics configuration at present, which is c.runtime_metrics needs to be called last in the Datadog.configure block, but ideally the ordering in the configure block shouldn't really matter and it would appear this isn't intentional.

I've included a little sample reproduction minitest suite here: https://gist.github.com/ericmustin/1d1c228053078923e7f9b19c0add8ce7

Draft PR here: #968

Sorry for the opposite of succinct writing, thanks for taking the time, cheers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

Labels

bugInvolves a bugcoreInvolves Datadog core libraries

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions