Skip to content

feat: service.target.* to improve backend granularity #2882

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 41 commits into from
Sep 14, 2022

Conversation

trentm
Copy link
Member

@trentm trentm commented Aug 17, 2022

This adds the new span.context.service.target.{type,name} fields for improved granularity on backend services data in exit spans. This data is used for "Service Maps" and "Dependencies" in the Kibana APM app.

All instrumentations have been updated to set appropriate service target values. service.target.* is typically inferred automatically from other span data, so much of the instrumentation work was in adding other span fields, most commonly span.context.db.instance. The one current exception to the general inference algorithm is S3 spans, for which the spec'd service.target.name doesn't follow the general pattern.

A new, public span.setServiceTarget(type, name) API has been added (this is a "SHOULD" in the spec).

The new fields will replace the (now deprecated) span.context.destination.service.* fields. In the current stage of transition:

  • destination.service.{type,name} are set to the empty string. They are no longer used, but the intake API v2 schema up to 7.13 require them to be set.
  • Setting destination.service.resource directly is discouraged. Typically it is inferred from service.target.* values when a span is ended. Again the one exception is S3 spans.

The not-public-but-available-because-JavaScript span.setDestinationContext has been deprecated (using it will process.emitWarning()) and replaced with an internal span._setDestinationContext().

As part of this change, improvements have been made to some module instrumentations:

  • redis and ioredis: span.type has changed from "cache" to "db" per spec (https://github.com/elastic/apm/blob/main/specs/agents/tracing-instrumentation-db.md#redis)
  • mongodb: span.action used to be "query", now it will be the mongodb command name, e.g. "find", "insert".
  • mongodb and mongodb-core: span.db.instance is now set to the database name (mongo needs span.db.instance support #1494)
  • mysql and mysql2: span.db.{instance,user} are now populated.
  • @elastic/elasticsearch: The cluster name is heuristically determined for Elastic Cloud deployments and used for db.instance.
  • sqs: span.destination.{address,port} are now populated.
  • pg: span.db.{instance,user} are now populated.
  • cassandra-driver: the Cassandra keyspace is captured for service target data, if available.
  • OpenTelemetry Bridge: OTel spans with kind PRODUCER and CLIENT are now handled as exit spans (e.g. span compression could apply).

Closes: #2621
Closes: #2822
Closes: #1494
Closes: #1897
Closes: #2103
Obsoletes: #2458

Notes for reviewers

This changes a lot of files. The following might help break that down to reasonable chunks:

  • The main part of the change is the general algorithm to set span.context.service.target.* in Span.end(), and then the algorithm to infer span.context.destination.service.resource from the service.target value, also in Span.end(). This is all in "lib/instrumentation/span.js".
  • Those two algorithms are being clarified in this spec PR: Backend granularity (service.target.*) clarifications apm#674 Mostly that PR is putting pseudo-code in the specs to match what is implemented in the Java agent.
  • Changes to specific instrumentations are listed above, but most of the changes are:
    • Because destination.service.resource is inferred automatically, instrumentations for all exit spans changed to no longer specify destination.service in the call to span.setDestinationContext().
    • Also, the algo to set service.target.* uses context.db.instance if available, so many of the instrumentations were updated to determine db.instance according to spec.
  • "lib/instrumentation/span-compression.js" and "lib/instrumentation/dropped-span-stats.js" were updated per spec to use service.target.* fields.
  • The tests. Most test file updates are just updating service.target.* and destination.service.* expectations. "Interesting" test files are:
    • "elasticsearch.test.js" testing the handling of the 'x-found-handling-cluster' header for our partial implementation of https://github.com/elastic/apm/blob/main/specs/agents/tracing-instrumentation-db.md#cluster-name
    • "test/instrumentation/modules/cassandra-driver/_utils.js" improves the Cassandra test utils to separate "keyspace" and "table" for more cleaning testing those. This is more important now that keyspace is used for db.instance if available.
    • "test/opentelemetry-bridge/otel-bridge-feature.test.js" is a painstaking manual update to the latest "otel_bridge.feature" gherkin spec from apm.git that we cannot use directly
    • "test/service-resource-inference.test.js" is new and adds testing of the shared "json-specs/service_resource_inference.json" from apm.git

This adds support for the new "span.context.service.target.*" fields,
and updates handling of the now deprecated
"span.context.destination.service.*" fields to be calculated
automatically.

Closes: #2621

----

Changes in this first commit:

- Add the initial algorithm for setting `span.context.service.target.*` for exit spans, and for inferring `span.context.destination.service.*` as well. Still discussing some details/clarifications at elastic/apm#674
- Add a public `Span#setServiceTarget(type, name)`. (Still need to add docs and index.d.ts entry).
- Deprecate `Span#setDestinationContext()` and add a new `_setDestinationContext()` for internal usage. The new one should no longer receive `.service.*` because that's handled automatically in Span#end(). Still keeping the old one, even though it was always internal because a user *could* have been using it, and I *could* have aided in that usage (https://gist.github.com/trentm/8c8fcecbec1a99ce0fbb415ef87ae2db) in answering a support question.
- Convert redis over to using the new `_setDestinationContext(), as the first guinea pig.
- *Also* update redis instrumentation values to match spec (https://github.com/elastic/apm/blob/main/specs/agents/tracing-instrumentation-db.md#redis):
    - span.type: "db", was "cache"
    - span.action: "query", was null
    - add span.context.db
@trentm trentm self-assigned this Aug 17, 2022
@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Aug 17, 2022
@apmmachine
Copy link
Contributor

apmmachine commented Aug 18, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-09-14T18:47:42.993+0000

  • Duration: 28 min 32 sec

Test stats 🧪

Test Results
Failed 0
Passed 270348
Skipped 0
Total 270348

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run module tests for <modules> : Run TAV tests for one or more modules, where <modules> can be either a comma separated list of modules (e.g. memcached,redis) or the string literal ALL to test all modules

  • run benchmark tests : Run the benchmark test only.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@trentm
Copy link
Member Author

trentm commented Sep 9, 2022

run module tests for @elastic/elasticsearch

…se it for the 's3' special case of no 's3/' prefix
…method

Ideally node's warning system would offer (and even default) to showing
a given DeprecationWarning with a given code just *once* to not clutter
so much. Also... supporting user easily filtering out these warnings in
code. They are limited to using the 'warning' event *and* need to then
use `node --no-warnings`. We can always revert this if too noisy,
and make it a log.warn, say.
@trentm
Copy link
Member Author

trentm commented Sep 10, 2022

run module tests for ALL

@trentm
Copy link
Member Author

trentm commented Sep 11, 2022

run module tests for ALL

That didn't take that I could see. I've manually started https://apm-ci.elastic.co/job/apm-agent-nodejs/job/apm-agent-nodejs-mbp/job/PR-2882/30/ to do a full TAV run.

@trentm trentm changed the title feat: improve backend granularity for SQL-y databases feat: service.target.* to improve backend granularity Sep 12, 2022
@trentm trentm marked this pull request as ready for review September 12, 2022 18:36
@trentm trentm requested a review from astorm September 12, 2022 18:36
@@ -9,6 +9,9 @@
*/

module.exports = {
// The default span or transaction `type`.
DEFAULT_SPAN_TYPE: 'custom',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REVIEW NOTE: The spec says the default span.type should be 'custom'. We had been doing that in Span serialization. However, when inferring span.context.service.target.type in Span.end(), the algorithm needs that span.type value. One way was to have a shared constant -- which I've done here. Another way would be to actually set span.type = 'custom' as a default.

Copy link
Contributor

@astorm astorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty unwieldy and large. I don't think I can digest it for a full top-to-bottom review in a reasonable amount of time. I've taken a few passes and left some notes. The changes to a both a large number of implementation and a large number of tests at the same time make me a little uncertain of how to ensure we're not creating some small side effects that are going to cause behavior changes for users.

That said -- from a stability point of view the changes appear solid. The only place where I see potential for new null references are the changes to getDBDestination (where we can now return a null value) -- but our own use of this function is to set a span's _destination value, and we appear to have appropriate guard clauses on _destination to ensure no null references. I don't think these changes will cause any new crashes and should be OK to release as is, so I'm approving.

Comment on lines +244 to +246
Span.prototype._setDestinationContext = function (destCtx) {
this._destination = Object.assign(this._destination || {}, destCtx)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a longer term discussion that goes beyond this PR, but we don't seem to be super consistent on the semantic meaning of method names with a leading underscore.

This usually means the method is meant to be analogous to a protected or private method in a language with access modifiers.

However, we're calling _setDestinationContext as a public method in the instrumentation modules so in this instance we seem to be saying that _setDestinationContext is a public method but just not a public method that's part of our API.

If we're going to adopt the leading underscore as a convention is would be good to get on the same page as to what it means and more generally how we think about enforcing API boundaries between our own modules.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I responded a little bit below (#2882 (comment)) as well.

Mostly this use of leading underscore to mean "internal to the agent" seemed inline with current common uses of the following in the instrumentation modules:

agent._instrumentation
agent._conf
span._setOutcomeFromHttpStatusCode
span._setOutcomeFromErrorCapture

However, documenting naming conventions would be good.
I'll bring up a separate discussion on conventions for this.

}
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping 70+ lines into an already busy method seems like -- a lot. This new logic might be better off in its own method or private module helper function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I'll move those out to a separate function or method.

* Internal APM agent code should use `_setDestinationContext()`.
*/
Span.prototype.setDestinationContext = function (destCtx) {
process.emitWarning(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how necessary this warning is since setDestinationContext was never documented as a part of our public API. This is another example of us not being clear about what is and isn't a public API.

By emitting this warning we seem to be saying this was a part of our public API.

However, by not documenting it we were saying it's not part of our public API.

By not having a CHANGELOG entry for this change we seem to be saying it was never a part of our public API.

It we want to start emitting these sorts of warnings when we deprecate something I don't object -- I'm just ring the "if we're going to be this fiddly we should come up with some rules" rather than making adhoc decisions about how we're doing things on a PR by PR basis.

Copy link
Member Author

@trentm trentm Sep 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how necessary this warning is

My goal in adding this warning is to attempt to help the accidental user of this never-public method to (a) notice and (b) have time to migrate to the new, public .setServiceTarget().

While these <span>.set<Something>Context() methods have never been part of our public API, they sure look appealing for those attempting to do manual instrumentation:

  • Those users might not pay attention to the TypeScript-y index.d.ts,
  • those users might not rely solely on the API docs for what APIs they can use,
  • there aren't any public APIs to use to set some of these context attributes on spans to access features like the "Service Map" and "Dependencies" in APM UI, and
  • that these methods are not prefixed with an underscore hints that perhaps they are fair game to use.

On that last point, perhaps I am showing my Python heritage in using a leading underscore as a hint that an attribute is non-public. Though that leading-underscore practice is used frequently in Node.js-core code as well.

Also, I feel some duty to the customer to whom I offered this code to use for manual instrumentation of Oracle (which we don't support yet): https://gist.github.com/trentm/8c8fcecbec1a99ce0fbb415ef87ae2db
That code uses .setDbContext() and .setDestinationContext().

By not having a CHANGELOG entry for this change we seem to be saying it was never a part of our public API.

I could certainly add a message to the changelog about this. I'll do that. I had been intending to add the following to the commit message (which is already in the PR description) -- but in the changelog is good too:

The not-public-but-available-because-JavaScript `span.setDestinationContext` has been deprecated (using it will `process.emitWarning()`) and replaced with an internal `span._setDestinationContext()`.

By emitting this warning we seem to be saying this was a part of our public API.

I'm changing the warning message to the following to attempt to alleviate that concern slightly:

'<span>.setDestinationContext() was never a public API and will be removed, use <span>.setServiceTarget().',

"if we're going to be this fiddly we should come up with some rules" rather than making adhoc decisions

Sounds good. I'll bring up a discussion separately. Perhaps it can lead to developer guide content in DEVELOPMENT.md

@trentm trentm merged commit 6826d9b into main Sep 14, 2022
@trentm trentm deleted the trentm/backend-granularity-sql branch September 14, 2022 20:59
PeterEinberger pushed a commit to fpm-git/apm-agent-nodejs that referenced this pull request Aug 20, 2024
This adds the new `span.context.service.target.{type,name}` fields for
improved granularity on backend services data in exit spans. This data is
used for "Service Maps" and "Dependencies" in the Kibana APM app.

All instrumentations have been updated to set appropriate service target
values.  `service.target.*` is typically inferred automatically from other
span data, so much of the instrumentation work was in adding other span
fields, most commonly `span.context.db.instance`. The one current exception
to the general inference algorithm is S3 spans, for which the spec'd
`service.target.name` doesn't follow the general pattern.

A new, public `span.setServiceTarget(type, name)` API has been added (this is
a "SHOULD" in the spec).

The new fields will replace the (now deprecated)
`span.context.destination.service.*` fields. In the current stage of
transition:

- `destination.service.{type,name}` are set to the empty string. They are no
  longer used, but the intake API v2 schema up to 7.13 require them to be
  set.
- Setting `destination.service.resource` directly is discouraged. Typically
  it is inferred from `service.target.*` values when a span is ended. Again
  the one exception is S3 spans.

The not-public-but-available-because-JavaScript `span.setDestinationContext`
has been deprecated (using it will `process.emitWarning()`) and replaced with
an internal `span._setDestinationContext()`.

As part of this change, improvements have been made to some module
instrumentations:

- `redis` and `ioredis`: `span.type` has changed from "cache" to "db" per
  spec (https://github.com/elastic/apm/blob/main/specs/agents/tracing-instrumentation-db.md#redis)
- `mongodb`: `span.action` used to be "query", now it will be the mongodb
  command name, e.g. "find", "insert".
- `mongodb` and `mongodb-core`: `span.db.instance` is now set to the database
  name (elastic#1494)
- `mysql` and `mysql2`: `span.db.{instance,user}` are now populated.
- `@elastic/elasticsearch`: The cluster name is heuristically determined for
  Elastic Cloud deployments and used for `db.instance`.
- `sqs`: `span.destination.{address,port}` are now populated.
- `pg`: `span.db.{instance,user}` are now populated.
- `cassandra-driver`: the Cassandra keyspace is captured for service target
  data, if available.
- OpenTelemetry Bridge: OTel spans with kind PRODUCER and CLIENT are now
  handled as exit spans (e.g. span compression could apply).

Closes: elastic#2621
Closes: elastic#2822
Closes: elastic#1494
Closes: elastic#1897
Closes: elastic#2103
Obsoletes: elastic#2458
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-nodejs Make available for APM Agents project planning.
Projects
None yet
3 participants