fix(tracing): actually set the tracing options in the root context #5118

Merged
12 commits merged on Jul 5, 2023
32,046 changes: 5,827 additions & 26,219 deletions package-lock.json


35 changes: 35 additions & 0 deletions packages/build/docs/tracing.md
@@ -0,0 +1,35 @@
# Tracing in Netlify Build

Netlify Build relies on OpenTelemetry tracing to emit trace data:

- https://opentelemetry.io/docs/instrumentation/js/

In production, trace data is exported to [Honeycomb](https://ui.honeycomb.io). Buildbot passes over the trace information
(trace ID, parent span ID and trace flags) that allows build executions to be stitched together into a single trace
across Buildbot and `@netlify/build`. The tracing SDK is initialised
[here](https://github.com/netlify/build/blob/main/packages/build/src/tracing/main.ts). We also use an OpenTelemetry
collector in production.

## Adding more instrumentation

More data can be added either by generating more spans or by adding more attributes to the relevant stages. Check the
OpenTelemetry docs for manual instrumentation:

- https://opentelemetry.io/docs/instrumentation/js/manual/

We also have some utility methods you can leverage to do this:

- https://github.com/netlify/build/blob/main/packages/build/src/tracing/main.ts
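As a rough illustration of manual instrumentation with `@opentelemetry/api`, the sketch below wraps an async operation in a new active span, attaches attributes and records failures. The `withSpan` helper and the tracer name `netlify-build-example` are hypothetical; they are not the utilities exported from `src/tracing/main.ts`. When no SDK is registered, the API falls back to no-op spans, so the code still runs but emits nothing.

```typescript
import { trace, SpanStatusCode, type Attributes } from '@opentelemetry/api'

// Hypothetical tracer name, used only for this example
const tracer = trace.getTracer('netlify-build-example')

/**
 * Hypothetical helper (not one of the utilities in src/tracing/main.ts).
 * Runs `fn` inside a new active span named `name` with the given attributes.
 * Errors are recorded on the span and re-thrown; the span is always ended.
 */
export const withSpan = async function <T>(
  name: string,
  attributes: Attributes,
  fn: () => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    span.setAttributes(attributes)
    try {
      return await fn()
    } catch (error) {
      span.recordException(error as Error)
      span.setStatus({ code: SpanStatusCode.ERROR })
      throw error
    } finally {
      span.end()
    }
  })
}
```

Because the span is started with `startActiveSpan`, any span created inside `fn` becomes its child (once an SDK with a context manager is registered).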

## Exporting data locally

You can export trace data when running `@netlify/build` locally. To do so, use the `tracing`
[flag properties](https://github.com/netlify/build/blob/main/packages/build/src/core/flags.js#L194) to point the
exporter directly at Honeycomb. For example:

```
node packages/build/bin.js --debug --tracing.enabled=true --tracing.apiKey=<honeycomb-tracing-api-key> --tracing.httpProtocol=https --tracing.host=api.honeycomb.io --tracing.port=443 ../my-site
```

The tracing API key should be a Honeycomb environment API key. If you are testing locally, you can use the `dev`
environment.
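The `tracing.httpProtocol`, `tracing.host` and `tracing.port` flags are combined into a single exporter endpoint; `startTracing` in `src/tracing/main.ts` builds it as `${options.httpProtocol}://${options.host}:${options.port}`. The sketch below reproduces that composition so you can check what a given set of flags resolves to. The `getTracingEndpoint` helper and `LocalTracingFlags` type are hypothetical, and `localhost` is only an assumed collector host.

```typescript
// Hypothetical subset of the tracing flags that make up the exporter endpoint
type LocalTracingFlags = {
  httpProtocol: string
  host: string
  port: number
}

/** Hypothetical helper mirroring the endpoint template used in startTracing. */
export const getTracingEndpoint = function ({ httpProtocol, host, port }: LocalTracingFlags): string {
  return `${httpProtocol}://${host}:${port}`
}

// Flags from the local Honeycomb command above
const honeycombEndpoint = getTracingEndpoint({ httpProtocol: 'https', host: 'api.honeycomb.io', port: 443 })
// Production-style defaults (http, port 4317) with an assumed local collector host
const collectorEndpoint = getTracingEndpoint({ httpProtocol: 'http', host: 'localhost', port: 4317 })

console.log(honeycombEndpoint) // https://api.honeycomb.io:443
console.log(collectorEndpoint) // http://localhost:4317
```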
4 changes: 1 addition & 3 deletions packages/build/package.json
@@ -64,6 +64,7 @@
"license": "MIT",
"dependencies": {
"@bugsnag/js": "^7.0.0",
"@honeycombio/opentelemetry-node": "^0.4.0",
"@netlify/cache-utils": "^5.1.5",
"@netlify/config": "^20.5.1",
"@netlify/edge-bundler": "8.16.2",
@@ -74,9 +75,6 @@
"@netlify/run-utils": "^5.1.1",
"@netlify/zip-it-and-ship-it": "9.11.0",
"@opentelemetry/api": "^1.4.1",
"@opentelemetry/exporter-trace-otlp-grpc": "^0.40.0",
"@opentelemetry/instrumentation-http": "^0.40.0",
"@opentelemetry/sdk-node": "^0.40.0",
"@sindresorhus/slugify": "^2.0.0",
"ansi-escapes": "^6.0.0",
"chalk": "^5.0.0",
4 changes: 2 additions & 2 deletions packages/build/src/core/build.ts
@@ -37,9 +37,9 @@ export const startBuild = function (flags: Partial<BuildFlags>) {

const { bugsnagKey, tracingOpts, debug, systemLogFile, ...flagsA } = normalizeFlags(flags, logs)
const errorMonitor = startErrorMonitor({ flags: { tracingOpts, debug, systemLogFile, ...flagsA }, logs, bugsnagKey })
startTracing(tracingOpts, getSystemLogger(logs, debug, systemLogFile))
const rootTracingContext = startTracing(tracingOpts, getSystemLogger(logs, debug, systemLogFile))
@4xposed (Contributor), Jul 5, 2023:

Would it be `rootTrace`? Is "Context" a thing in the JS/TS world?
Scratch that, I just saw the context bit in the other file.

@JGAntunes (Contributor Author), Jul 5, 2023:

Context here is specific to the OpenTelemetry JS implementation. In this case we're returning a context from:

```typescript
// Sets the current trace ID and span ID based on the options received
// this is used as a way to propagate trace context from Buildbot
return trace.setSpanContext(context.active(), {
  traceId: options.traceId,
  spanId: options.parentSpanId,
  traceFlags: options.traceFlags,
  isRemote: true,
})
```

See:

Also:

This context holds the specific traceId, spanId, etc. that we pass over from Buildbot.
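To see what that returned context carries, the sketch below builds it from made-up Buildbot values and reads the span context back out. It only uses `@opentelemetry/api`, so it runs without an SDK; `ROOT_CONTEXT` stands in for `context.active()`, which is what the no-op context manager returns when no SDK is registered. The trace and span IDs are hypothetical W3C-style hex strings.

```typescript
import { ROOT_CONTEXT, TraceFlags, isSpanContextValid, trace } from '@opentelemetry/api'

// Made-up IDs standing in for the traceId/parentSpanId/traceFlags passed by Buildbot
const remoteOptions = {
  traceId: '4bf92f3577b34da6a3ce929d0e0e4736',
  parentSpanId: '00f067aa0ba902b7',
  traceFlags: TraceFlags.SAMPLED,
}

// Same call shape as startTracing: store the remote IDs as a non-recording span in a new context
const rootTracingContext = trace.setSpanContext(ROOT_CONTEXT, {
  traceId: remoteOptions.traceId,
  spanId: remoteOptions.parentSpanId,
  traceFlags: remoteOptions.traceFlags,
  isRemote: true,
})

// A span started with this context as its parent joins Buildbot's trace
const spanContext = trace.getSpanContext(rootTracingContext)
console.log(spanContext?.traceId, spanContext ? isSpanContextValid(spanContext) : false)
```

Before this PR the context returned by `trace.setSpanContext` was discarded, which is why the IDs never reached the first span.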


return { ...flagsA, debug, systemLogFile, errorMonitor, logs, timers }
return { ...flagsA, rootTracingContext, debug, systemLogFile, errorMonitor, logs, timers }
}

const tExecBuild = async function ({
10 changes: 10 additions & 0 deletions packages/build/src/core/flags.js
@@ -200,6 +200,16 @@ Default: false`,
describe: 'Enable distributed tracing for build',
hidden: true,
},
'tracing.apiKey': {
string: true,
describe: 'API Key for the tracing backend provider',
hidden: true,
},
'tracing.httpProtocol': {
string: true,
describe: 'Traces backend protocol. HTTP or HTTPS.',
hidden: true,
},
'tracing.host': {
string: true,
describe: 'Traces backend host',
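The new flags use dot notation, so `--tracing.apiKey=<key>` on the command line ends up as `tracing.apiKey` in the parsed flags, next to `tracing.host` and `tracing.port`. The CLI leaves this to its argument parser; the hypothetical `parseDottedFlags` below is only a minimal sketch of that nesting, assuming `--key=value` syntax and keeping every value as a string (no type coercion).

```typescript
type FlagTree = { [key: string]: string | FlagTree }

/**
 * Hypothetical minimal parser: turns `--a.b=c` arguments into nested objects.
 * Flags without `=` become 'true'; positional arguments are ignored.
 */
export const parseDottedFlags = function (argv: string[]): FlagTree {
  const flags: FlagTree = {}
  for (const arg of argv) {
    if (!arg.startsWith('--')) continue // e.g. the site path
    const [rawKey, ...rest] = arg.slice(2).split('=')
    const value = rest.length > 0 ? rest.join('=') : 'true'
    const path = rawKey.split('.')
    let node = flags
    for (const segment of path.slice(0, -1)) {
      // Reuse an existing object for this segment, or create one
      const existing = node[segment]
      const child: FlagTree = typeof existing === 'object' ? existing : {}
      node[segment] = child
      node = child
    }
    node[path[path.length - 1]] = value
  }
  return flags
}

const parsed = parseDottedFlags(['--debug', '--tracing.enabled=true', '--tracing.apiKey=abc', '--tracing.port=443', '../my-site'])
console.log(JSON.stringify(parsed))
```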
6 changes: 4 additions & 2 deletions packages/build/src/core/main.ts
@@ -1,4 +1,4 @@
import { trace } from '@opentelemetry/api'
import { trace, context } from '@opentelemetry/api'

import { handleBuildError } from '../error/handle.js'
import { reportError } from '../error/report.js'
@@ -44,6 +44,7 @@ export default async function buildSite(flags: Partial<BuildFlags> = {}): Promis
telemetry,
buildId,
deployId,
rootTracingContext,
...flagsA
}: any = startBuild(flags)
const errorParams = { errorMonitor, mode, logs, debug, testOpts }
@@ -55,7 +56,8 @@ export default async function buildSite(flags: Partial<BuildFlags> = {}): Promis
'deploy.context': flagsA.context,
'site.id': flagsA.siteId,
}
const rootCtx = setMultiSpanAttributes(attributes)
const rootCtx = context.with(rootTracingContext, () => setMultiSpanAttributes(attributes))

return await tracer.startActiveSpan('exec-build', {}, rootCtx, async (span) => {
try {
const {
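The fix in `buildSite` derives the attributes context from `rootTracingContext` instead of from whatever context happens to be active. The sketch below shows why that matters, using explicit contexts: a context derived from the root one still carries Buildbot's span context, so a span started with it joins the remote trace, while one derived from an empty context does not. It assumes, hypothetically, that attributes are stored as baggage (`setMultiSpanAttributes` is not shown in this diff, although `src/tracing/main.ts` imports `propagation`). It passes contexts explicitly rather than using `context.with`, because `context.with` only changes `context.active()` once a context manager is registered.

```typescript
import { ROOT_CONTEXT, propagation, trace, type Context } from '@opentelemetry/api'

// Hypothetical stand-in for setMultiSpanAttributes, assuming attributes are stored as baggage
const withAttributes = function (parent: Context, attributes: Record<string, string>): Context {
  const entries = Object.fromEntries(Object.entries(attributes).map(([key, value]) => [key, { value }]))
  return propagation.setBaggage(parent, propagation.createBaggage(entries))
}

// Root context carrying a made-up remote span context, shaped like the one startTracing returns
const rootTracingContext = trace.setSpanContext(ROOT_CONTEXT, {
  traceId: '4bf92f3577b34da6a3ce929d0e0e4736',
  spanId: '00f067aa0ba902b7',
  traceFlags: 1,
  isRemote: true,
})

// Derived from the root context: keeps both the baggage and the remote trace ID
const rootCtx = withAttributes(rootTracingContext, { 'site.id': 'example-site' })
// Derived from an empty context: has the baggage but no trace to join
const detachedCtx = withAttributes(ROOT_CONTEXT, { 'site.id': 'example-site' })

console.log(trace.getSpanContext(rootCtx)?.traceId) // 4bf92f3577b34da6a3ce929d0e0e4736
console.log(trace.getSpanContext(detachedCtx)) // undefined
```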
11 changes: 10 additions & 1 deletion packages/build/src/core/normalize_flags.ts
@@ -11,7 +11,9 @@ const DEFAULT_EDGE_FUNCTIONS_DIST = '.netlify/edge-functions-dist/'
const DEFAULT_FUNCTIONS_DIST = '.netlify/functions/'
const DEFAULT_CACHE_DIR = '.netlify/cache/'
const DEFAULT_STATSD_PORT = 8125

const DEFAULT_OTEL_TRACING_PORT = 4317
const DEFAULT_OTEL_ENDPOINT_PROTOCOL = 'http'

export type ResolvedFlags = {
env: Record<string, unknown>
@@ -95,7 +97,14 @@ const getDefaultFlags = function ({ env: envOpt = {} }, combinedEnv) {
testOpts: {},
featureFlags: DEFAULT_FEATURE_FLAGS,
statsd: { port: DEFAULT_STATSD_PORT },
tracing: { enabled: false, port: DEFAULT_OTEL_TRACING_PORT },
// tracing.apiKey defaults to '-' else we'll get warning logs if not using
// honeycomb directly - https://github.com/honeycombio/honeycomb-opentelemetry-node/issues/201
Comment on lines +100 to +101
Contributor Author:

The Honeycomb SDK emits a set of `console.warn` logs that we can't suppress if we don't provide an `apiKey` (honeycombio/honeycomb-opentelemetry-node#201). This is why in production we pass `-`: we export to a collector rather than to Honeycomb directly.

tracing: {
enabled: false,
apiKey: '-',
httpProtocol: DEFAULT_OTEL_ENDPOINT_PROTOCOL,
port: DEFAULT_OTEL_TRACING_PORT,
},
timeline: 'build',
quiet: false,
}
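With these defaults, a caller only has to supply the tracing flags that differ (for example `enabled`, `host` and the Buildbot trace IDs), while `apiKey` falls back to `'-'` and the endpoint falls back to `http` on port 4317. How `normalizeFlags` actually merges user flags with `getDefaultFlags` is not part of this diff; the sketch below assumes a shallow merge of the `tracing` object, and the `resolveTracing` helper and `TracingDefaults` type are hypothetical.

```typescript
const DEFAULT_OTEL_TRACING_PORT = 4317
const DEFAULT_OTEL_ENDPOINT_PROTOCOL = 'http'

// Hypothetical subset of TracingOptions: only the fields touched by this change
type TracingDefaults = {
  enabled: boolean
  apiKey: string
  httpProtocol: string
  host?: string
  port: number
}

const DEFAULT_TRACING: TracingDefaults = {
  enabled: false,
  // '-' avoids the Honeycomb SDK warnings when exporting to a collector
  apiKey: '-',
  httpProtocol: DEFAULT_OTEL_ENDPOINT_PROTOCOL,
  port: DEFAULT_OTEL_TRACING_PORT,
}

/**
 * Assumption: a shallow merge where caller flags win over defaults.
 * Note that an explicit `undefined` in `flags` would also override a default here.
 */
export const resolveTracing = function (flags: Partial<TracingDefaults> = {}): TracingDefaults {
  return { ...DEFAULT_TRACING, ...flags }
}

// Production-style call: enable tracing and point it at an (assumed) collector host
const resolved = resolveTracing({ enabled: true, host: 'localhost' })
console.log(resolved)
```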
3 changes: 3 additions & 0 deletions packages/build/src/core/types.ts
@@ -75,8 +75,11 @@ export type ErrorParam = {

export type TracingOptions = {
enabled: boolean
httpProtocol: string
host: string
port: number
/** API Key used for a dedicated trace provider */
apiKey: string
/** Properties of the root span and trace id used to stitch context */
traceId: string
traceFlags: number
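`traceId`, `parentSpanId` and `traceFlags` are what stitch the build into Buildbot's trace. If they are malformed, OpenTelemetry treats the parent span context as invalid and the SDK starts a fresh trace instead of joining Buildbot's. A guard like the hypothetical `hasValidRemoteContext` below could catch that early; it is not part of this PR and follows the W3C Trace Context format (lowercase hex, 32 characters for a trace ID and 16 for a span ID, neither all zeros, flags fitting in one byte).

```typescript
// Hypothetical subset of TracingOptions: the fields used to stitch the trace together
type RemoteTraceOptions = {
  traceId: string
  parentSpanId: string
  traceFlags: number
}

const TRACE_ID_PATTERN = /^[0-9a-f]{32}$/
const SPAN_ID_PATTERN = /^[0-9a-f]{16}$/
const INVALID_TRACE_ID = '0'.repeat(32)
const INVALID_SPAN_ID = '0'.repeat(16)

/** Hypothetical guard: true when the remote IDs follow the W3C Trace Context format. */
export const hasValidRemoteContext = function ({ traceId, parentSpanId, traceFlags }: RemoteTraceOptions): boolean {
  return (
    TRACE_ID_PATTERN.test(traceId) &&
    traceId !== INVALID_TRACE_ID &&
    SPAN_ID_PATTERN.test(parentSpanId) &&
    parentSpanId !== INVALID_SPAN_ID &&
    // Trace flags are a single byte (bit 0 means "sampled")
    Number.isInteger(traceFlags) &&
    traceFlags >= 0 &&
    traceFlags <= 0xff
  )
}

console.log(hasValidRemoteContext({ traceId: '4bf92f3577b34da6a3ce929d0e0e4736', parentSpanId: '00f067aa0ba902b7', traceFlags: 1 }))
```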
5 changes: 3 additions & 2 deletions packages/build/src/steps/run_step.ts
@@ -67,16 +67,17 @@ export const runStep = async function ({
// Add relevant attributes to the upcoming span context
const attributes: StepExecutionAttributes = {
'build.execution.step.name': coreStepName,
'build.execution.step.description': coreStepDescription,
'build.execution.step.package_name': packageName,
'build.execution.step.id': coreStepId,
'build.execution.step.loaded_from': loadedFrom,
'build.execution.step.origin': origin,
'build.execution.step.event': event,
}
const spanCtx = setMultiSpanAttributes(attributes)
// If there's no `coreStepId` then this is a plugin execution
const spanName = `run-step-${coreStepId || 'plugin'}`
Comment on lines +77 to +78
Contributor Author:

`coreStepId` only exists for internal (core) steps. For plugin executions we replace it with `plugin`, and we can use `package_name` to aggregate by package.

Contributor:

Do you mind giving more information about plugins vs. steps?

For example, this comment mentions that we have core steps, build commands or plugins. I understand the meaning of the steps we run in Buildbot, but not here. Maybe this question is too broad, so we can catch up on it in "Making Moves".


return tracer.startActiveSpan(`run-step-${coreStepId}`, {}, spanCtx, async (span) => {
return tracer.startActiveSpan(spanName, {}, spanCtx, async (span) => {
const constantsA = await addMutableConstants({ constants, buildDir, netlifyConfig })

const shouldRun = await shouldRunStep({
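Following the naming rule explained in the review thread above, core steps get a span named after their `coreStepId`, while every plugin execution shares `run-step-plugin` and is told apart by `build.execution.step.package_name`. A minimal sketch of that rule, using a hypothetical `getStepSpanName` helper and hypothetical step IDs:

```typescript
/** Hypothetical helper wrapping the inline span-name template used in runStep. */
export const getStepSpanName = function (coreStepId?: string): string {
  // Before this change the template produced `run-step-undefined` for plugin executions
  return `run-step-${coreStepId || 'plugin'}`
}

console.log(getStepSpanName('functions_bundling')) // run-step-functions_bundling
console.log(getStepSpanName(undefined)) // run-step-plugin
```

Keeping a single span name for all plugins keeps the number of distinct span names small, and grouping by the package name attribute still gives a per-plugin breakdown.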
23 changes: 11 additions & 12 deletions packages/build/src/tracing/main.ts
@@ -1,6 +1,5 @@
import { HoneycombSDK } from '@honeycombio/opentelemetry-node'
Contributor Author:

`HoneycombSDK` provides a couple of things out of the box that make the initialisation simpler.

import { context, trace, propagation, SpanStatusCode, diag, DiagLogLevel, DiagLogger } from '@opentelemetry/api'
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc'
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http'
Contributor Author:

I couldn't manage to set up auto-instrumentation. It seems to work for ESM now, but that support is fairly recent and requires us to pass additional Node flags:

I've been thinking a bit about this, and in the future we might want this tracing initialisation to happen in a separate module that we load via `node --require @netlify/tracing-initialisation ./call-netlify/build` (or `node --import` once we move to Node 20, I believe?), since it would:

import { NodeSDK } from '@opentelemetry/sdk-node'

import type { TracingOptions } from '../core/types.js'
@@ -10,7 +9,11 @@ let sdk: NodeSDK

/** Given a simple logging function return a `DiagLogger`. Used to setup our system logger as the diag logger.*/
const getOtelLogger = function (logger: (...args: any[]) => void): DiagLogger {
const otelLogger = (...args: any[]) => logger('[otel-traces]', ...args)
const otelLogger = (...args: any[]) => {
// Debug log msgs can be an array of 1 or 2 elements, with the second element being an array of multiple elements
const msgs = args.flat(1)
logger('[otel-traces]', ...msgs)
}
return {
debug: otelLogger,
info: otelLogger,
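Per the comment in the new `otelLogger`, OpenTelemetry diagnostic messages can arrive with their extra arguments nested in an array, which the system logger would otherwise print as a single array value. A self-contained sketch of the same flattening, with a hypothetical `makeOtelLogFn` factory and a hypothetical in-memory logger standing in for the system logger:

```typescript
type LogFn = (...args: any[]) => void

/** Hypothetical factory mirroring the otelLogger closure: prefix the output and flatten one level. */
export const makeOtelLogFn = function (logger: LogFn): LogFn {
  return (...args: any[]) => {
    // e.g. ['message', ['detail-1', 'detail-2']] becomes ['message', 'detail-1', 'detail-2']
    const msgs = args.flat(1)
    logger('[otel-traces]', ...msgs)
  }
}

// Hypothetical in-memory logger used to inspect what would be written
const lines: string[] = []
const otelLog = makeOtelLogFn((...parts: any[]) => lines.push(parts.join(' ')))
otelLog('exporting spans', ['span-a', 'span-b'])
console.log(lines[0]) // [otel-traces] exporting spans span-a span-b
```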
@@ -25,14 +28,11 @@ export const startTracing = function (options: TracingOptions, logger: (...args:
if (!options.enabled) return
if (sdk) return

const traceExporter = new OTLPTraceExporter({
url: `http://${options.host}:${options.port}`,
})

sdk = new NodeSDK({
sdk = new HoneycombSDK({
serviceName: ROOT_PACKAGE_JSON.name,
traceExporter,
instrumentations: [new HttpInstrumentation()],
protocol: 'grpc',
Contributor Author:

Both in production and locally, we'll always use gRPC.

apiKey: options.apiKey,
endpoint: `${options.httpProtocol}://${options.host}:${options.port}`,
})

// Set the diagnostics logger to our system logger. We also need to suppress the override msg
@@ -43,7 +43,7 @@ export const startTracing = function (options: TracingOptions, logger: (...args:

// Sets the current trace ID and span ID based on the options received
// this is used as a way to propagate trace context from Buildbot
trace.setSpanContext(context.active(), {
return trace.setSpanContext(context.active(), {
Contributor Author:

This is the actual change that ensures the traceId and spanId are used when initialising the first span.

traceId: options.traceId,
spanId: options.parentSpanId,
traceFlags: options.traceFlags,
@@ -96,7 +96,6 @@ export type RootExecutionAttributes = {
/** Attributes used for the execution of each build step */
export type StepExecutionAttributes = {
'build.execution.step.name': string
'build.execution.step.description': string
'build.execution.step.package_name': string
'build.execution.step.id': string
'build.execution.step.loaded_from': string