
feat(instrumentation-pino): add log sending to Logs Bridge API #2249

Merged
merged 26 commits into open-telemetry:main
Jun 27, 2024

Conversation

trentm
Contributor

@trentm trentm commented May 30, 2024

This adds "log sending" -- sending log records to the Logs SDK -- for the pino logger.
(The same was done for Bunyan in #1713 and Winston in #1837.)

  • When a Pino logger is created, this wraps the logger instance's output stream with a pino.multistream that sends to the original stream and to the Logs SDK (class OTelPinoStream).
  • This adds disableLogSending and disableLogCorrelation options, the same as in instr-bunyan and instr-winston.

Limitations:

  • This does not yet add support for the logSeverity option. I prefer to do that in a separate change, because it might be involved with Pino loggers configured with custom levels.
  • This does not yet test log sending in the ESM tests. I'll add that separately, because it requires some updates to the contrib-test-utils package.

reviewer notes

The test file changes on this PR assume that #2247 gets merged first. Please review that PR first. After that is merged, then the test file changes here will be additive and more clear.
Update (2024-06-10): This is done now, so this PR is now ready for review.


trentm added 13 commits May 27, 2024 10:21
…hange behaviour for that logger

This removes functionality that was there before, so it could technically be a breaking change.
The motivation is to make the contract of the Pino instrumentation cleaner.
With this change, a Pino Logger instance will not be touched if
the PinoInstrumentation is disabled, i.e. we are as hands-off as
possible when disabled.

Before this change, even when disabled, the instrumentation would tweak
the pino Logger config to have a no-op mixin. If the instrumentation was
later enabled, then the mixin would become active (adding trace_id et al
fields in a span context).

The coming "log sending" to the Logs Bridge API will *not* support
this "work if instrumentation is re-enabled later", so I think it is
clearer if neither "log sending" nor "log correlation" support this.

We can back this out if we think it is important to support a possible
future feature of the SDK doing live enabling/disabling of
individual instrumentations.
…og-correlation *will* follow the live instr enable/disable state

codecov bot commented May 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.42%. Comparing base (dfb2dff) to head (1bb7260).
Report is 176 commits behind head on main.

Current head 1bb7260 differs from pull request most recent head 632e5c9

Please upload reports for the commit 632e5c9 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2249      +/-   ##
==========================================
- Coverage   90.97%   90.42%   -0.56%     
==========================================
  Files         146      148       +2     
  Lines        7492     7322     -170     
  Branches     1502     1518      +16     
==========================================
- Hits         6816     6621     -195     
- Misses        676      701      +25     
Files Coverage Δ
...emetry-instrumentation-pino/src/instrumentation.ts 100.00% <100.00%> (+4.00%) ⬆️
...etry-instrumentation-pino/src/log-sending-utils.ts 100.00% <100.00%> (ø)

... and 56 files with indirect coverage changes

trentm added 4 commits May 30, 2024 15:47
…the first half-second since process start

Effectively the issue is that this sometimes returns true:
node -e 'console.log(Math.round(Date.now() / 1e3) * 1e3 < performance.timeOrigin)'
@trentm trentm marked this pull request as ready for review May 31, 2024 00:12
@trentm trentm requested a review from a team May 31, 2024 00:12
import { logs, Logger, SeverityNumber } from '@opentelemetry/api-logs';
import { PACKAGE_NAME, PACKAGE_VERSION } from './version';
import { millisToHrTime } from '@opentelemetry/core';

Contributor Author


Reviewer note: From here down to function severityNumberFromPinoLevel is basically identical to the equivalent level/severity mapping in instrumentation-bunyan, because Bunyan and Pino share the same level names and values.
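For reference, Pino's default numeric levels (trace=10, debug=20, info=30, warn=40, error=50, fatal=60 — the same values Bunyan uses) map onto OTel severity numbers along these lines. This is a sketch of the mapping idea, not the exact code in this PR; the real implementation imports SeverityNumber from @opentelemetry/api-logs rather than hardcoding the constants:

```javascript
// Sketch: map Pino's default numeric levels to OTel SeverityNumber values.
// The numeric constants mirror SeverityNumber in @opentelemetry/api-logs.
const OTEL_SEV = { TRACE: 1, DEBUG: 5, INFO: 9, WARN: 13, ERROR: 17, FATAL: 21 };

function severityNumberFromPinoLevel(level) {
  if (level < 20) return OTEL_SEV.TRACE; // pino trace = 10
  if (level < 30) return OTEL_SEV.DEBUG; // pino debug = 20
  if (level < 40) return OTEL_SEV.INFO;  // pino info  = 30
  if (level < 50) return OTEL_SEV.WARN;  // pino warn  = 40
  if (level < 60) return OTEL_SEV.ERROR; // pino error = 50
  return OTEL_SEV.FATAL;                 // pino fatal = 60 and above
}
```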


## Semantic Conventions

This package does not currently generate any attributes from semantic conventions.

## Alternative log sending with `pino-opentelemetry-transport`
Contributor


I wonder if the instrumentation would be capable of knowing whether pino-opentelemetry-transport is used when creating a new logger, and could disable sending in the main thread. Or maybe it could warn the user if this transport is detected and suggest changing the disableLogSending configuration if it applies.

Contributor Author

@trentm trentm Jun 13, 2024


First, I don't believe there is a way for instrumentation-pino to know if pino-opentelemetry-transport is being used. Here is an example script:

const otel = require('@opentelemetry/api');
const pino = require('pino');

const transport = pino.transport({ target: 'pino-opentelemetry-transport' });
const logger = pino({level: 'debug'}, transport);
console.log('XXX The logger stream is:', logger[pino.symbols.streamSym]);

logger.debug({foo: 'bar'}, 'hi');

const tracer = otel.trace.getTracer('example');
tracer.startActiveSpan('manual-span', span => {
  logger.info('in a span');
  span.end();
});

If instrumentation-pino is active, the only thing it has to look at is that logger[pino.symbols.streamSym] value -- the destination stream of the Pino Logger. This is the thing that the instrumentation calls origStream and wraps in a multistream.

That thing is a ThreadStream (from https://github.com/pinojs/thread-stream/blob/main/index.js):

XXX The logger stream is: <ref *1> ThreadStream {
  _events: [Object: null prototype] {
    message: [Function (anonymous)],
    ready: [Function: onReady],
    close: [Function (anonymous)]
  },
  _eventsCount: 3,
  _maxListeners: undefined,
  worker: Worker {
    _events: [Object: null prototype] {
      newListener: [Function (anonymous)],
      removeListener: [Function (anonymous)],
      message: [Function: onWorkerMessage],
      exit: [Function: onWorkerExit]
    },
    _eventsCount: 4,
    _maxListeners: undefined,
    performance: { eventLoopUtilization: [Function: bound eventLoopUtilization] },
    stream: FakeWeakRef { _value: [Circular *1] },
    [Symbol(kCapture)]: false,
...

That ThreadStream provides no way to know which target package or Node.js file is being run in the Worker thread. In other words, instr-pino can know that a Pino transport is being used (i.e. something is running in a separate thread), but has no idea whether that something is pino-opentelemetry-transport.

I think it would be overkill to instrument pino.transport() to tag any returned transport that uses pino-opentelemetry-transport just to be able to tell.


Second, I'm not sure that we'd want instr-pino to automatically do anything even if it did know.

  • Automatically disabling log-sending if pino-opentelemetry-transport (POT) was detected would potentially be surprising behaviour. I think to be explicit we'd then want disableLogSending to be a tri-state config: true, false, "true-if-POT-detected".
  • Automatically warning if POT was detected? I suppose this would be fair. Most users would never want both active on a given Pino logger. (Eventually perhaps we'd want the ability for users to silence particular warnings from the OTel SDK that they don't care about -- something like Python's warnings filter https://docs.python.org/3/library/warnings.html#the-warnings-filter)

These concerns are why I opted to just document this, so that users are at least made aware of the options (assuming they get as far as reading this README). Perhaps a future "Troubleshooting" guide would merit a section on "I'm using Pino and getting double logs. Why?!".

Contributor


Those are valid points. Thanks for taking the time to give your reasoning :)

@trentm
Contributor Author

trentm commented Jun 19, 2024

@hectorhdzg Would you be interested in reviewing this as well? I'm not sure if Pino is as important to some of the work you are doing as Winston.

@hectorhdzg
Member

@trentm changes look good. My only question is related to pino-opentelemetry-transport usage; sorry if this was discussed somewhere else. Why not use this package and patch the logger construction similarly to Winston? Is this related to supporting older Pino versions?

trentm and others added 2 commits June 24, 2024 14:23
Co-authored-by: Hector Hernandez <39923391+hectorhdzg@users.noreply.github.com>
@trentm
Contributor Author

trentm commented Jun 24, 2024

my only question is related to pino-opentelemetry-transport usage; sorry if this was discussed somewhere else. Why not use this package and patch the logger construction similarly to Winston? Is this related to supporting older Pino versions?

@hectorhdzg It hasn't been discussed elsewhere, so it is a good question.

The decision to not just suggest users use pino-opentelemetry-transport (POT) for log sending was not about having older pino version support. The log-sending support implemented here requires pino@7 or later. The current POT requires pino@8 or later, so I guess there is a difference in Pino versions supported. However, that wasn't a reason for my choice. (Pino 7 isn't being maintained anymore -- the last release was in 2022 -- so there isn't much value in supporting it with a new feature.)

why not using this package and patching the logger construction similar to Winston?

Please correct me if I'm missing something, but I see this as two separate questions.

instrumentation no longer patches the logger constructor args

First, why I moved away from patching the logger constructor to instead creating the Logger instance and then tweaking it.

I found that coping with all the possible input combinations to pino(...) and patching them to a form that does what we need was difficult. https://getpino.io/#/docs/api?id=destination describes a lot of possible inputs:

destination (Number | String | Object | DestinationStream | SonicBoomOpts | WritableStream)

This is the Pino internal code for normalizing the given pino() args: https://github.com/pinojs/pino/blob/a41b4d3c409b31529c8bcb4bf3cd4e016da8a8de/lib/tools.js#L290-L340
It has been a few weeks, but IIRC, because that buildSafeSonicBoom is internal, I'm not sure it is possible to accurately patch the input arguments without ever changing the semantics. As a minor example, the current instrumentation-pino throws on pino(2) -- which, granted, is an invalid pino() argument, but the instrumentation shouldn't break usage:

% cat sadpino.js
const pino = require('pino')
const log = pino(2)
log.info('hi')

% node sadpino.js
{"level":30,"time":1719270084687,"pid":83775,"hostname":"peach.local","msg":"hi"}

% node -r @opentelemetry/auto-instrumentations-node/register sadpino.js
OTEL_TRACES_EXPORTER is empty. Using default otlp exporter.
OpenTelemetry automatic instrumentation started successfully
/Users/trentm/tmp/asdf.20240624T155854/node_modules/@opentelemetry/instrumentation-pino/build/src/instrumentation.js:122
            options.mixin = this._getMixinFunction();
                          ^

TypeError: Cannot create property 'mixin' on number '2'
    at PinoInstrumentation._combineOptions (/Users/trentm/tmp/asdf.20240624T155854/node_modules/@opentelemetry/instrumentation-pino/build/src/instrumentation.js:122:27)
    at /Users/trentm/tmp/asdf.20240624T155854/node_modules/@opentelemetry/instrumentation-pino/build/src/instrumentation.js:55:47
    at Object.<anonymous> (/Users/trentm/tmp/asdf.20240624T155854/sadpino.js:2:13)
    at Module._compile (node:internal/modules/cjs/loader:1256:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
    at Module.load (node:internal/modules/cjs/loader:1119:32)
    at Module._load (node:internal/modules/cjs/loader:960:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
    at node:internal/main/run_main_module:23:47

Node.js v18.18.2

After the pino logger is created, it has a well-defined shape that can be patched cleanly.
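The post-construction approach can be sketched as follows. This is a simplified illustration, not the PR's code: the real instrumentation accesses the stream via pino.symbols.streamSym and combines streams with pino.multistream, while this sketch uses a stand-in symbol and a plain wrapper object so it runs without the pino package installed:

```javascript
// Stand-in for pino.symbols.streamSym (the well-known symbol under which a
// created Pino logger exposes its destination stream).
const streamSym = Symbol('pino.streamSym');

// Sketch: instead of patching pino(...) constructor arguments, take the
// already-created logger (which has a well-defined shape) and replace its
// destination stream with one that writes to both the original stream and
// an OTel bridge stream. Real code uses
// pino.multistream([{ stream: origStream }, { stream: otelStream }]).
function patchLogger(logger, otelStream) {
  const origStream = logger[streamSym];
  logger[streamSym] = {
    write(chunk) {
      origStream.write(chunk);
      otelStream.write(chunk);
    },
  };
  return logger;
}
```

Because the patch happens after pino() has normalized its arguments, none of the Number/String/Object/DestinationStream input variants need to be handled, which is the point of the argument above.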

why not just use 'pino-opentelemetry-transport'?

This could either mean that we get instrumentation-pino to automatically use
'pino-opentelemetry-transport' (POT), or just document to users that they should
use POT for log-sending with Pino.

On the "for using pino-opentelemetry-transport" side:

  • It is maintained by the Pino community. In addition to being associated with Pino, it also would theoretically mean less maintenance for OTel JS. Both are positives.
  • The design of using a worker thread means that in most situations (environments that aren't constrained to a single CPU), there will be less JSON.parse and OTLP serialization CPU work on the main thread.

My arguments for implementing it without POT are:

  • Architecturally, POT differs a lot from our other instrumentations. It runs code in a separate worker thread; AFAIK no other OTel instrumentations do that. It depends on a number of packages (https://gist.github.com/trentm/95c1821bf84335eac0ba8c9a5dbf948d) and starts its own OTel SDK components. I think the dependencies alone rule it out as something that we could include by default in auto-instrumentations-node.
  • Also on the architectural difference: There is a lot more complexity with POT that I worry would lead to more support effort. It starts its own OTel SDK components in the separate thread. The current usage of node -r @opentelemetry/auto-instrumentations-node/register ... will also start an SDK in any worker threads, because -r ... preload modules are passed to worker threads unless the code starting the worker does something special. Granted this is an issue orthogonal to Pino transport usage, that we should discuss separately. (E.g. Should the auto-instrumentations-node/register entry point have a guard to only run in the main thread?)
  • A lesser complexity: For POT to include traceId/spanId in LogRecords, it relies on the trace_id / span_id fields to be on the log record. The implementation in this PR does not. The fairly-recently added logKeys option to instrumentation-pino allows users to change the log fields to myTraceId or whatever. This will break having span context fields in LogRecords in POT.
  • This is perhaps more my personal opinion: I'm not convinced that most users of a logger need the more complex design of sending on a separate thread to free up CPU cycles on the main thread.
  • Assuming that we cannot include POT in instrumentation-pino by default, then the current implementation (not using POT) offers the benefit of being something that can be enabled by default -- for "zero-code" instrumentation. Granted we could get instrumentation-pino to "do log-sending only if require('POT') works" -- but I think that is subtle behaviour that is best avoided if possible.

Downsides of adding log-sending support to instrumentation-pino:

  • It adds some potential confusion for users wanting to use Pino and OTel. They may need to now understand both options and choose between them, or ensure they work together.

Sorry, that got long.

Member

@hectorhdzg hectorhdzg left a comment


LGTM, @trentm I appreciate your detailed response on my latest question.

@pichlermarc pichlermarc merged commit 055ef41 into open-telemetry:main Jun 27, 2024
17 checks passed
@dyladan dyladan mentioned this pull request Jun 27, 2024
@trentm trentm deleted the tm-pino-logs-bridge branch June 28, 2024 22:38