OpenTelemetry HTTP Instrumentation Causing Performance Impact (50% Slowdown) #5525
-
Hi All, I am enabling OpenTelemetry in my app to improve observability. However, after enabling it, I noticed a significant performance impact, with some endpoints slowing down by 50%. After investigating, I found that the HTTP instrumentation (@opentelemetry/instrumentation-http) is the main cause of the slowdown.

I tried different configurations to reduce the impact, but even excluding the affected endpoints did not help: the slowdown remains.

Has anyone faced similar performance issues? Any ideas on optimizing OpenTelemetry HTTP instrumentation to reduce the overhead? Looking forward to your insights!
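For context, this is a sketch of the kind of configuration typically tried in this situation, assuming `@opentelemetry/sdk-node` is in use. The option names match current OTel JS packages but can differ between versions, and the `/healthz` path is a placeholder, not from the original post:

```javascript
// Sketch only: reducing instrumentation-http overhead via sampling and
// an ignore hook. Verify option names against your installed versions.
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const { TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');

const sdk = new NodeSDK({
  // Head-sample: record only ~10% of traces to cut per-request export work.
  sampler: new TraceIdRatioBasedSampler(0.1),
  instrumentations: [
    new HttpInstrumentation({
      // Skip span creation for selected endpoints ("/healthz" is a
      // placeholder -- substitute your own hot or noisy paths).
      ignoreIncomingRequestHook: (req) => req.url?.startsWith('/healthz') ?? false,
    }),
  ],
});

sdk.start();
```

Note that even with an ignore hook, the instrumentation still patches the `http` module and runs the hook on every request, so some overhead remains, which is consistent with the observation above that excluding endpoints did not fully help.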
Replies: 3 comments 6 replies
-
@qaidMohammed thanks for reaching out.

These performance impacts have always been very difficult to quantify in the past, because most reproductions provided were really light server applications that did little to no work while handling a request. In such apps the overhead looks exaggerated, since the bulk of the work done per request is actually telemetry generation and context management. That overhead is roughly constant per request, so no matter whether a handler usually takes 10ms or 500ms, the impact on quick handlers will be more noticeable than on slower ones when measured as a percentage.

That's not to say there's nothing we can do to improve OTel JS' performance in general. There are indeed a lot of places in need of optimization, and if anything stands out that can be improved, we're always happy to see PRs in that regard. Please keep in mind, though, that PRs for performance improvements should always be accompanied by some way of measuring the impact, which is where many such PRs fall short.
-
Hi @pichlermarc,