OpenTelemetry HTTP Instrumentation Causing Performance Impact (50% Slowdown) #5525
-
Hi All, I am enabling OpenTelemetry in my app to improve observability. However, after enabling it, I noticed a significant performance impact, with some endpoints slowing down by 50%. After investigating, I found that the HTTP instrumentation (@opentelemetry/instrumentation-http) is the main cause of the slowdown.

I tried different configurations to reduce the impact, but even excluding the affected endpoints did not help: the slowdown remains.

Has anyone faced similar performance issues? Any ideas on optimizing OpenTelemetry HTTP instrumentation to reduce the overhead? Looking forward to your insights!
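For context, this is a sketch of the kind of configuration typically tried in this situation, assuming `@opentelemetry/sdk-node` is in use. The option names match current OTel JS packages but can differ between versions, and the `/healthz` path is a placeholder, not from the original post:

```javascript
// Sketch only: reducing instrumentation-http overhead via sampling and
// an ignore hook. Verify option names against your installed versions.
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const { TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');

const sdk = new NodeSDK({
  // Head-sample: record only ~10% of traces to cut per-request export work.
  sampler: new TraceIdRatioBasedSampler(0.1),
  instrumentations: [
    new HttpInstrumentation({
      // Skip span creation for selected endpoints ("/healthz" is a
      // placeholder -- substitute your own hot or noisy paths).
      ignoreIncomingRequestHook: (req) => req.url?.startsWith('/healthz') ?? false,
    }),
  ],
});

sdk.start();
```

Note that even with an ignore hook, the instrumentation still patches the `http` module and runs the hook on every request, so some overhead remains, which is consistent with the observation above that excluding endpoints did not fully help.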
Replies: 3 comments 6 replies
-
@qaidMohammed thanks for reaching out.

These performance impacts have always been very difficult to quantify in the past, because most reproductions provided were really light server applications that did little to no work while handling a request. In such apps the overhead looks exaggerated, since the bulk of the work done per request is actually telemetry generation and context management. That overhead is roughly constant per request, so no matter whether a handler usually takes 10ms or 500ms, the impact on quick handlers will be more noticeable than on slower ones when measured as a percentage.

That's not to say there's nothing we can do to improve OTel JS' performance in general. There are indeed a lot of places in need of optimization, and if anything stands out that can be improved, we're always happy to see PRs in that regard. Please keep in mind, though, that PRs for performance improvements should always be accompanied by some way of measuring the impact, which is where many such PRs fall short.
-
Hi @pichlermarc,