Description
We can add metrics on the HTTP client side to gain the same visibility we already have for the Jetty HTTP server.
Motivation
Recently, we experienced high query latency and struggled to pinpoint the exact bottleneck. After a thorough analysis of the query lifecycle within Druid, we identified one potential contributor: the time spent in the HTTP client on the broker/router. During this phase, threads can be blocked while waiting for a connection to become available.
Currently, we lack visibility into how long requests wait for a connection on the client side. Exposing a metric for this wait time would let us determine whether connection acquisition is a significant bottleneck and take targeted action to mitigate it.
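As a rough illustration of the proposed wait-time metric, the Java sketch below times how long a caller blocks before it obtains a connection permit from a bounded pool. It assumes the pool can be modeled as a fair `java.util.concurrent.Semaphore`; `TimedConnectionPool` and its methods are hypothetical names and are not part of Druid, Netty, or Jetty. A real client would emit each measured wait as a metric rather than only aggregating it.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: not a Druid, Netty, or Jetty API.
class TimedConnectionPool {
    private final Semaphore permits;
    private final AtomicLong totalWaitNanos = new AtomicLong();
    private final AtomicLong acquisitions = new AtomicLong();

    TimedConnectionPool(int maxConnections) {
        // Fair semaphore so blocked callers are granted connections in FIFO order.
        this.permits = new Semaphore(maxConnections, true);
    }

    // Blocks until a connection permit is free; returns how long the caller waited, in nanoseconds.
    long acquire() {
        long start = System.nanoTime();
        permits.acquireUninterruptibly();
        long waited = System.nanoTime() - start;
        totalWaitNanos.addAndGet(waited);
        acquisitions.incrementAndGet();
        return waited; // a real client would emit this value as a metric
    }

    void release() {
        permits.release();
    }

    long acquisitions() {
        return acquisitions.get();
    }

    // Mean wait across all acquisitions so far, in milliseconds.
    double meanWaitMillis() {
        long n = acquisitions.get();
        return n == 0 ? 0.0 : totalWaitNanos.get() / 1e6 / n;
    }
}
```

With a pool of one connection, a second caller's measured wait roughly equals the time the first caller holds the connection, which is exactly the signal that is currently invisible.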
Additionally, log lines at the router indicate that it hits the Jetty HTTP client's limit of 1024 queued requests. This suggests that queue size can become a limiting factor, potentially increasing latency or causing requests to be dropped. We suspect a similar situation could occur at the broker, where the Netty HTTP client might also build up large queues. Adding a metric for queue size, or for the time requests spend queued at the broker, would let us monitor this and ensure it does not become a hidden bottleneck.
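A queue-size or time-in-queue metric could look roughly like the following sketch: a bounded queue that exposes its depth as a gauge, counts rejections when full, and stamps each request on enqueue so its time-in-queue can be measured on dequeue. `InstrumentedRequestQueue` is a hypothetical name, and the capacity is simply a constructor argument (for example, 1024 to mirror the router's limit); this is not how Jetty or Netty actually implement their request queues.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: not Jetty's or Netty's actual request queue.
class InstrumentedRequestQueue {
    // A queued request stamped with the time it entered the queue.
    static final class Queued {
        final String requestId;
        final long enqueuedAtNanos;

        Queued(String requestId, long enqueuedAtNanos) {
            this.requestId = requestId;
            this.enqueuedAtNanos = enqueuedAtNanos;
        }
    }

    private final BlockingQueue<Queued> queue;
    private final AtomicLong rejected = new AtomicLong();

    InstrumentedRequestQueue(int capacity) {
        // e.g. 1024 to mirror the limit observed at the router
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking enqueue; counts a rejection when the queue is full.
    boolean offer(String requestId) {
        boolean accepted = queue.offer(new Queued(requestId, System.nanoTime()));
        if (!accepted) {
            rejected.incrementAndGet();
        }
        return accepted;
    }

    // Current depth, suitable for emitting as a gauge metric.
    int depth() {
        return queue.size();
    }

    long rejected() {
        return rejected.get();
    }

    // Dequeues the next request and returns its time-in-queue in nanoseconds,
    // or -1 if the queue is empty.
    long pollAndMeasure() {
        Queued next = queue.poll();
        return next == null ? -1 : System.nanoTime() - next.enqueuedAtNanos;
    }
}
```

Tracking rejections alongside depth distinguishes "the queue is often deep" from "the queue is full and requests are being turned away", which is the situation the router logs suggest.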
In summary, adding these metrics on the client side will provide us with critical insights into the connection acquisition process and queue sizes, enabling us to more effectively identify and address latency bottlenecks within Druid.