Skip to content

Fix query response time metric#13661

Merged
mcvsubbu merged 3 commits intoapache:masterfrom
mcvsubbu:fix-req-arrival-time
Jul 24, 2024
Merged

Fix query response time metric#13661
mcvsubbu merged 3 commits intoapache:masterfrom
mcvsubbu:fix-req-arrival-time

Conversation

@mcvsubbu
Copy link
Contributor

@mcvsubbu mcvsubbu commented Jul 18, 2024

As of PR #13179, the query response time metrics started to get extremely high for all tables. The root cause tured out to be a missed initialization in which the query arrival time was set to 0 by default.

The refactor broke the contract that the broker request handler always returns the time taken to handle the request.

Added code to initialize the arrival time.

Since the refactor attempted to include client pre-processing time, added this code under the condition that the value is not already set.

Ideally, callers of the method should add on their own pre and post-processing time to the time returned by the handleRequest() method.

As of PR apache#13179, the query response time metrics started to get extremely high for
all tables. The root cause tured out to be a missed initialization in which
the query arrival time was set to 0 by default.

The refactor broke the contract that the broker request handler always returns the
time taken to handle the request.

Added code to initialize the arrival time.

Since the refactor attempted to include client pre-processing time, added this code
under the condition that the value is not already set.

Ideally, callers of the method should add on their own pre and post-processing time
to the time returned by the handleRequest() method.
@mcvsubbu mcvsubbu requested a review from Jackie-Jiang July 18, 2024 23:35
@codecov-commenter
Copy link

codecov-commenter commented Jul 19, 2024

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 61.96%. Comparing base (59551e4) to head (920e4f1).
Report is 796 commits behind head on master.

Files Patch % Lines
...r/requesthandler/BrokerRequestHandlerDelegate.java 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13661      +/-   ##
============================================
+ Coverage     61.75%   61.96%   +0.21%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2554     +118     
  Lines        133233   140567    +7334     
  Branches      20636    21869    +1233     
============================================
+ Hits          82274    87109    +4835     
- Misses        44911    46827    +1916     
- Partials       6048     6631     +583     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.90% <0.00%> (+0.19%) ⬆️
java-21 61.85% <0.00%> (+0.22%) ⬆️
skip-bytebuffers-false 61.93% <0.00%> (+0.18%) ⬆️
skip-bytebuffers-true 61.81% <0.00%> (+34.08%) ⬆️
temurin 61.96% <0.00%> (+0.21%) ⬆️
unittests 61.96% <0.00%> (+0.21%) ⬆️
unittests1 46.47% <ø> (-0.42%) ⬇️
unittests2 27.72% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we always set request arrival time when the RequestContext is just created (since it is a property within it).
Are you running some some custom code which is directly hooked to the BrokerRequestHandler.handleRequest()? Sorry I didn't expect this usage pattern during the refactor. I'd suggest following the same way to set it when the RequestContext is created for consistency.
If you want to make it more robust, we can set it in BaseBrokerRequestHandler.handleRequest() because we might not use delegate all the time

public BrokerResponse handleRequest(JsonNode request, @Nullable SqlNodeAndOptions sqlNodeAndOptions,
@Nullable RequesterIdentity requesterIdentity, RequestContext requestContext, @Nullable HttpHeaders httpHeaders)
throws Exception {
if (requestContext.getRequestArrivalTimeMillis() == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be more robust if we check for <= 0 because the return value is per implementation based

@mcvsubbu
Copy link
Contributor Author

Currently, we always set request arrival time when the RequestContext is just created (since it is a property within it). Are you running some some custom code which is directly hooked to the BrokerRequestHandler.handleRequest()? Sorry I didn't expect this usage pattern during the refactor. I'd suggest following the same way to set it when the RequestContext is created for consistency. If you want to make it more robust, we can set it in BaseBrokerRequestHandler.handleRequest() because we might not use delegate all the time

The BaseBrokerStarter class returns the delegate class from the getBrokerRequestHandler() handler. So, this class (and the handleRequest() methods within this class) are meant to be external interface to a broker. So, i am not sure what you mean when you say that we may not use delegate all the time. Can you explain a bit more?

Secondly, yes, installations may have some code before (and after return from) handleRequest(). Dont you think that the callers of handleRequest() should measure their own overhead if they are doing pre and post processing and add it on to the value returned by handleRequest() call for query time? It does not seem to be correct that the metric emitted includes only the pre-processing time but not the post-processing time. Agree?

IIRC used to be that request context was created inside handleRequest and returned. Somehow that interface has now changed to include requestContext as an argument, and I think that has caused the confusion.

Why is the "client" code directly calling BrokerRequestHandler? (isnt the client remote)?

I can change it to <= 0 if you like, but I also think the contract of this method needs a re-look?

@Jackie-Jiang
Copy link
Contributor

The BaseBrokerStarter class returns the delegate class from the getBrokerRequestHandler() handler. So, this class (and the handleRequest() methods within this class) are meant to be external interface to a broker. So, i am not sure what you mean when you say that we may not use delegate all the time. Can you explain a bit more?

You are correct. Currently we always create a delegate even without the multi-stage request handler, so it should be good to set it in the delegate.

Secondly, yes, installations may have some code before (and after return from) handleRequest(). Dont you think that the callers of handleRequest() should measure their own overhead if they are doing pre and post processing and add it on to the value returned by handleRequest() call for query time? It does not seem to be correct that the metric emitted includes only the pre-processing time but not the post-processing time. Agree?

Basically we want to include the query parsing time during the request handling. In PinotClientRequest class, we parse the query first to detect the sql type of the query, then pass the query to the request handler. IMO ideally we should set the requestArrivalTimeMs as the time when broker sees the query, which is outside of the broker request handler.
We can definitely check it again within the request handler to be more robust, and I'd suggest adding some comments because that code path won't be hit in the open source code.

@mcvsubbu
Copy link
Contributor Author

The BaseBrokerStarter class returns the delegate class from the getBrokerRequestHandler() handler. So, this class (and the handleRequest() methods within this class) are meant to be external interface to a broker. So, i am not sure what you mean when you say that we may not use delegate all the time. Can you explain a bit more?

You are correct. Currently we always create a delegate even without the multi-stage request handler, so it should be good to set it in the delegate.

Secondly, yes, installations may have some code before (and after return from) handleRequest(). Dont you think that the callers of handleRequest() should measure their own overhead if they are doing pre and post processing and add it on to the value returned by handleRequest() call for query time? It does not seem to be correct that the metric emitted includes only the pre-processing time but not the post-processing time. Agree?

Basically we want to include the query parsing time during the request handling. In PinotClientRequest class, we parse the query first to detect the sql type of the query, then pass the query to the request handler. IMO ideally we should set the requestArrivalTimeMs as the time when broker sees the query, which is outside of the broker request handler. We can definitely check it again within the request handler to be more robust, and I'd suggest adding some comments because that code path won't be hit in the open source code.

I suggest setting it (without any if check) inside the handleRequest() call, like it was before. If this is ok with you, I will remove the if around the set. The contract with the broker for external entities should be that the handleRequest() call provides the service, and emits metric that includes the time taken for the service.

Agreed?

@Jackie-Jiang
Copy link
Contributor

I suggest setting it (without any if check) inside the handleRequest() call, like it was before. If this is ok with you, I will remove the if around the set. The contract with the broker for external entities should be that the handleRequest() call provides the service, and emits metric that includes the time taken for the service.

No, I don't agree. We should capture the query parsing time as part of the request handling. You may take a look at PinotClientRequest.executeSqlQuery() (the rest resource to handle query) where we first parse the query, then send it to the request handler if it is DQL (there are also DML queries not handled by request handler). The request arrival time should be set to the time when the request arrives the broker.

@mcvsubbu
Copy link
Contributor Author

I suggest setting it (without any if check) inside the handleRequest() call, like it was before. If this is ok with you, I will remove the if around the set. The contract with the broker for external entities should be that the handleRequest() call provides the service, and emits metric that includes the time taken for the service.

No, I don't agree. We should capture the query parsing time as part of the request handling. You may take a look at PinotClientRequest.executeSqlQuery() (the rest resource to handle query) where we first parse the query, then send it to the request handler if it is DQL (there are also DML queries not handled by request handler). The request arrival time should be set to the time when the request arrives the broker.

I see. In that case, are we deprecating handleRequest() (and therefore BrokerRequestHandler)? Should PinotClientRequest be moved to spi?

@Jackie-Jiang
Copy link
Contributor

I see. In that case, are we deprecating handleRequest() (and therefore BrokerRequestHandler)? Should PinotClientRequest be moved to spi?

We are not deprecating it. PinotClientRequest is the rest query endpoint handler. The current contract is to set the request arrival time when RequestContext is created (that is outside of BrokerRequestHandler), and RequestContext is already in the spi.

@mcvsubbu
Copy link
Contributor Author

OK, modified as discussed offline and added comments so that we don't miss this in other re-factoring.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mcvsubbu mcvsubbu merged commit c603cd4 into apache:master Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants