-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Record INFER_RESPONSE_COMPLETE only at the first callback #4176
Conversation
@@ -4299,8 +4299,10 @@ ModelStreamInferHandler::StreamInferResponseComplete( | |||
<< state->cb_count_ << ", flags " << flags; | |||
|
|||
#ifdef TRITON_ENABLE_TRACING | |||
state->trace_timestamps_.emplace_back(std::make_pair( | |||
"INFER_RESPONSE_COMPLETE", TraceManager::CaptureTimestamp())); | |||
if (state->cb_count_ == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we don't really formulate how tracing will work on decoupled model? And if we are only recording it once, does it make sense to record on the last response complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent of INFER_RESPONSE_COMPLETE flag was to record when the callback was invoked to test our preserve_ordering logic by skipping the uncertainties added by our networking libraries.
Coming back to your question, you are correct that we have not formulated when to record INFER_RESPONSE_COMPLETE. There are three options:
- Record at the first callback: No need of mutex locks to protect trace_timestamps_. The other timestamps will be recorded after the first callback one by one.
- Record every callback: trace_timestamps_ needs protection that might affect performance. Without protection the server will seg fault as it is happening now.
- Record at the final callback: trace_timestamps_ will still need protection, as GRPC_SEND_START or GRPC_SEND_END of previous response might still be in the process of being recorded when this timestamp is added.
In short, by recording the timestamp just at the first response callback we are able to prevent mutex locks and I think user should not be concerned about INFER_RESPONSE_COMPLETE anyways. They might be more interested in GRPC_SEND_START, which in most likelihood should be very close to its respective INFER_RESPONSE_COMPLETE.
I can go with any other option if you would like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just curious why recording at the first callback doesn't require mutex locks. Is there a possibility of the second infer response arriving before the first infer response (maybe in the non-preserve ordering case)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
state->cb_count_
is incremented in L4294. And for decoupled we require that only one response is sent at a time.
Hence, the first StreamInferResponseComplete
call will increment the cb_count_ and no other responses for this state object are not into GRPC_SEND* cycle.
For decoupled model the map with the timestamps was getting corrupted as the callbacks can be recorded along with timestamps for GRPC_SEND_START and GRPC_SEND_END. Recording the only the first callback into the timestamp when tracing.