Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record INFER_RESPONSE_COMPLETE only at the first callback #4176

Merged
merged 1 commit into from
Apr 7, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions src/grpc_server.cc
Original file line number Diff line number Diff line change
Expand Up @@ -4299,8 +4299,10 @@ ModelStreamInferHandler::StreamInferResponseComplete(
<< state->cb_count_ << ", flags " << flags;

#ifdef TRITON_ENABLE_TRACING
state->trace_timestamps_.emplace_back(std::make_pair(
"INFER_RESPONSE_COMPLETE", TraceManager::CaptureTimestamp()));
if (state->cb_count_ == 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we don't really formulate how tracing will work on decoupled model? And if we are only recording it once, does it make sense to record on the last response complete?

Copy link
Contributor Author

@tanmayv25 tanmayv25 Apr 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent of INFER_RESPONSE_COMPLETE flag was to record when the callback was invoked to test our preserve_ordering logic by skipping the uncertainties added by our networking libraries.

Coming back to your question, you are correct that we have not formulated when to record INFER_RESPONSE_COMPLETE. There are three options:

  1. Record at the first callback: No need of mutex locks to protect trace_timestamps_. The other timestamps will be recorded after the first callback one by one.
  2. Record every callback: trace_timestamps_ needs protection that might affect performance. Without protection the server will seg fault as it is happening now.
  3. Record at the final callback: trace_timestamps_ will still need protection, as GRPC_SEND_START or GRPC_SEND_END of previous response might still be in the process of being recorded when this timestamp is added.

In short, by recording the timestamp just at the first response callback we are able to prevent mutex locks and I think user should not be concerned about INFER_RESPONSE_COMPLETE anyways. They might be more interested in GRPC_SEND_START, which in most likelihood should be very close to its respective INFER_RESPONSE_COMPLETE.

I can go with any other option if you would like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just curious why recording at the first callback doesn't require mutex locks. Is there a possibility of the second infer response arriving before the first infer response (maybe in the non-preserve ordering case)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state->cb_count_ is incremented in L4294. And for decoupled we require that only one response is sent at a time.
Hence, the first StreamInferResponseComplete call will increment the cb_count_ and no other responses for this state object are not into GRPC_SEND* cycle.

state->trace_timestamps_.emplace_back(std::make_pair(
"INFER_RESPONSE_COMPLETE", TraceManager::CaptureTimestamp()));
}
#endif // TRITON_ENABLE_TRACING

// Log appropriate errors
Expand Down