Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record INFER_RESPONSE_COMPLETE only at the first callback #4176

Merged
merged 1 commit into from
Apr 7, 2022

Conversation

tanmayv25
Copy link
Contributor

For decoupled model the map with the timestamps was getting corrupted as the callbacks can be recorded along with timestamps for GRPC_SEND_START and GRPC_SEND_END. Recording the only the first callback into the timestamp when tracing.

first = "INFER_RESPONSE_COMPLETE", second = 2271529015212328}, {first = "INFER_RESPONSE_COMPLETE", second = 2271529015285980}, {first = "GRPC_SEND_START", second = 2271529015386244}, {first = "", 
    second = 48}, {first = "", second = 435710686577}, {first = <error reading variable: Cannot create a lazy string with address 0x0, and a non-zero length.>, second = 140587214766080}, {
    first = "\260Z\000`\335\177\000\000P\213\000`\335\177\000\000\200\016\000\354\334\177\000\000\001\000\000\000\000\000\000\000\060", '\000' <repeats 23 times>, "\250\016\000\354\334\177", '\000' <repeats 26 times>, "\060\365\002\354\334\177\000\000\b\000\000\000\000\000\000\000\020\020\000\354\334\177\000\000\020\020\000\354\334\177\000\000\310\021\000\354\334\177\000\000H\365\002\354\334\177\000\000\030\021\000\354\334\177\000", second = 11}, {first = <error: Cannot access memory at address 0x65636e6575716573>, second = 64}, {first = "", second = 140586828959328}, {
    first = <error reading variable: Cannot create a lazy string with address 0x0, and a non-zero length.>, second = 140587170026752}, {
    first = "\311\303\220\363\017\036\372UH\211\345H\203\354\020H\211}\370H\213E\370\276\000\000\000\000H\211\307\350\337H\000\000\311\30

@@ -4299,8 +4299,10 @@ ModelStreamInferHandler::StreamInferResponseComplete(
<< state->cb_count_ << ", flags " << flags;

#ifdef TRITON_ENABLE_TRACING
state->trace_timestamps_.emplace_back(std::make_pair(
"INFER_RESPONSE_COMPLETE", TraceManager::CaptureTimestamp()));
if (state->cb_count_ == 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we don't really formulate how tracing will work on decoupled model? And if we are only recording it once, does it make sense to record on the last response complete?

Copy link
Contributor Author

@tanmayv25 tanmayv25 Apr 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent of INFER_RESPONSE_COMPLETE flag was to record when the callback was invoked to test our preserve_ordering logic by skipping the uncertainties added by our networking libraries.

Coming back to your question, you are correct that we have not formulated when to record INFER_RESPONSE_COMPLETE. There are three options:

  1. Record at the first callback: No need of mutex locks to protect trace_timestamps_. The other timestamps will be recorded after the first callback one by one.
  2. Record every callback: trace_timestamps_ needs protection that might affect performance. Without protection the server will seg fault as it is happening now.
  3. Record at the final callback: trace_timestamps_ will still need protection, as GRPC_SEND_START or GRPC_SEND_END of previous response might still be in the process of being recorded when this timestamp is added.

In short, by recording the timestamp just at the first response callback we are able to prevent mutex locks and I think user should not be concerned about INFER_RESPONSE_COMPLETE anyways. They might be more interested in GRPC_SEND_START, which in most likelihood should be very close to its respective INFER_RESPONSE_COMPLETE.

I can go with any other option if you would like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just curious why recording at the first callback doesn't require mutex locks. Is there a possibility of the second infer response arriving before the first infer response (maybe in the non-preserve ordering case)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state->cb_count_ is incremented in L4294. And for decoupled we require that only one response is sent at a time.
Hence, the first StreamInferResponseComplete call will increment the cb_count_ and no other responses for this state object are not into GRPC_SEND* cycle.

@tanmayv25 tanmayv25 requested a review from GuanLuo April 7, 2022 02:45
@tanmayv25 tanmayv25 merged commit fa072e2 into main Apr 7, 2022
@tanmayv25 tanmayv25 deleted the tanmayv-ci-fix branch April 7, 2022 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants