Record INFER_RESPONSE_COMPLETE only at the first callback #4176

tanmayv25 · 2022-04-07T01:57:02Z

For decoupled model the map with the timestamps was getting corrupted as the callbacks can be recorded along with timestamps for GRPC_SEND_START and GRPC_SEND_END. Recording the only the first callback into the timestamp when tracing.

first = "INFER_RESPONSE_COMPLETE", second = 2271529015212328}, {first = "INFER_RESPONSE_COMPLETE", second = 2271529015285980}, {first = "GRPC_SEND_START", second = 2271529015386244}, {first = "", 
    second = 48}, {first = "", second = 435710686577}, {first = <error reading variable: Cannot create a lazy string with address 0x0, and a non-zero length.>, second = 140587214766080}, {
    first = "\260Z\000`\335\177\000\000P\213\000`\335\177\000\000\200\016\000\354\334\177\000\000\001\000\000\000\000\000\000\000\060", '\000' <repeats 23 times>, "\250\016\000\354\334\177", '\000' <repeats 26 times>, "\060\365\002\354\334\177\000\000\b\000\000\000\000\000\000\000\020\020\000\354\334\177\000\000\020\020\000\354\334\177\000\000\310\021\000\354\334\177\000\000H\365\002\354\334\177\000\000\030\021\000\354\334\177\000", second = 11}, {first = <error: Cannot access memory at address 0x65636e6575716573>, second = 64}, {first = "", second = 140586828959328}, {
    first = <error reading variable: Cannot create a lazy string with address 0x0, and a non-zero length.>, second = 140587170026752}, {
    first = "\311\303\220\363\017\036\372UH\211\345H\203\354\020H\211}\370H\213E\370\276\000\000\000\000H\211\307\350\337H\000\000\311\30

GuanLuo · 2022-04-07T02:14:11Z

src/grpc_server.cc

@@ -4299,8 +4299,10 @@ ModelStreamInferHandler::StreamInferResponseComplete(
                 << state->cb_count_ << ", flags " << flags;

 #ifdef TRITON_ENABLE_TRACING
-  state->trace_timestamps_.emplace_back(std::make_pair(
-      "INFER_RESPONSE_COMPLETE", TraceManager::CaptureTimestamp()));
+  if (state->cb_count_ == 1) {


I guess we don't really formulate how tracing will work on decoupled model? And if we are only recording it once, does it make sense to record on the last response complete?

The intent of INFER_RESPONSE_COMPLETE flag was to record when the callback was invoked to test our preserve_ordering logic by skipping the uncertainties added by our networking libraries.

Coming back to your question, you are correct that we have not formulated when to record INFER_RESPONSE_COMPLETE. There are three options:

Record at the first callback: No need of mutex locks to protect trace_timestamps_. The other timestamps will be recorded after the first callback one by one.

Record every callback: trace_timestamps_ needs protection that might affect performance. Without protection the server will seg fault as it is happening now.

Record at the final callback: trace_timestamps_ will still need protection, as GRPC_SEND_START or GRPC_SEND_END of previous response might still be in the process of being recorded when this timestamp is added.

In short, by recording the timestamp just at the first response callback we are able to prevent mutex locks and I think user should not be concerned about INFER_RESPONSE_COMPLETE anyways. They might be more interested in GRPC_SEND_START, which in most likelihood should be very close to its respective INFER_RESPONSE_COMPLETE.

I can go with any other option if you would like.

I was just curious why recording at the first callback doesn't require mutex locks. Is there a possibility of the second infer response arriving before the first infer response (maybe in the non-preserve ordering case)?

state->cb_count_ is incremented in L4294. And for decoupled we require that only one response is sent at a time.
Hence, the first StreamInferResponseComplete call will increment the cb_count_ and no other responses for this state object are not into GRPC_SEND* cycle.

Record INFER_RESPONSE_COMPLETE only at the first callback

89768a7

tanmayv25 requested review from GuanLuo, CoderHam, rmccorm4 and dyastremsky April 7, 2022 01:57

GuanLuo reviewed Apr 7, 2022

View reviewed changes

tanmayv25 requested a review from GuanLuo April 7, 2022 02:45

GuanLuo approved these changes Apr 7, 2022

View reviewed changes

tanmayv25 merged commit fa072e2 into main Apr 7, 2022

tanmayv25 deleted the tanmayv-ci-fix branch April 7, 2022 17:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Record INFER_RESPONSE_COMPLETE only at the first callback #4176

Record INFER_RESPONSE_COMPLETE only at the first callback #4176

tanmayv25 commented Apr 7, 2022

GuanLuo Apr 7, 2022

tanmayv25 Apr 7, 2022 •

edited

Loading

Tabrizian Apr 7, 2022

tanmayv25 Apr 7, 2022

Record INFER_RESPONSE_COMPLETE only at the first callback #4176

Record INFER_RESPONSE_COMPLETE only at the first callback #4176

Conversation

tanmayv25 commented Apr 7, 2022

GuanLuo Apr 7, 2022

Choose a reason for hiding this comment

tanmayv25 Apr 7, 2022 • edited Loading

Choose a reason for hiding this comment

Tabrizian Apr 7, 2022

Choose a reason for hiding this comment

tanmayv25 Apr 7, 2022

Choose a reason for hiding this comment

tanmayv25 Apr 7, 2022 •

edited

Loading