Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we don't really formulate how tracing will work on decoupled model? And if we are only recording it once, does it make sense to record on the last response complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent of INFER_RESPONSE_COMPLETE flag was to record when the callback was invoked to test our preserve_ordering logic by skipping the uncertainties added by our networking libraries.
Coming back to your question, you are correct that we have not formulated when to record INFER_RESPONSE_COMPLETE. There are three options:
In short, by recording the timestamp just at the first response callback we are able to prevent mutex locks and I think user should not be concerned about INFER_RESPONSE_COMPLETE anyways. They might be more interested in GRPC_SEND_START, which in most likelihood should be very close to its respective INFER_RESPONSE_COMPLETE.
I can go with any other option if you would like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just curious why recording at the first callback doesn't require mutex locks. Is there a possibility of the second infer response arriving before the first infer response (maybe in the non-preserve ordering case)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
state->cb_count_
is incremented in L4294. And for decoupled we require that only one response is sent at a time.Hence, the first
StreamInferResponseComplete
call will increment the cb_count_ and no other responses for this state object are not into GRPC_SEND* cycle.