Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add trace endpoints. Add trace log frequency #3849

Merged
merged 16 commits into from
Feb 8, 2022
Merged

Add trace endpoints. Add trace log frequency #3849

merged 16 commits into from
Feb 8, 2022

Conversation

GuanLuo
Copy link
Contributor

@GuanLuo GuanLuo commented Jan 26, 2022

@deadeyegoodwin look-and-feel of the trace endpoints and the new trace configuration requested

docs/protocol/extension_trace.md Outdated Show resolved Hide resolved
docs/protocol/extension_trace.md Outdated Show resolved Hide resolved
docs/protocol/extension_trace.md Outdated Show resolved Hide resolved
docs/protocol/extension_trace.md Outdated Show resolved Hide resolved
docs/protocol/extension_trace.md Outdated Show resolved Hide resolved
@GuanLuo GuanLuo changed the title [WIP] Add trace endpoints. Add trace log frequency Add trace endpoints. Add trace log frequency Feb 2, 2022
@GuanLuo
Copy link
Contributor Author

GuanLuo commented Feb 2, 2022

The trace logic has been reworked to the following:

  • When a trace should be sampled, a Trace object is returned and it contains the Triton trace pointer (which will be transferred to Triton on calling InferAsync() as Triton API documented). The Trace object is owned by the server frontends and the frontends must ensure its lifecycle is longer than Triton trace's because the Triton trace will log Triton timestamps through the Trace object, which is doable as the frontends have been maintaining meta data associated with the requests (request's lifecycle includes Triton trace's lifecycle).
    • The frontends will log its own timestamps associated with the request via returned Trace object
  • The frontends will only interact with TraceManger for trace APIs, TraceSetting and TraceFile are abstraction to help TraceManger to keep track of the different model settings and versioned logging functionality.

docs/protocol/extension_trace.md Outdated Show resolved Hide resolved
src/servers/http_server.h Outdated Show resolved Hide resolved
TIMESTAMPS" to trace timestamps, "TENSORS" to trace tensors.
This value is an array of string whhere user may specify multiple levels to
trace multiple informations.
- "trace_rate" : the trace sampling rate. The value represents how many requests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add "trace_count" here and also in the cmdline. trace_count defaults to 0 which means to continue collecting trace indefinitely. But it can be set to non-zero to cause trace_level to go to OFF after collecting trace_count traces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If trace_level is updated to OFF when trace_count exceeds, I assume the following will be true?

  1. Turning OFF global trace_level by trace_count does not cause any model specific setting to turn off its trace_level even if it wasn't specified for the model. In other words, the trace_level in model setting will be marked as "specified" once the global trace_level is affected by trace_count
  2. The user can turn trace_level back on with the update API. For global setting, due to the behavior in 1., this change will not be reflected to model setting unless the user "clear" the model trace_level first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be easier to explain and implement if we don't actually change trace_level when trace_count is done? In the implementation we would trace only if trace_level was not OFF and trace_count > 0. We can use trace_count = -1 to mean unlimited trace_count.

Also be easier for user to just reset trace_count if they want to restart or change it (no need to also reset trace_level). Also, we can be consistent withe trace_count, when it is set in global and has not been specifically set in model then we can just set it in the model as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we use trace_count to directly indicate the remaining trace to be sampled for the particular setting, and a -1 value is used for no trace limit? Yes, that will be more clear to know why exactly the trace stops.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and we don't adjust trace_level ourselves.

TIMESTAMPS" to trace timestamps, "TENSORS" to trace tensors.
This value is an array of string whhere user may specify multiple levels to
trace multiple informations.
- "trace_rate" : the trace sampling rate. The value represents how many requests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be easier to explain and implement if we don't actually change trace_level when trace_count is done? In the implementation we would trace only if trace_level was not OFF and trace_count > 0. We can use trace_count = -1 to mean unlimited trace_count.

Also be easier for user to just reset trace_count if they want to restart or change it (no need to also reset trace_level). Also, we can be consistent withe trace_count, when it is set in global and has not been specifically set in model then we can just set it in the model as well.

@@ -476,8 +476,8 @@ std::vector<Option> options_
{OPTION_TRACE_COUNT, "trace-count", Option::ArgInt,
"Set the number of traces to be sampled. Once the specified number "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete "Once the specified... to disable tracing"

@GuanLuo GuanLuo merged commit 4b23fe7 into main Feb 8, 2022
@GuanLuo GuanLuo deleted the gluo-trace branch February 8, 2022 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants