Adding OpenTelemetry Batch Span Processor #6842
Conversation
If it's not too much work to check, is there a visible impact in PA throughput/latency for an example model with tracing ON vs OFF?
@rmccorm4 I remember a user reported around x2 slowdown. If I don't find it, I'll run some experiments myself.
From #6733 (comment):
@@ -618,11 +618,140 @@ set -e
kill $SERVER_PID
wait $SERVER_PID

set +e
Random nit: I also think it would help organization if L0_cmdline_trace was renamed to L0_trace_cmdline so they're adjacent in pipeline jobs. If you feel like tweaking that at some point, it doesn't have to be in this PR.
LGTM overall! Most comments are just optional nits aside from a few questions in the tests.
Code and tests LGTM - last comments are just on docs: https://github.com/triton-inference-server/server/pull/6842/files#r1472390626
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Nice changes!
Fixes issue: #6693
Picked up work from this PR by @theoclark, who has a CLA: #6733 (comment)
Currently, the OpenTelemetry Trace API uses a SimpleSpanProcessor, which sends spans one by one as soon as they are ready. This impacts Triton's performance when tracing is on and the mode is `opentelemetry`. To mitigate this issue, the SimpleSpanProcessor was replaced by a BatchSpanProcessor (https://opentelemetry.io/docs/specs/otel/trace/sdk/#batching-processor), which collects spans in batches of a specified size and sends them to the endpoint periodically. There are 3 customization parameters:

- `bsp_max_queue_size` - the maximum queue size. After this size is reached, spans are dropped. The default value is 2048.
- `bsp_schedule_delay` - the maximum delay interval in milliseconds between two consecutive exports. The default value is 5000.
- `bsp_max_export_batch_size` - the maximum batch size of every export. It must be smaller than or equal to `bsp_max_queue_size`. If the queue reaches `bsp_max_export_batch_size`, a batch will be exported even if `bsp_schedule_delay` milliseconds have not elapsed. The default value is 512.

Note that the specification defines 4 parameters, but the opentelemetry-cpp client only supports 3 at the moment.
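For reference, a minimal sketch of how these three options map onto the opentelemetry-cpp SDK's batch processor; the endpoint URL, option values, and function name are illustrative placeholders, not Triton's actual wiring:

```cpp
#include <chrono>
#include <memory>
#include <utility>

#include "opentelemetry/exporters/otlp/otlp_http_exporter_factory.h"
#include "opentelemetry/exporters/otlp/otlp_http_exporter_options.h"
#include "opentelemetry/sdk/trace/batch_span_processor_factory.h"
#include "opentelemetry/sdk/trace/batch_span_processor_options.h"
#include "opentelemetry/sdk/trace/tracer_provider_factory.h"
#include "opentelemetry/trace/provider.h"

namespace otlp = opentelemetry::exporter::otlp;
namespace trace_sdk = opentelemetry::sdk::trace;

void
InitBatchTracing()
{
  // OTLP/HTTP exporter; the endpoint below is a placeholder.
  otlp::OtlpHttpExporterOptions exporter_opts;
  exporter_opts.url = "http://localhost:4318/v1/traces";
  auto exporter = otlp::OtlpHttpExporterFactory::Create(exporter_opts);

  // The three knobs exposed by this PR correspond to these SDK options:
  //   bsp_max_queue_size        -> max_queue_size        (default 2048)
  //   bsp_schedule_delay        -> schedule_delay_millis (default 5000 ms)
  //   bsp_max_export_batch_size -> max_export_batch_size (default 512)
  trace_sdk::BatchSpanProcessorOptions bsp_opts;
  bsp_opts.max_queue_size = 2048;
  bsp_opts.schedule_delay_millis = std::chrono::milliseconds(5000);
  bsp_opts.max_export_batch_size = 512;

  // Spans are queued and exported in batches instead of one by one.
  auto processor = trace_sdk::BatchSpanProcessorFactory::Create(
      std::move(exporter), bsp_opts);
  std::shared_ptr<opentelemetry::trace::TracerProvider> provider =
      trace_sdk::TracerProviderFactory::Create(std::move(processor));
  opentelemetry::trace::Provider::SetTracerProvider(provider);
}
```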
In this PR:

- Exposed `bsp_max_queue_size`, `bsp_schedule_delay` and `bsp_max_export_batch_size` through command-line arguments. Alternatively, these parameters can also be specified through environment variables.
- Added validation in `command_line_parser`, which throws an exception if an argument is specified incorrectly.
- Changed `TraceConfig` to store pairs whose second element can be one of `std::string`, `int`, `uint32_t`. Note that the cmdline parser always collects arguments' values as strings. During validation, numerical values are converted to integers and stored as integers instead of strings. This is done for multiple reasons: 1) we need to validate the args; 2) since validation makes sure that `StringToInt` runs successfully, we can store the converted value in `trace_config_map` and pass it to the TraceManager with valid parameters and proper types.
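A minimal sketch of the validate-then-store idea described above; the helper names, variant layout, and error message are hypothetical, not the exact code in `command_line_parser`:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// A setting value holds either a raw string or an already-validated integer,
// mirroring the "pairs whose second element can be one of std::string, int,
// uint32_t" layout described above.
using SettingValue = std::variant<std::string, int, uint32_t>;
using TraceConfig = std::vector<std::pair<std::string, SettingValue>>;

// Hypothetical StringToInt-style helper: throws if the value is not a valid
// non-negative integer that fits in uint32_t.
uint32_t
ParseUint(const std::string& key, const std::string& value)
{
  try {
    if (!value.empty() && value[0] == '-') {
      throw std::invalid_argument("negative value");
    }
    unsigned long parsed = std::stoul(value);
    if (parsed > UINT32_MAX) {
      throw std::out_of_range("value too large");
    }
    return static_cast<uint32_t>(parsed);
  }
  catch (const std::exception&) {
    throw std::invalid_argument(
        "Bad trace setting: " + key + "=" + value +
        ". Expected a non-negative integer.");
  }
}

// Validate a single key=value pair collected by the command-line parser and
// store the converted, properly typed value.
void
AddTraceSetting(
    TraceConfig& config, const std::string& key, const std::string& value)
{
  if (key == "bsp_max_queue_size" || key == "bsp_schedule_delay" ||
      key == "bsp_max_export_batch_size") {
    config.emplace_back(key, SettingValue{ParseUint(key, value)});
  }
  else {
    config.emplace_back(key, SettingValue{value});  // non-numeric settings stay strings
  }
}
```

Storing the already-converted value means an invalid configuration fails fast at startup, and the trace manager never has to re-parse strings.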