Adding OpenTelemetry Batch Span Processor #6842


Merged: 15 commits into main, Feb 1, 2024

Conversation

oandreeva-nv
Contributor

Fixes issue: #6693
Picks up work from PR #6733 by @theoclark, who has a CLA on file: #6733 (comment)

Currently, the OpenTelemetry Trace API uses SimpleSpanProcessor, which sends spans one by one as soon as they are ready. This impacts Triton's performance when tracing is on and the mode is opentelemetry. To mitigate this issue, SimpleSpanProcessor was replaced by a BatchSpanProcessor (https://opentelemetry.io/docs/specs/otel/trace/sdk/#batching-processor), which collects spans in batches of a specified size and sends them out to the endpoint periodically.

There are 3 customization parameters:

  • bsp_max_queue_size - the maximum queue size. After this size is reached, spans are dropped. The default value is 2048.
  • bsp_schedule_delay - the maximum delay interval in milliseconds between two consecutive exports. The default value is 5000.
  • bsp_max_export_batch_size - the maximum batch size of every export. It must be smaller than or equal to bsp_max_queue_size. If the queue reaches bsp_max_export_batch_size, a batch will be exported even if bsp_schedule_delay milliseconds have not elapsed. The default value is 512.

Note that the specification defines 4 parameters, but the opentelemetry-cpp client only supports 3 at the moment.
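The interplay of these three parameters can be sketched with a small stdlib-only Python model. This is purely illustrative of the batching behavior described above, not Triton or opentelemetry-cpp code:

```python
import time
from collections import deque

class BatchSpanProcessorSketch:
    """Illustrative model of the batching semantics described above."""

    def __init__(self, exporter, max_queue_size=2048,
                 schedule_delay_ms=5000, max_export_batch_size=512):
        # The spec requires the batch size to not exceed the queue size.
        assert max_export_batch_size <= max_queue_size
        self.exporter = exporter
        self.max_queue_size = max_queue_size
        self.schedule_delay_ms = schedule_delay_ms
        self.max_export_batch_size = max_export_batch_size
        self.queue = deque()
        self.dropped = 0
        self._last_export = time.monotonic()

    def on_end(self, span):
        if len(self.queue) >= self.max_queue_size:
            self.dropped += 1  # queue full: the span is dropped
            return
        self.queue.append(span)
        # Export early once a full batch is available, even if
        # schedule_delay_ms has not elapsed yet.
        if len(self.queue) >= self.max_export_batch_size:
            self._export()

    def maybe_flush(self):
        # Periodic export after schedule_delay_ms, even for a partial batch.
        elapsed_ms = (time.monotonic() - self._last_export) * 1000
        if elapsed_ms >= self.schedule_delay_ms:
            self._export()

    def _export(self):
        n = min(len(self.queue), self.max_export_batch_size)
        batch = [self.queue.popleft() for _ in range(n)]
        if batch:
            self.exporter(batch)
        self._last_export = time.monotonic()
```

In the real SDK the periodic flush runs on a background worker thread; the sketch compresses that into an explicit `maybe_flush` call to keep the triggering conditions visible.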

In this PR:

  • I add the ability to specify bsp_max_queue_size, bsp_schedule_delay, and bsp_max_export_batch_size through command-line arguments. Alternatively, these parameters can also be specified via environment variables.
  • I validate these arguments in command_line_parser and throw an exception if an argument is specified incorrectly.
  • I've also adjusted TraceConfig to store pairs whose second element can be one of std::string, int, or uint32_t. Note that the command-line parser always collects argument values as strings. During validation I convert numerical values to integers and store the integers instead of strings. This is done for two reasons: 1) the arguments need to be validated; 2) since validation guarantees that StringToInt succeeds, we can store the converted value in trace_config_map and pass valid, properly typed parameters to TraceManager.
  • I've added docs
  • I've added tests
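The validate-then-convert flow from the bullets above can be sketched as follows. Names such as `validate_bsp_options` and `BSP_DEFAULTS` are hypothetical stand-ins, not Triton's actual code; the point is only that the parser hands over raw strings, and validation converts them to integers, rejecting malformed values and enforcing the batch-size constraint:

```python
# Hypothetical sketch of the validation described in the PR: the command-line
# parser collects every option value as a string; validation converts the
# three bsp_* options to integers (raising on bad input) so downstream code
# receives properly typed parameters.

BSP_DEFAULTS = {
    "bsp_max_queue_size": 2048,
    "bsp_schedule_delay": 5000,
    "bsp_max_export_batch_size": 512,
}

def validate_bsp_options(raw):
    """Return a fully typed config dict, raising ValueError on bad input."""
    config = dict(BSP_DEFAULTS)
    for key, value in raw.items():
        if key not in BSP_DEFAULTS:
            raise ValueError(f"unknown trace option: {key}")
        try:
            parsed = int(value)  # mirrors the StringToInt conversion step
        except ValueError:
            raise ValueError(f"{key} must be an integer, got {value!r}")
        if parsed < 0:
            raise ValueError(f"{key} must be non-negative")
        config[key] = parsed
    # The spec requires the export batch size to not exceed the queue size.
    if config["bsp_max_export_batch_size"] > config["bsp_max_queue_size"]:
        raise ValueError(
            "bsp_max_export_batch_size must be <= bsp_max_queue_size")
    return config
```

Validating once up front means everything after the parser can rely on well-typed values, which is exactly why the PR stores converted integers in trace_config_map rather than the raw strings.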

@rmccorm4
Contributor

Currently, the OpenTelemetry Trace API uses SimpleSpanProcessor, which sends spans one by one as soon as they are ready. This impacts Triton's performance when tracing is on and the mode is opentelemetry.

If it's not too much work to check, is there a visible impact in PA throughput/latency for an example model with tracing ON vs OFF?

@oandreeva-nv
Contributor Author

@rmccorm4 I remember a user reported around a 2x slowdown. If I can't find the report, I'll run some experiments myself.

@oandreeva-nv
Contributor Author

From #6733 (comment):

We have observed slowdowns of up to 50% when running tritonserver with opentelemetry. We have traced this to the SimpleSpan Exporter and have observed these slowdowns go away when switching to the BatchSpanProcessor.

@@ -618,11 +618,140 @@ set -e
kill $SERVER_PID
wait $SERVER_PID

set +e
Contributor

@rmccorm4 Jan 26, 2024


Random nit: but I also think it would help organization if L0_cmdline_trace was renamed to L0_trace_cmdline so they're adjacent in pipeline jobs - if you feel like tweaking that at some point, doesn't have to be in this PR.

Contributor

@rmccorm4 left a comment


LGTM overall! Most comments are just optional nits aside from a few questions in the tests.

Contributor

@rmccorm4 left a comment


Code and tests LGTM - last comments are just on docs: https://github.com/triton-inference-server/server/pull/6842/files#r1472390626

Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
@oandreeva-nv oandreeva-nv requested a review from rmccorm4 February 1, 2024 00:42
oandreeva-nv and others added 2 commits January 31, 2024 17:19
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Contributor

@rmccorm4 left a comment


Nice changes!

@oandreeva-nv oandreeva-nv merged commit 8f98789 into main Feb 1, 2024
@oandreeva-nv oandreeva-nv deleted the oandreeva_otel_batch_processor branch February 1, 2024 17:56