HDDS-13679. distributed tracing open telemetry improvement #9051
base: master
Pull Request Overview
This PR introduces a design document for migrating Ozone's distributed tracing from the deprecated OpenTracing with Jaeger to OpenTelemetry. The document outlines the migration strategy, implementation details, and improvements to the tracing hierarchy to provide better end-to-end visibility across Ozone components.
Key changes:
- Addition of comprehensive OpenTelemetry design documentation
- Migration strategy from deprecated OpenTracing to OpenTelemetry
- Enhanced tracing hierarchy design for better flow visibility
### Integration of more flows
For performance analysis and debugging, trace can be added for various flows, such as:

- Datanode Heart Beat to SCM: need record trace only when Datanode initiate the trace context.
Grammar error: 'need record trace' should be 'need to record trace' or 'records trace'.
- Datanode Heart Beat to SCM: need record trace only when Datanode initiate the trace context.
- Datanode Heart Beat to SCM: need to record trace only when Datanode initiates the trace context.
- Recon: trace for all requests from Recon UI to Ozone components such as Recon Server.
Grammar error: 'trace for all requests' should be 'tracing for all requests' or 'trace all requests'.
- Recon: trace for all requests from Recon UI to Ozone components such as Recon Server.
- Recon: trace all requests from Recon UI to Ozone components such as Recon Server.
- Internal services like OM, when connecting to SCM, can initiate a call flow under a timer thread.

For ozone internal calls, trace should be initiated by caller as client span.
Grammar error: 'trace should be initiated by caller as client span' should be 'traces should be initiated by the caller as client spans'.
For ozone internal calls, trace should be initiated by caller as client span.
For ozone internal calls, traces should be initiated by the caller as client spans.
I suggest changing the subject to "Design doc for OpenTelemetry integration".
What changes were proposed in this pull request?
Design doc for distributed tracing via OpenTelemetry and related improvements.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-13679
How was this patch tested?