sumitagrawl
Contributor

What changes were proposed in this pull request?

Design doc for distributed tracing via OpenTelemetry and related tracing improvements

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-13679

How was this patch tested?

  • NA

sumitagrawl changed the title from "HDDS-13679: distributed tracing open telemetry improvement" to "HDDS-13679. distributed tracing open telemetry improvement" on Sep 19, 2025
sumitagrawl requested a review from Copilot on September 22, 2025 15:51
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces a design document for migrating Ozone's distributed tracing from the deprecated OpenTracing with Jaeger to OpenTelemetry. The document outlines the migration strategy, implementation details, and improvements to the tracing hierarchy to provide better end-to-end visibility across Ozone components.

Key changes:

  • Addition of comprehensive OpenTelemetry design documentation
  • Migration strategy from deprecated OpenTracing to OpenTelemetry
  • Enhanced tracing hierarchy design for better flow visibility

### Integration of more flows
For performance analysis and debugging, trace can be added for various flows, such as:

- Datanode Heart Beat to SCM: need record trace only when Datanode initiate the trace context.

Copilot AI Sep 22, 2025

Grammar error: 'need record trace' should be 'need to record trace' or 'records trace'.

Suggested change
- Datanode Heart Beat to SCM: need record trace only when Datanode initiate the trace context.
- Datanode Heart Beat to SCM: need to record trace only when Datanode initiates the trace context.

- Recon: trace for all requests from Recon UI to Ozone components such as Recon Server.

Copilot AI Sep 22, 2025

Grammar error: 'trace for all requests' should be 'tracing for all requests' or 'trace all requests'.

Suggested change
- Recon: trace for all requests from Recon UI to Ozone components such as Recon Server.
- Recon: trace all requests from Recon UI to Ozone components such as Recon Server.

- Internal services like OM, when connecting to SCM, can initiate a call flow under a timer thread.

For ozone internal calls, trace should be initiated by caller as client span.

Copilot AI Sep 22, 2025

Grammar error: 'trace should be initiated by caller as client span' should be 'traces should be initiated by the caller as client spans'.

Suggested change
For ozone internal calls, trace should be initiated by caller as client span.
For ozone internal calls, traces should be initiated by the caller as client spans.

Contributor

jojochuang left a comment

I suggest changing the subject to "Design doc for OpenTelemetry integration".
