stats/opentelemetry: add trace event for name resolution delay #7992

Open
wants to merge 25 commits into
base: master
from

Conversation

aranjans
Contributor

@aranjans aranjans commented Jan 9, 2025

This PR adds a bonus feature of gRFC A72: emitting a trace event when the initial RPCs block on the name resolver.

RELEASE NOTES:

  • stats/opentelemetry: add trace event for name resolution delay.

@aranjans aranjans added this to the 1.70 Release milestone Jan 9, 2025
@aranjans aranjans added Type: Feature New features or improvements in behavior Area: Observability Includes Stats, Tracing, Channelz, Healthz, Binlog, Reflection, Admin, GCP Observability labels Jan 9, 2025
@aranjans aranjans force-pushed the a72_name_resolution_delay branch from c9eb817 to d9e5843 Compare January 9, 2025 03:53

codecov bot commented Jan 9, 2025

Codecov Report

Attention: Patch coverage is 85.79545% with 25 lines in your changes missing coverage. Please review.

Project coverage is 82.16%. Comparing base (6f41085) to head (af9b139).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| stats/opentelemetry/trace.go | 75.00% | 9 Missing and 3 partials ⚠️ |
| stats/opentelemetry/client_tracing.go | 65.00% | 5 Missing and 2 partials ⚠️ |
| stats/opentelemetry/client_metrics.go | 93.61% | 2 Missing and 1 partial ⚠️ |
| stats/opentelemetry/server_tracing.go | 78.57% | 2 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7992      +/-   ##
==========================================
- Coverage   82.17%   82.16%   -0.01%     
==========================================
  Files         381      384       +3     
  Lines       38539    38679     +140     
==========================================
+ Hits        31668    31781     +113     
- Misses       5564     5587      +23     
- Partials     1307     1311       +4     
| Files with missing lines | Coverage Δ |
|---|---|
| clientconn.go | 92.24% <100.00%> (+0.09%) ⬆️ |
| ...elemetry/experimental/grpc_trace_bin_propagator.go | 87.50% <ø> (ø) |
| stats/opentelemetry/opentelemetry.go | 76.51% <100.00%> (+0.64%) ⬆️ |
| stats/opentelemetry/server_metrics.go | 89.82% <100.00%> (+0.44%) ⬆️ |
| stream.go | 81.74% <100.00%> (-0.21%) ⬇️ |
| stats/opentelemetry/client_metrics.go | 88.23% <93.61%> (+0.30%) ⬆️ |
| stats/opentelemetry/server_tracing.go | 78.57% <78.57%> (ø) |
| stats/opentelemetry/client_tracing.go | 65.00% <65.00%> (ø) |
| stats/opentelemetry/trace.go | 75.00% <75.00%> (ø) |

... and 19 files with indirect coverage changes

@aranjans aranjans force-pushed the a72_name_resolution_delay branch 2 times, most recently from 522379d to fc16e79 Compare January 9, 2025 04:09
@aranjans aranjans force-pushed the a72_name_resolution_delay branch from fc16e79 to af9b139 Compare January 9, 2025 05:47
@aranjans aranjans marked this pull request as ready for review January 9, 2025 06:01
@aranjans aranjans requested a review from purnesh42H January 9, 2025 06:01
@purnesh42H purnesh42H requested a review from dfawley January 9, 2025 06:42
Comment on lines +617 to +620
// nameResolutionStartTime tracks the time at which name resolution started.
nameResolutionStartTime time.Time
// nameResolutionInProgress indicates whether name resolution is in progress.
nameResolutionInProgress bool
Member

I don't think we want these to be fields on the ClientConn. These are supposed to be part of the RPC's lifecycle since that's what we're tracing.

@@ -674,13 +680,27 @@ func (cc *ClientConn) Connect() {
// context expires. Returns nil unless the context expires first; otherwise
// returns a status error based on the context.
func (cc *ClientConn) waitForResolvedAddrs(ctx context.Context) error {
// Set the start time for name resolution if it's not already set.
cc.mu.Lock()
Member

This is called on every RPC.

You're unconditionally setting this flag and timestamp at the start of every RPC. That can't be right.

@dfawley dfawley removed their assignment Jan 9, 2025
3 participants