Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats/internal: OpenTelemetry tracing GRPCTraceBinPropagator #7677

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

purnesh42H
Copy link
Contributor

@purnesh42H purnesh42H commented Sep 27, 2024

This PR adds the GRPCTraceBinPropagator that implements TextMapPropagator as proposed in gRFC A72 which outlines a custom propagator to handle both binary and text formats for trace context propagation. This will allow gRPC to keep using grpc-trace-bin header for context propagation and also support any other text propagators. When using grpc-trace-bin the OpenCensus spanContext and OpenTelemetry spanContext are identical, therefore a gRPC OpenCensus client can speak with a gRPC OpenTelemetry server and vice versa. It is encouraged to use GRPCTraceBinPropagator for the migration. Using the same header greatly simplifies rollout.

Note: Currently, the GRPCTraceBinPropagator is placed within the internal package. We plan to move it out of internal and make it publicly accessible as part of a future OpenTelemetry API change.

RELEASE NOTES: None

Copy link

codecov bot commented Sep 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.97%. Comparing base (b8ee37d) to head (80820c4).
Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7677      +/-   ##
==========================================
+ Coverage   81.84%   81.97%   +0.13%     
==========================================
  Files         361      362       +1     
  Lines       27827    28119     +292     
==========================================
+ Hits        22775    23051     +276     
- Misses       3850     3860      +10     
- Partials     1202     1208       +6     

see 21 files with indirect coverage changes

@purnesh42H purnesh42H force-pushed the otel-tracing-grpc-trace-bin-propagator branch from 92de02a to 9a9336b Compare September 27, 2024 17:21
@purnesh42H purnesh42H force-pushed the otel-tracing-grpc-trace-bin-propagator branch 2 times, most recently from d0ddcc0 to 5e524a8 Compare September 30, 2024 08:27
@purnesh42H purnesh42H added the Type: Feature New features or improvements in behavior label Sep 30, 2024
@purnesh42H purnesh42H added this to the 1.68 Release milestone Sep 30, 2024
@purnesh42H purnesh42H added the Area: Observability Includes Stats, Tracing, Channelz, Healthz, Binlog, Reflection, Admin, GCP Observability label Sep 30, 2024
@dfawley dfawley removed their assignment Oct 1, 2024
@dfawley
Copy link
Member

dfawley commented Oct 1, 2024

I'll let @zasweq do the primary review here and just take a 2nd pass.

Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on implementation; didn't review tests yet. My next pass will be more on correctness, this pass was more general Go style/API and docstring semantics.


"go.opentelemetry.io/otel/propagation"
"go.opentelemetry.io/otel/trace"
otelinternaltracing "google.golang.org/grpc/stats/opentelemetry/internal/tracing"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rename is quite long. Also please separate imports: https://github.com/grpc/grpc-go/blob/master/stats/opentelemetry/client_metrics.go#L31. Perhaps otelpropagation, oteltrace, itracing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as suggested except internaltracing instead of itracing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer itracing, as it's more terse and it's the only way I've seen this. Is there places in our codebase where we rename internal/x imports to internalx? If so I'm fine with this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i saw convention is i* like istats. Changed to itracing

Comment on lines 32 to 34
// GRPCTraceBinPropagator is TextMapPropagator to propagate cross-cutting
// concerns as both text and binary key-value pairs within a carrier that
// travels in-band across process boundaries.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere: I don't know what these comments semantically refer to. What does "cross-cutting concerns" mean? I don't think this is a strict requirement "within a carrier that travels in-band across process boundaries."

Copy link
Contributor Author

@purnesh42H purnesh42H Oct 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both "cross-cutting concerns" and "within a carrier that travels in-band across process boundaries." are taken directly from OpenTelemetry propagation.TextMapCarrier interface https://pkg.go.dev/go.opentelemetry.io/otel/propagation#TextMapPropagator.

"cross-cutting concerns" simply refers to functionalities that are not central to the core logic of your application but are needed across different parts of it (eg. tracing, logging etc.) which are transported "in-band" (within the main data flow) across process boundaries (separate applications).

However, you are right to point out that we should keep the docstrings simplified for anyone to understand the purpose of the method. I have simplified the comments everwhere now. Let me know what you think.

// travels in-band across process boundaries.
type GRPCTraceBinPropagator struct{}

// Inject set cross-cutting concerns from the Context into the carrier.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here as well, what is cross-cutting concerns?

stats/opentelemetry/internal/grpc_trace_bin_propagator.go Outdated Show resolved Hide resolved
stats/opentelemetry/internal/grpc_trace_bin_propagator.go Outdated Show resolved Hide resolved
stats/opentelemetry/internal/tracing/custom_carrier.go Outdated Show resolved Hide resolved
stats/opentelemetry/internal/tracing/custom_carrier.go Outdated Show resolved Hide resolved
func (c CustomCarrier) GetBinary() ([]byte, error) {
values := stats.Trace(*c.Ctx)
if len(values) == 0 {
return nil, errors.New("`grpc-trace-bin` header not found")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the correct error message? I think both in the case it's not present in md and if it's set to an empty byte string it'll be not found. Is an empty byte string distinct from nil in the return? Is the empty byte string a valid thing to transmit for this header? Do we want this to return an error at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed returning error from here to be consistent with Get(..)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the error. What is the intended usage of this function? Do you intended to treat not set and set to an empty string equivalent? Is empty string a valid key for this? In that case you should make the error message fail with that. But that relates to the intended usage of this, and the failure modes (i.e. what happens in the error case).

Copy link
Contributor Author

@purnesh42H purnesh42H Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are only looking trace context which needs to conform to following struct. So the propagator will not allow to set any such value (like empty) as it won't be a valid SpanContext. Hence, the only reason for returning nil is client didn't set the grpc-trace-bin header.

// Fields:
        //
        // TraceId: (field_id = 0, len = 16, default = "0000000000000000") -    
        // 16-byte array representing the trace_id.
        //
        // SpanId: (field_id = 1, len = 8, default = "00000000") - 8-byte array   
        // representing the span_id.
        //
        // TraceOptions: (field_id = 2, len = 1, default = "0") - 1-byte array 
           representing the trace_options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is empty string a valid key for this?

Binary methods will not deal with string keys because we are using existing stats package

func SetTrace(ctx context.Context, b []byte) context.Context {
to set bytes which sets a custom struct key


// SetBinary sets the binary value to the gRPC context, which will be sent in
// the outgoing RPC with the header grpc-trace-bin.
func (c CustomCarrier) SetBinary(value []byte) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get the point of this API. It seems to do things by deferring to operations on a context either to the stats package or the metadata package? What is the function of this API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this requires a bit more background on migration path from opencensus to opentelemetry (see A72).

The design proposes that while gRPC OpenCensus directly interacts with metadata API, gRPC Open Telemetry will use standardized https://pkg.go.dev/go.opentelemetry.io/otel/propagation package for context propagation by encoding them in metadata for the following benefits:

  • Full integration with OpenTelemetry APIs that is easier for users to reason about.
  • Make it possible to plugin other propagators that the community supports.
  • Flexible API that allows clean and simple migration paths to a different propagator.

As of today, OpenTelemetry propagator API only supports https://pkg.go.dev/go.opentelemetry.io/otel/propagation#TextMapPropagator, that is to send string key/value pairs between the client and server, which is different from the binary header that gRPC currently uses. The future roadmap to support binary propagators at OpenTelemetry is unclear. So, gRPC will use propagator API in TextMap format with optimization path to work around the binary propagator API. Once the Open Telemetry binary propagator API is available in the future, we can continuously integrate with those API with little effort.

Therefore, we need a custom implementation for the Carrier that supports both binary and text values. For binary header grpc-trace-bin we are reusing the existing stats package to conform with existing grpc design.

*c.Ctx = stats.SetTrace(*c.Ctx, value)
}

// Keys lists the keys stored in the gRPC context for the outgoing RPC.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't list, it returns a slice of keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected

@zasweq zasweq assigned purnesh42H and unassigned zasweq Oct 2, 2024
@purnesh42H purnesh42H requested a review from zasweq October 6, 2024 18:22
@purnesh42H purnesh42H assigned zasweq and unassigned purnesh42H Oct 6, 2024
Comment on lines +53 to +54
// context. It returns an empty string if the key is not present in the
// context.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also returns empty string if it's set in the context but has no values. But I don't know if this is the right behavior, should we return an error? What is the intended usage of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I replied above the reason why empty value is not possible. We check if the SpanContext is valid in propagator's Inject() method before setting in carrier

span := oteltrace.SpanFromContext(ctx)
	if !span.SpanContext().IsValid() {
		return
	}
	```

return ctx
}

spanContext, ok := FromBinary([]byte(binaryData))
spanContext, ok := FromBinary([]byte(bd))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch spanContext to sc please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 89 to 93
// Fields always returns a slice containing only `grpc-trace-bin` header key
// because the GRPCTraceBinPropagator only uses the 'grpc-trace-bin' header for
// propagating trace context.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rewrite this comment to say what Fields() returns from a conceptual point of view. What does Fields mean? Fields of what? Why is it returning a key? Or rename the function to be semantically closer to the intended idea of what this represents.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Basically the propagation.TextMapCarrier interface which GRPCTraceBinPropagator implements has Keys()" your reply to another comment - should we call this Keys() instead?

Copy link
Contributor Author

@purnesh42H purnesh42H Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GRPCTraceBinPropagator is implementing https://github.com/open-telemetry/opentelemetry-go/blob/main/propagation/propagation.go#L82 which has method Fields which is what i had my previous comment.

// Fields returns the keys whose values are set with Inject.
Fields() []string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Included // Fields returns the keys whose values are set with Inject. as well in docstring

Comment on lines 54 to 83
t.Run("Fast path with CustomCarrier", func(t *testing.T) {
ctx, cancel := context.WithCancel(context.Background())
carrier := internaltracing.NewCustomCarrier(metadata.NewOutgoingContext(ctx, metadata.MD{}))
propagator.Inject(traceCtx, carrier)

got := stats.OutgoingTrace(*carrier.Context())
want := Binary(spanContext)
if string(got) != string(want) {
t.Fatalf("got = %v, want %v", got, want)
}
cancel()
})

t.Run("Slow path with TextMapCarrier", func(t *testing.T) {
carrier := otelpropagation.MapCarrier{}
propagator.Inject(traceCtx, carrier)

got := carrier.Get(internaltracing.GRPCTraceBinHeaderKey)
want := base64.StdEncoding.EncodeToString(Binary(spanContext))
if got != want {
t.Fatalf("got = %v, want %v", got, want)
}
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i had tabular test before but both tests have different veriffication logic so I wrote in this way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i had them as tabular test before but verification logic is significantly different because of binary and string format so i wrote it this way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw there are some existing tests written in similar way

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can declare the want in the t-test as a variable.

Copy link
Contributor Author

@purnesh42H purnesh42H Oct 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you see logic to retrieve got is different as well. For fast path which will use SetBinary we get it through stats package because we are using CustomCarrier where as for slow path which will use Set, we just retrieve directly using carrier's Get because we are using a different carrier (not implemented by us). Would you prefer to have separate tests altogether like TestInject_FastPath, TestInject_SlowPath?

Comment on lines 45 to 52
propagator := GRPCTraceBinPropagator{}
spanContext := oteltrace.NewSpanContext(oteltrace.SpanContextConfig{
TraceID: [16]byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16},
SpanID: [8]byte{17, 18, 19, 20, 21, 22, 23, 24},
TraceFlags: oteltrace.FlagsSampled,
})
traceCtx, traceCancel := context.WithCancel(context.Background())
traceCtx = oteltrace.ContextWithSpanContext(traceCtx, spanContext)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable names too long for their scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed all vars to short names

SpanID: [8]byte{17, 18, 19, 20, 21, 22, 23, 24},
TraceFlags: oteltrace.FlagsSampled,
})
traceCtx, traceCancel := context.WithCancel(context.Background())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defer cancel() here is preferred over calling it at the operation at the end. Defer is preferred because what happens if your test fails before it hits the last line/operation. It should still run the cleanup in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

grpctest.RunSubTests(t, s{})
}

func (s) TestInject(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Top level comment please explaining what this test does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

traceCancel()
}

func (s) TestExtract(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Top level comment here as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
cancel()
})
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments on this test as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +84 to +97
TraceID: [16]byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16},
SpanID: [8]byte{17, 18, 19, 20, 21, 22, 23, 24},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for 16/8 here? Can we just remove?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to strictly adhere to OpenTelemetry SpanContext struct fields

// Fields:
        //
        // TraceId: (field_id = 0, len = 16, default = "0000000000000000") -    
        // 16-byte array representing the trace_id.
        //
        // SpanId: (field_id = 1, len = 8, default = "00000000") - 8-byte array   
        // representing the span_id.
        //
        // TraceOptions: (field_id = 2, len = 1, default = "0") - 1-byte array 
           representing the trace_options.

Comment on lines 42 to 44
if len(a) != len(b) {
return false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could sort the slices, compare length and then iterate through one and compare it to string at others index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's an alternate way to compare two slices but do you see any major concern with count frequency approach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't cmp.Equal/Diff just do this for you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean use cmp.Equal after sorting? Because we want to ignore the order for this comparison

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

},
}

for _, tt := range tests {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/tt/test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@zasweq zasweq assigned purnesh42H and unassigned zasweq Oct 7, 2024
@purnesh42H purnesh42H force-pushed the otel-tracing-grpc-trace-bin-propagator branch from 05ec5f3 to f9209c3 Compare October 9, 2024 15:33
@purnesh42H purnesh42H requested a review from zasweq October 9, 2024 15:41
@purnesh42H purnesh42H assigned zasweq and unassigned purnesh42H Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Observability Includes Stats, Tracing, Channelz, Healthz, Binlog, Reflection, Admin, GCP Observability Type: Feature New features or improvements in behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants