
feat: GRPC instrumentation #854

Conversation


@ahayworth ahayworth commented Jun 30, 2021

This PR adds GRPC auto-instrumentation. The following features are implemented:

  • Client / Server "unary" RPCs (single request/response)
  • Client / Server "server streaming" RPCs (single resquest / stream of responses)
  • Client / Server "client streaming" RPCs (stream of requests / single response)
  • Client / Server "bi-directional streaming" RPCs (stream of requests / stream of responses)
  • Log events for individual sent / received messages during a stream's lifecycle (a sketch of such an event follows this list)
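
To make the last item concrete, here is a rough sketch of what one of those per-message events might look like under the OpenTelemetry gRPC semantic conventions. The event name and attribute keys come from the spec; the helper method and how it is called are made up for illustration, not code from this PR.

```ruby
# Illustrative only: record a "message" event on whichever span the
# instrumentation is tracking for this call.
def add_message_event(span, type:, id:)
  span.add_event(
    'message',
    attributes: {
      'message.type' => type, # 'SENT' or 'RECEIVED'
      'message.id' => id      # per-direction counter, starting at 1
    }
  )
end
```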

The following features are not yet implemented, but will be finished in this PR in short order:

  • Tests, of any kind
  • net.peer.* and net.transport semantic convention attributes
  • Span error handling (this might actually be working, but I haven't tested it yet)

I might not implement:

  • message.compressed_size and message.uncompressed_size semantic convention attributes. I think I might be able to get at those sizes, but I'm not completely certain. They are not listed as a MUST in the spec, so I think it's okay to leave them out.

Things of note to reviewers:

  • As stated before, I'm not completely done here. But if you have feedback or any thoughts, they are more than welcome at this point.
  • I initially tried to use client / server interceptors, but they only handle the beginning of a call and are thus unsuited to instrumenting a long-running call.
  • Clients using the Ruby gRPC library can invoke a service with the return_op: true flag ... and then they get back a version of an ActiveCall that can actually be invoked later. So we don't necessarily know that the RPC call will take place when ClientStub#request_response is called, for instance. That leads to some annoying duplication throughout, and to passing spans around as instance attributes on ActiveCall / BidiCall objects.
  • If a client does not pass a block to a streaming call, they receive an Enumerator that lazily reads in the actual responses when iterated. This causes additional annoyance, because we don't really know what spans may be active in the context when the client actually starts to iterate through the responses - another reason we pass the spans around as instance variables. (Both behaviors are sketched below.)

I'm not sure that this approach is a good one, but it feels okay to me. I'm worried primarily that we won't clean up spans correctly (what if a client never enumerates their results?) but I'm not sure I see another way to do it while still adhering to the GRPC event spec requirements.
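
To illustrate the client-side behaviors described above, here is a rough sketch against the stock Ruby gRPC client API. The Echo service, its stub, and marshal_proc / unmarshal_proc are made-up placeholders, and the ClientStub#request_response signature shown is assumed from its documented keyword options; this is not code from the PR.

```ruby
# Hypothetical Echo service client, for illustration only.
stub = Echo::Stub.new('localhost:50051', :this_channel_is_insecure)

# Deferred execution: with return_op: true, ClientStub#request_response
# returns an operation wrapping the ActiveCall rather than performing the
# RPC; nothing happens on the wire until #execute is called later.
op = stub.request_response('/echo.Echo/Echo', Echo::EchoRequest.new(message: 'hi'),
                           marshal_proc, unmarshal_proc, return_op: true)
response = op.execute

# Lazy streaming reads: a server-streaming call made without a block returns
# an Enumerator, so responses are only pulled when it is iterated - possibly
# much later, under a different "current" span.
responses = stub.stream_echo(Echo::EchoRequest.new(message: 'hi'))
responses.each { |r| puts r.message }
```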

Expand me for a bunch of screenshots: [six screenshots attached, dated Jun 30, 2021]

Fixes open-telemetry/opentelemetry-ruby-contrib#10
Fixes #96

@ahayworth ahayworth marked this pull request as ready for review August 4, 2021 16:52
@ahayworth
Contributor Author

Behold, I am alive!

I think this PR is ready for testing and review. Sadly I don't have much of a real application to test this with. If anyone does, I'd greatly appreciate it!

# on the client side, it's a little harder in the accept loop on the server. So, we provide
# a way for server-side ActiveCalls to set the span after the ActiveCall is created.
def span=(span)
  @span = span
Contributor

I am unfamiliar with the execution model here. Does this require a threadsafe writer?

Contributor Author

I don't think so. I will admit that fundamentally I am not 100% sure. This ActiveCall setter is only called in two places:

  • when the ActiveCall is initialized (which sets the "correct" span for client side things using the ActiveCall class)
  • when we start to handle a call on the server-side - just after it is dispatched for processing, but technically after the ActiveCall was already created.

The main reason I added it was because on the server-side, I couldn't find a great way to cleanly patch the accept loop that handles and schedules all of the server methods without re-implementing a lot of that code myself. That felt like a great way to introduce bugs to me 😆.

The "accept loop" (so to speak) is at RpcServer#loop_handle_server_calls - and the bit I did choose to patch was RpcDesc#run_server_method, which is called from within RpcServer#loop_handle_server_calls. If you take a peek at the handling code, you can probably see why I chose to do it this way. However, a fresh set of eyes would be appreciated sometime, because I may well have overlooked the easy way to do this 🤣

Anyway - back to the thread-safety question: the reason I think this is okay is that we only ever change this variable in one place, and that is when we dispatch the server-side ActiveCall to a thread for processing. A different ActiveCall is created for each incoming request, and my read of the work-processing system within the GRPC code is that requests are explicitly scheduled, with their context, to a single thread, rather than multiple threads potentially pulling the same work item. Given that scope and the way the GRPC library does things overall, I don't immediately see a way in which there could be concurrent access to this setter.

That said ... the GRPC library is rather confusing, so I could very well be wrong. I'd say I'm about 95% certain we're fine here.
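
For readers following along, here is a stripped-down sketch of the shape of that server-side patch. It is not the exact diff from this PR: the module and tracer names are illustrative, and the run_server_method signature is assumed to be (active_call, mth, ...); the rpc.* attributes and the active_call.span assignment mirror the hunks discussed elsewhere in this review.

```ruby
# Illustrative sketch of prepending into RpcDesc#run_server_method so the
# server span can be handed to the per-request ActiveCall.
module RpcDescPatch
  def run_server_method(active_call, mth, *rest)
    tracer = OpenTelemetry.tracer_provider.tracer('grpc-sketch')
    attributes = {
      'rpc.system' => 'grpc',
      'rpc.service' => mth.owner.service_name,
      'rpc.method' => mth.name.to_s
    }
    tracer.in_span("#{mth.owner.service_name}/#{mth.name}", attributes: attributes, kind: :server) do |span|
      # The span captured when the ActiveCall was created in the accept loop
      # is likely a non-recording one, so replace it with the server span.
      active_call.span = span if active_call.respond_to?(:span=)
      super
    end
  end
end

# ::GRPC::RpcDesc.prepend(RpcDescPatch)
```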

# handle the log message requirements of the OpenTelemetry semantic conventions.
def run_batch(ops)
  if ops.keys.include?(::GRPC::Core::CallOps::SEND_MESSAGE)
    OpenTelemetry::Trace.current_span.add_event(
Contributor

I wish I had a better grasp of the execution model here. I keep thinking back to the way that the Rack instrumentation keeps track of its own Span instead of using the "current" span in case users add custom interceptors that generate their own spans and I want to make sure the gRPC span is the one that is augmented with the event.

Contributor Author

I wish I had a better grasp of the execution model here.

Me too, honestly. 🤣 This one is rough, for sure.

I keep thinking back to the way that the Rack instrumentation keeps track of its own Span instead of using the "current" span in case users add custom interceptors that generate their own spans and I want to make sure the gRPC span is the one that is augmented with the event.

I think this is an excellent concern. I'm going to try and refactor it along those lines. I think the scenario you described could happen if someone wrote their own interceptor, and that would definitely be a really upsetting / embarrassing bug for us. Not the first one I'd have written, either. 🤦

I think what really makes this hard is that the execution model of the official GRPC library is just all over the place - and the bit we're discussing here is actually a C extension ... and a lot of work happens in that C extension, which just makes it all harder.

That said, I think I see a way to get to the state we want. 👍
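
In the spirit of that Rack-style approach, a minimal sketch of the idea (names are illustrative; this is not the code in this PR): keep a handle to the span the instrumentation created for the call and prefer it over current_span when recording events.

```ruby
# Illustrative only: hold on to the span this instrumentation started for the
# call, and fall back to the context's current span only if none was recorded,
# so a span created by a user-supplied interceptor can't absorb our events.
def otel_event_span
  @otel_span || OpenTelemetry::Trace.current_span
end
```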

# The server's ActiveCall is created during the accept loop, and so the
# current_span saved off when created will likely be a non-recording API
# span. So we replace the saved-off span with whatever we just created.
active_call.span = span
Contributor

Something I do not quite grok: why does this not rely on current_span, since it is passed to with_span on L32?

Contributor Author

You know - at this point it's been so long since I wrote it that I don't precisely recall why. I suspect it's something like what you were talking about in another area - I wanted to make sure we're augmenting the correct span, so the stuff inside the ActiveCall patches all operates on the @span variable instead of current_span. But there are places (the C extension, etc.) where getting at the current ActiveCall isn't possible, so I then need to use current_span ... and it gets to be a mess.

Like I said elsewhere, I'm going to try harder to encapsulate "our spans" more cleanly, and then as part of that refactor I'll try to make all of this stuff a lot more clear somehow.

Contributor

We should also look at other SDK implementations as well and see if the generated code looks similar to see how they are handling it. That may help us puzzle through some options.

attrs = {
  'rpc.system' => 'grpc',
  'rpc.service' => mth.owner.service_name,
  'rpc.method' => mth.name.to_s
Contributor

@ericmustin ericmustin Aug 5, 2021

small nit (was working on something related and figured I'd see how this was implemented; haven't done a full review)

I think semantic conventions, in addition to some of the above, require either net.peer.ip or net.peer.name: https://github.com/open-telemetry/opentelemetry-specification/blob/87a5ed7f0d4c403e2b336f275ce3e7fd66a8041b/specification/metrics/semantic_conventions/rpc.md#attributes

Contributor Author

Ah, you're right. It's even on the TODO list for the PR and I neglected to do it. 😆
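
For future reference, one hedged way those attributes might be derived, assuming the peer address is available as a string such as "ipv4:127.0.0.1:50051" (for example via the call's #peer method, if present). The helper name is made up and this is not code from the PR.

```ruby
# Illustrative only: split a gRPC peer string like "ipv4:127.0.0.1:50051"
# into net.* semantic convention attributes.
def peer_attributes(peer)
  host, _, port = peer.sub(/\A\w+:/, '').rpartition(':')
  {
    'net.peer.ip' => host,
    'net.peer.port' => port,
    'net.transport' => 'ip_tcp'
  }
end

peer_attributes('ipv4:127.0.0.1:50051')
# => { "net.peer.ip" => "127.0.0.1", "net.peer.port" => "50051", "net.transport" => "ip_tcp" }
```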

Comment on lines +40 to +42
'rpc.system' => 'grpc',
'rpc.service' => service_name.to_s,
'rpc.method' => method_name.to_s
Contributor

Contributor Author

Yup - I need to do that still. I think I punted on it because it looked rather hard to figure out, but I suspect we can figure it out. 😄

@ahayworth ahayworth marked this pull request as draft August 6, 2021 14:17
@ahayworth
Contributor Author

Putting this back into draft; the comments from @arielvalentin and @ericmustin were pretty helpful - and I need to do a major revision to try to get this ready to merge. 😄 Thanks!

@jimeh
Contributor

jimeh commented Nov 2, 2021

What's progress on this looking like? How much is left to get it across the line?

I've got a quick and dirty GRPC interceptor implementation working within a project, but I'm coming up against some limitations with how the GRPC interceptors are implemented in Ruby - which is why I assume you're not using them here.

I'd be happy to spare a couple of evenings to try and help out if it's of any use :)

And if it helps, regarding message.uncompressed_size, you should be able to get that by simply calling .to_proto.size on GRPC request and response objects.
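
A tiny sketch of that suggestion, assuming the request/response objects are generated google-protobuf messages that respond to #to_proto; the helper name is made up.

```ruby
# Illustrative only: use the serialized byte length of a protobuf message for
# the message.uncompressed_size attribute suggested above.
def uncompressed_size_attribute(message)
  { 'message.uncompressed_size' => message.to_proto.size }
end
```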

@ahayworth
Contributor Author

@jimeh it's definitely stalled; my day job no longer requires Ruby, and I don't have as much free and clear time to devote to otel-ruby :(

The general status was, however:

  • Needs to be cleaned up overall. The flow is confusing and I think it's likely incorrect in places.
  • Needs testing. Even when I wrote this, I didn't have a GRPC ruby app that I could do much testing with.

I'd be happy to work on it with you, or for you to take over the branch if you have the motivation to do so!

@jimeh
Contributor

jimeh commented Nov 9, 2021

@ahayworth Thanks for the info :)

I've actually ended up taking quite a different approach in an attempt to quickly hack something together for my day job that just about works. It turned out to not be that quick, but I think I have it all working now with message events, error handling, etc. It lacks automated tests at the moment though, so there are no guarantees around those promises... lol

I'll try and get things cleaned up and organized so I can put together a draft PR here sometime soon. Any help you're able and willing to offer at that point @ahayworth would be greatly appreciated :)

As for the details of my implementation: it all operates through gRPC client and server interceptors, which means a smaller footprint of monkey-patched / prepended classes, so it's a bit easier to follow how it works (I think/hope). However, due to the limitations of interceptors, the bulk of the tracing is done by monkey-patching methods on individual instances of objects exposed to interceptors - both long-lived and per-request objects. It's kind of horrible, yet somewhat elegant, and reasonably easy to follow the logical flow of... I think :P
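
For anyone curious what the interceptor-based shape looks like, here is a minimal hedged sketch of a client interceptor around a unary call. It is illustrative only, not the implementation described above; the class, tracer name, and commented-out usage are made up.

```ruby
require 'grpc'
require 'opentelemetry-sdk'

# Illustrative sketch: GRPC::ClientInterceptor#request_response wraps a unary
# call, so a span can be opened around it and the trace context injected into
# the outgoing metadata.
class OtelClientInterceptor < GRPC::ClientInterceptor
  def request_response(request: nil, call: nil, method: nil, metadata: nil)
    tracer = OpenTelemetry.tracer_provider.tracer('grpc-client-sketch')
    tracer.in_span(method.to_s, kind: :client, attributes: { 'rpc.system' => 'grpc' }) do
      OpenTelemetry.propagation.inject(metadata) # propagate context as gRPC metadata
      yield
    end
  end
end

# Hypothetical usage: pass the interceptor when constructing a stub.
# stub = Echo::Stub.new('localhost:50051', :this_channel_is_insecure,
#                       interceptors: [OtelClientInterceptor.new])
```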

@ahayworth
Contributor Author

@jimeh I look forward to the PR, that's awesome!

@robertlaurin
Contributor

Hey @jimeh, did you still have any time/bandwidth to share a draft with us? We're looking to get some GRPC instrumentation out, and I'd be interested to see what your approach looks like.

@ahayworth
Contributor Author

I'm going to go ahead and close this out - it's unlikely that I'll be able to work on it now that my day job does not involve ruby. However, hopefully it's instructive for a future attempt!

@ahayworth ahayworth closed this Feb 1, 2022
@jimeh
Contributor

jimeh commented Feb 15, 2022

@robertlaurin Apologies for the silence, it's been a busy few months, and my focus has actually ended up being on the Golang side of our project that uses gRPC and otel. But I'm planning to put aside a couple of evenings this week or next to extract the Ruby implementation I wrote to get a draft PR submitted. After that, next steps are tests, and fixing a bug where message events are sometimes assigned to the wrong span during high volumes of parallel gRPC requests.

@ahayworth Thanks for your efforts on this PR. It was a really valuable resource for me end of last year when I went about my "quick and dirty hack" implementation. I hope you won't mind if I poke you when I open the above mentioned PR within the next week or two :)
