Description
Hello.
I'm building a gem for tracing Ruby: https://github.com/socketry/trace. This gem is vendor-agnostic, and the goal is to provide vendor-agnostic interfaces sufficient for distributed tracing, including across fibers and threads. It's not designed to be a full interface but rather to provide the shims so that libraries can implement the tracing interface rather than be monkey-patched. I know there are options like OpenTelemetry, but they are too heavy for gems which in many cases won't be used in environments where tracing is occurring.
We have been using Datadog::Context internally to try and connect these traces. We have an open PR for experimentation: socketry/traces#1
Essentially:
parent_context = self.current_context # returns an instance of Trace::Context which contains trace_id and span_id at least.
Fiber.new do
  trace(..., parent_context, ...) do
    # .. do work
  end
end
The goal is to propagate the state from the parent to the child fiber.
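To make the intent concrete, here is a minimal, self-contained sketch of that propagation. Context and SketchTracer are hypothetical stand-ins written for this example, not the Trace or Datadog APIs, and the IDs are random hex strings. It assumes the tracer keeps its active context in Thread.current[], which in Ruby is fiber-local, so the child fiber sees nothing unless the parent context is captured and passed in explicitly.

```ruby
require "securerandom"

# Hypothetical stand-in for Trace::Context: just the two IDs mentioned above.
Context = Struct.new(:trace_id, :span_id)

# Hypothetical tracer for illustration only (not the Datadog or socketry API).
module SketchTracer
  KEY = :sketch_trace_context

  # Thread.current[] is fiber-local in Ruby, so a new fiber starts with nil here.
  def self.current_context
    Thread.current[KEY]
  end

  # Opens a span under `parent` (defaulting to the current context), yields the
  # new context, and restores the previous one afterwards. `name` is unused here.
  def self.trace(name, parent = current_context)
    previous = Thread.current[KEY]
    trace_id = parent ? parent.trace_id : SecureRandom.hex(16)
    Thread.current[KEY] = Context.new(trace_id, SecureRandom.hex(8))
    yield Thread.current[KEY]
  ensure
    Thread.current[KEY] = previous
  end
end

# Captures the parent context outside the fiber and hands it to the child span.
def run_child_fiber_sketch
  SketchTracer.trace("parent") do |parent_context|
    Fiber.new do
      implicit = SketchTracer.current_context # nil: not inherited by the fiber
      SketchTracer.trace("child", parent_context) do |child_context|
        { implicit: implicit, parent: parent_context, child: child_context }
      end
    end.resume
  end
end
```

Calling run_child_fiber_sketch returns a hash in which the child shares the parent's trace_id but has its own span_id, while the implicit lookup inside the fiber is nil. That last value is the reason the context has to be handed over by hand in this sketch.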
When we first did this, we tried using Datadog::Context and recreating it using the trace_id and span_id, but this produced some odd diagrams:
Notice the blank rows.
I assumed it's because we are not providing enough details, so I tried experimenting with capturing the actual span and reusing it as child_of: span, and this produced better pictures:
To be honest, it's not clear to me what the right way to do this is. Clearly, recreating the context is not sufficient (even with the span_id). Maybe this is a bug? Or maybe there is a better way to do this.
Finally, and this might just be a problem on my end, within those fibers, we have async.http.client.call traces, but those are not connected. They end up getting logged as separate traces and I'm not sure why:
When fibers are not used, the HTTP client request does show up correctly:
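One thing that may be worth ruling out here (an assumption, not a diagnosis of Datadog or async-http): if the active span is looked up from fiber-local state, any fiber created further down the call chain would start without it and open a fresh trace. The sketch below only demonstrates the underlying Ruby behaviour, using a made-up key name. Thread.current[] is not inherited by a new fiber, whereas Fiber[] storage (Ruby 3.2 or newer) is copied into it.

```ruby
# Requires Ruby 3.2 or newer for Fiber[] and Fiber[]=.
# The key :sketch_span_id is made up for this example.
def compare_fiber_storage
  Thread.current[:sketch_span_id] = "fiber-local" # not visible in child fibers
  Fiber[:sketch_span_id] = "inherited"            # copied into child fibers

  Fiber.new do
    {
      thread_current: Thread.current[:sketch_span_id],
      fiber_storage: Fiber[:sketch_span_id]
    }
  end.resume
ensure
  # Clean up so the example leaves no state behind.
  Thread.current[:sketch_span_id] = nil
  Fiber[:sketch_span_id] = nil
end
```

In the returned hash, thread_current is nil and fiber_storage is "inherited". If the tracer relies on the first kind of storage, that could explain why spans created without fibers connect correctly while spans created inside them do not.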
Any help making this work correctly would be most appreciated. I'll continue to work on it on my end too.