Add proper span handling to factories through extension trait #300
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Context
With #266, automation span propagation was added to
ractor
. This is very helpful in tracing context through actors, such that contextual information can be persisted through a message's lifecycle.However, in factories, this is more subtle due to (a) internal queueing and (b) the worker's requirement to interface with the Factory directly (for ping/pong heartbeats).
The problem
Consider a message sent to a factory, which is a
Job
. When the job is picked up, the factory's message handlerhandle()
will be instrumented with the span included in the message, silently by the framework for you. However should the factory choose to queue the message, then once that iteration ofhandle()
is completed, the span context is lost. Then the factory may utilize a future message event to dequeue that previously enqueued message, and will instrument it with the context from the triggering event.In production, it has been noticed that not only can this attach the wrong span, to the wrong message, but it also can cause recursive spans since a worker can send a message to a factory (pong to a ping), which can trigger a queue'd item to be processed, therefore attaching Factory.handle() 'ping' -> Worker.handle() 'pong' -> Factory.handle() 'pong' -> Worker.handle() 'message', and up to infinite depths.
This is not only wrong, but dangerous at runtime (incorrect context risk). Therefore this fix is done by attaching the span to the
Job
directly, inside theJobOptions
which are generally constructed viaDefault::default()
in all call sites. Then a new helper-trait is added, which automatically will instrument the span on the correct Worker.handle() call but also can alleviate a lot of the general worker boilerplate for you (see simplified syntax in examples and tests)Important
Given the
Eq
bound on theJobOptions
JobOptions
JobOptions
to be more read-only, with only allowing the user to set the TTLthis change constitutes a major API break. Although most call sites will see a no-op change, we do need to do a major release to include this change.
This change
In this PR, I'm doing 2 major changes
tracing::Span
to theJobOptions
in order to carry the span for the life of the job, until it's processed in the worker. Additionally we change the API forJobOptions
to be primarily read-only, with exception only for setting the TTL. The rest is automatically handled for users.Worker
trait, which auto-implements theActor
trait specifically for a Factory. This trait supports much of the boiler-plate of constructing a worker in a factory and instruments the span if it's provided on the workers handler.