Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add proper span handling to factories through extension trait #300

Merged
merged 1 commit into from
Dec 17, 2024

Conversation

slawlor
Copy link
Owner

@slawlor slawlor commented Dec 17, 2024

Context

With #266, automation span propagation was added to ractor. This is very helpful in tracing context through actors, such that contextual information can be persisted through a message's lifecycle.

However, in factories, this is more subtle due to (a) internal queueing and (b) the worker's requirement to interface with the Factory directly (for ping/pong heartbeats).

The problem

Consider a message sent to a factory, which is a Job. When the job is picked up, the factory's message handler handle() will be instrumented with the span included in the message, silently by the framework for you. However should the factory choose to queue the message, then once that iteration of handle() is completed, the span context is lost. Then the factory may utilize a future message event to dequeue that previously enqueued message, and will instrument it with the context from the triggering event.

In production, it has been noticed that not only can this attach the wrong span, to the wrong message, but it also can cause recursive spans since a worker can send a message to a factory (pong to a ping), which can trigger a queue'd item to be processed, therefore attaching Factory.handle() 'ping' -> Worker.handle() 'pong' -> Factory.handle() 'pong' -> Worker.handle() 'message', and up to infinite depths.

This is not only wrong, but dangerous at runtime (incorrect context risk). Therefore this fix is done by attaching the span to the Job directly, inside the JobOptions which are generally constructed via Default::default() in all call sites. Then a new helper-trait is added, which automatically will instrument the span on the correct Worker.handle() call but also can alleviate a lot of the general worker boilerplate for you (see simplified syntax in examples and tests)

Important

Given the

  1. Removal of the Eq bound on the JobOptions
  2. Addition of a new field into JobOptions
  3. General change of the API contract for JobOptions to be more read-only, with only allowing the user to set the TTL

this change constitutes a major API break. Although most call sites will see a no-op change, we do need to do a major release to include this change.

This change

In this PR, I'm doing 2 major changes

  1. Adding the tracing::Span to the JobOptions in order to carry the span for the life of the job, until it's processed in the worker. Additionally we change the API for JobOptions to be primarily read-only, with exception only for setting the TTL. The rest is automatically handled for users.
  2. Adding a Worker trait, which auto-implements the Actor trait specifically for a Factory. This trait supports much of the boiler-plate of constructing a worker in a factory and instruments the span if it's provided on the workers handler.

Copy link

codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 61.40351% with 44 lines in your changes missing coverage. Please review.

Project coverage is 82.14%. Comparing base (6a08fe2) to head (49ddb4f).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
ractor/src/factory/job.rs 47.61% 22 Missing ⚠️
ractor/src/factory/worker.rs 63.33% 22 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #300      +/-   ##
==========================================
- Coverage   82.42%   82.14%   -0.28%     
==========================================
  Files          60       60              
  Lines       10924    10997      +73     
==========================================
+ Hits         9004     9034      +30     
- Misses       1920     1963      +43     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@slawlor slawlor force-pushed the factory_spans branch 5 times, most recently from 7e63a7c to 264dbaf Compare December 17, 2024 21:41
@slawlor slawlor marked this pull request as ready for review December 17, 2024 22:02
@slawlor slawlor merged commit f15be0c into main Dec 17, 2024
12 of 14 checks passed
@slawlor slawlor deleted the factory_spans branch December 17, 2024 23:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant