Skip to content

Several minor changes and OpenTelemetry support #77

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jul 21, 2022
Merged

Conversation

cretz
Copy link
Member

@cretz cretz commented Jul 15, 2022

What was changed

  • Added opentelemetry as an "extra" (i.e. optional set of dependencies)
  • Added temporalio.contrib.opentelemetry package with an interceptor that supports OpenTelemetry
  • Updated SDK core
  • Moved worker_binary_id from being a client option to build_id being a worker option
  • Support identity worker option to override one on the client if user wants
  • Removed header param for client calls (for interceptor use only currently)
  • Standardized putting headers on interceptor inputs
  • Fix issue where client.describe wasn't properly called
  • Fix issue where activities returning None or with no return weren't setting result payloads
  • Added new concept called "extern functions" for potential-upcoming workflow sandbox that allows you to call into the host. They are only accessible in interceptors and come with a big scary warning in docs.
  • Changed original interceptor idea of only allowing child/external handle altering and expecting people to override specific things to now intercepting specific things like child and external workflow signalling
  • Fix issue where eviction attempt of not-in-cache workflows returned completion error to core
  • Added workflow.time() and workflow.time_ns() calls to mimic the Python time library equivalents. Could not just use datetime result from existing workflow.now() in Otel impl because I wanted nanosecond precision.
  • Fix issue where <= 3.10 didn't work with type hinting because we were using eval_str in inspect.signature instead of typing.get_type_hints which has more support but is lower level

Why?

Just about everything here was borne out of OpenTelemetry needs and things discovered during OpenTelemetry implementation. The implementation does a naive open-then-close span approach in non-replay code paths which causes problems in certain situations (see PR comments).

Checklist

  1. Follows from Interceptor header and OpenTelemetry support #63 which we closed after realizing we couldn't do spans that way
  2. Closes Interceptor header and OpenTelemetry support #60

@cretz cretz requested a review from a team July 15, 2022 21:47
_default_build_id: Optional[str] = None


def load_default_build_id(*, memoize: bool = True) -> str:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was basically copied verbatim from workflow_service.py. Heavy scrutiny not needed (but welcome).

# Conflicts:
#	temporalio/client.py
#	temporalio/worker/workflow_instance.py
#	temporalio/workflow.py
#	temporalio/workflow_service.py
@@ -0,0 +1,17 @@
from .message_pb2 import (
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything in this temporalio/api directory can be ignored. It is auto-generated source.

@cretz cretz marked this pull request as draft July 18, 2022 21:11
@cretz
Copy link
Member Author

cretz commented Jul 18, 2022

After discussion, there are just too many problems with creating spans on replay. So here's the new rules and I think they are the best way to have a deterministic workflow tree and still handle everything cleanly. They are numbered for easy discussion.

Rules

  1. On client outbound start workflow, a span is created and serialized into the header
    • This span becomes the parent of every non-query workflow span
    • Absence of this span on the workflow side means no tracing for the non-query parts workflow (even if the interceptor is present on the worker side). This is to avoid a bunch of unparented orphan spans for doing things like executing activities.
  2. On workflow inbound execute workflow non-replay, if the header span is present, a span is created/ended that has the main workflow client-header span as its parent. Also a span is created/ended for completion of the workflow.
    • This, and other spans created inside the workflow are themselves not parents of anything explicitly (but some others are serialized to outbound headers and can become parents that way)
  3. On workflow inbound handle signal non-replay, a span is created/ended parented to the same parent as all the rest of the workflow spans
    • If there is a header from the client, that span is linked from the created/ended span
  4. On workflow inbound handle query, if a client header is present, a span is created/ended as a child of it
    • If there is no header, there is no query span at all (important for opting out of tracing queries, many people query a lot in a loop)
    • Whether a span is created or not is unrelated to whether one exists for the workflow as a whole, but if one does exist for the workflow as a whole, it will be linked to the one created if one is created
  5. On outbound non-replay start activity, start local activity, start child, signal child, and signal external, a span is created/ended parented to the same overarching client-header workflow span
    • This span is part of the header sent on these calls
    • Like other non-query workflow spans, doesn't apply if no overarching span
  6. On continue as new, the same workflow input header is given for continue as new. No new span created/ended.
  7. Activities with OpenTelemetry enabled always create spans for the life of their runs, parented by the header if present

By following these rules, users should get all the benefits of having tracing without the annoyance of spans being created unnecessarily during replay. Sending to team for feedback.

@cretz cretz marked this pull request as ready for review July 19, 2022 22:34
@cretz
Copy link
Member Author

cretz commented Jul 19, 2022

Marked ready for review. I have made updates to apply to the rules above. I will break this out into a bit clearer spec and we will make sure to document for newer SDKs.

Copy link
Member

@bergundy bergundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't go into much depth.
The spec and tests look good, so I'm approving.

@cretz cretz merged commit a569582 into temporalio:main Jul 21, 2022
@cretz cretz deleted the otel branch July 21, 2022 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Interceptor header and OpenTelemetry support
3 participants