Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not add AddLink(), use option to AddEvent() instead #3337

Closed
jmacd opened this issue Mar 23, 2023 · 12 comments
Closed

Do not add AddLink(), use option to AddEvent() instead #3337

jmacd opened this issue Mar 23, 2023 · 12 comments
Assignees
Labels
area:api Cross language API specification issue spec:trace Related to the specification/trace directory

Comments

@jmacd
Copy link
Contributor

jmacd commented Mar 23, 2023

Recently there has been a lot of debate over how to support adding links after spans start.

I have written up my case against adding AddLink() an in favor of extending the AddEvent() API.

The core of my argument is that the proposal to create AddLink() is based on a missing data model, and that when we fill in the data model we'll see that for every link created at start there is implicitly a link in the reverse direction created in the other span. One span's link is another span's link too, and when the other span receives a new link we ought to call it an event. I am trying to avoid a situation where the user has to add a link to both sides of a span reference.

Part of #1929.
Part of #454.
Part of #3240.

AddLink() is a mistake

Problem statement

AddLink() is a conceptual Span API meant to allow the addition of a Link to a Span after it starts. The AddLink() API was removed from OpenTelemetry Tracing early in its history because the interaction between user-level AddLink() and the SDK-level Sampler API was poorly understood.

Although OpenTelemetry has not written down its data model for Links, we all agree Span Links are special to OpenTelmetry because they form edges in a trace graph. It is important that Links appear as a separate kind of data in our Span data model because edges are a first-class part of tracing data.

Span Events are ill-defined in OpenTelemetry tracing. OpenTelemetry has not defined a data model for Tracing that explains what a Span Event is, instead it has defined an API that stands in as a logging API.

To Summarize this problem, we have not defined a data model for Span Events and we have not defined an API model for Span Links. To solve the API problem, we need to define a data model for Spans.

Proposed data model statements

The following definitions, added to a new data model document, will help us address this problem. These would be retroactive definitions, but there should be no conflicts with existing understanding.

Span Link: Data Model

A Span Link is a reference between two Spans that defines an edge in the trace graph. There are two Spans associated with every link, one called the Referer and one called the Referent. There is an implied or explicit event whenever a Span Link is formed.

In the existing specification, Spans Links are implicitly Referer links and they can only be created when the Span starts (i.e., the Start event creates the link). Note that when a Span Link is presented to the Tracer when the Span starts, that information can be used to make the sampling decision.

In the existing specification, Referent links are implicit features. We know there is a Referent Span for every Referer Span Link, but the only way to record a Referent Span Link is to sample the Referer Span. In other words, to capture the referent link you must sample the referer Span. This explains why we had to remove AddLink() in the early specification, because the Sampler has no mechanism to record referent spans links otherwise.

In proposed data model specification for tracing, Links will have new fields added to support AddLink() functionality. Necessary Sampler API changes are described below.

  message Link {
    // The existing fields

    bytes trace_id = 1;
    bytes span_id = 2;
    string trace_state = 3;
    repeated opentelemetry.proto.common.v1.KeyValue attributes = 4;
    uint32 dropped_attributes_count = 5;
	
	// New fields to represent links added after span start.  When these
	// are set to default values, the span link was created with the span.

	string link_event_name = 6;  // To add an event after span start, it must have a name.
	fixed64 time_unix_nano = 7;  // To add an event after span start, it must have a time.
	LinkKind link_kind = 8;      // New field to distinguish which kind of link event.
  }
  
  enum LinkKind {
	  LINK_KIND_REFERER = 0,  // The original span link was a referer link
	  LINK_KIND_REFERENT = 1, // Referent links are always added after span start
  }

Note that whatever APIs are used to create Referer Span Links (see below), the system can be configured to also create Referent Span Links.

For example, if the referent span is alive at the time a link is created, the SDK could be configured to record a referent Link; otherwise, this implicit Referent link can can be established downstream, at the consumer. When a span link is created to a span after the referent span has ended, only the downstream consumer is capable of adding in the Referent link.

Span Event: Data Model

The existing Span Event concept is not well defined in the data model. Here is a proposed definition for the Event that appears in the Span data model.

Span Events are a projection of selected Log events that refer to the Span.

There are no changes to the existing Event structure in this proposal, only a new data model definition.

Note that whereas Span Link data objects contain one explicit and one implicit span reference (i.e., the referer and referent), Span Events contain just one implicit span reference (i.e., the associated span).

Suppose a user has an existing logging API and is capable of including a span context in a log event. The Span Event constructed from the log event is said to be a "projection" because it may be one of several copies. The log event that mentions two span IDs may project an Event into both span data objects.

Note that a log event that mentions two span IDs is not the same as a span Link, because Links are first class parts of the data model. To add a span Link requires a using the Tracer or Span APIs, depending on whether the link is created at the same time as the span or not.

Span Link: API model

In this proposal, there are two APIs for creating links.

Links may be constructed at span start using Tracer.Start(WithLink(context, attributes...)). While this constructs explicit Referer links, it also creates implicit Referent links in the referred-to spans. The SDK can be configured to record Referent links in live spans, as described above, in which case a span Start with N associated links to live spans actually constructs 2*N Link objects, that is N referer links in one span and N referent links in the other spans.

Links may be constructed after span start by recording an event with a span link property using the AddEvent API with an optional link context Span.AddEvent("name", WithLinkContext(context), WithAttributes(attributes...)). When the AddEvent API receives a Link context it will create a Link object, otherwise it will create an Event object.

Span Event: API model

As described above, the span AddEvent API would be used to record either events or links, since after-Start links are identical to events with the addition of a span context.

The docuemntation for AddEvent would be updated to indicate that it is a mechanism for reporting log events associated with the span, including Link events. AddEvent would explain that while it is one way of creating Events and Links in the resulting span data, there are other mechanisms for creating Links and Events. For example, you can call AddEvent() on the Referer span to create a Link in the Referent span, or you can use an OpenTelemetry-recognized logging API to project an Event into the Span.

Sampler fixes

The original reason that AddLink() was removed from the Tracing API was that it interfered with Sampling decisions. While it is unavoidable that Links added after a span starts cannot influence a head-sampling decision, the problem with sampling and links is worse than that.

When a sampler has decided to sampler a span AND that span is the referent span of links created by other spans, there is no way to encode the Referent links in the span unless the other spans are also sampled.

While Referer links in the current specification can only be created when a span starts, Referent links can only be created after a span starts. Since we have no way to encode a link added after a span starts in the span data, it is not possible for the sampled span to encode complete information about itself. We need a way for the Span data to incorporate Referent links to support Sampling in the presence of links, due to cases where the Referer span is not sampled, so we need a data model for links added after a span starts.

@jmacd jmacd added the spec:trace Related to the specification/trace directory label Mar 23, 2023
@arminru arminru added the area:api Cross language API specification issue label Mar 28, 2023
@carlosalberto
Copy link
Contributor

Some of the discussion points (briefly) touched during today's SIG call (to be taken as initial feedback, given the involved complexity here):

  • In Logs, trace/span ids are first class members and get special handling, rather than being part of everything-else in a dictionary, which could happen if we go with the Event model.
  • There's overall fear of ending up with a world representation undeer which everything is an event, e.g. Span.end() or start().
  • There's fear about quickly overload the API (as a general concern).

@open-telemetry/specs-approvers please review this.

@jmacd
Copy link
Contributor Author

jmacd commented Mar 28, 2023

in a dictionary

This is mis-representation. There is no dictionary in my proposal -- in the section heading "Span Link: Data Model " the Link structure is extended with features of the event for use when an event includes a link.

@jack-berg
Copy link
Member

A Span Link is a reference between two Spans that defines an edge in the trace graph. T

How does a span link edge differ from an edge that implicitly exists between a child and parent span? In what ways are they the same?

otherwise, this implicit Referent link can can be established downstream, at the consumer. When a span link is created to a span after the referent span has ended, only the downstream consumer is capable of adding in the Referent link.

This seems like it suffers the same problem the current design suffers from: "to capture the referent link you must sample the referer Span".

Span.AddEvent("name", WithLinkContext(context), WithAttributes(attributes...))

Would this also create referent links on the referred-to spans?

Also, what is the purpose of the link name? What name would be given to links created at span start, which doesn't include the name argument?

@bogdandrutu
Copy link
Member

I do understand the comment that the data model is not defined which is used very extensively in this proposal, but I don't see the "clear" correlation between that and using the same API for two different things in the data model.

There are lots of unclear things in the link data model that need to be clarified:

  • What is the purpose of the "link event name"? Just to make it look like event?
  • Do you enforce backward propagation in case a link is between spans from different processes? If not you should clarify that one edge may miss, and what is the expected behavior.
  • The relationship names/meaning do not carry clear information about dependency between the spans (depending on, follows, etc.), is that intentional? If yes, are you planning to add a new concept that defines that?

Also as a last comment, your intention to better define links is very good and appreciated, but again I fail to understand how that makes link == events, having a separate API is only to signal that they are different concepts, the APIs can look similar. Is your motivation that changing golang interfaces is breaking change?

@jmacd
Copy link
Contributor Author

jmacd commented Apr 4, 2023

Bogdan's feedback in the Spec SIG today was:

Do we agree that the API should support some way to add a link, potentially this could allow SDKs to decide between AddEvent(WithLink(...)) formulation vs. the AddLink(...) formulation?

@weyert
Copy link

weyert commented Apr 4, 2023

If I understand this proposal correctly the use case of using this would be for example, create traces of CI pipeline executions?

@Oberon00
Copy link
Member

Oberon00 commented Apr 5, 2023

@weyert No, I think this is completely unrelated.

@Oberon00
Copy link
Member

Oberon00 commented Apr 5, 2023

@jmacd

Do we agree that the API should support some way to add a link, potentially this could allow SDKs to decide between AddEvent(WithLink(...)) formulation vs. the AddLink(...) formulation?

I think we should not leave this decision to the SDKs but decide in the spec.

There is one other option I want to bring up, I think it wasn't considered yet:

Do not introduce completely new APIs for (late) links but instead make Link (or SpanContext) an allowed attribute type.

This would allow adding links inside events (event attributes), spans (span attributes) and potentially elsewhere. Since we also have array attributes, you could also have an array of links.

@carlosalberto
Copy link
Contributor

Today this was briefly, generally discussed in the Messaging SIG today, as the OTEP 220 was merged and now we need to support links after Span creation. Let's schedule discussing this on the next Spec call? I'd like to get this resolved one way or another, as otherwise we will have to introduce temporary workarounds while incorporating the mentioned OTEP.

So questions of my very own @jmacd (not representing the group here)

  1. The proto additions seems ok and helpful to me, although I don't know about event name specifically (timestamp and link type are definitely useful). In general these three new fields could be a nice to have thing, albeit they are not super required by what the group has defined so far.

  2. Regarding the API (using AddEvent() with an optional Link), I feel there's the risk of confusing users, as they will never know what they may get (an event sometimes, other times a Link). I feel it would be clearer for users to add a Span.AddLink() operation taking the additional parameters instead (timestamp and link type with default values even). Going the Span.AddLink() route instead would also help us, if desired, to actually postpone the addition of the new proto fields, while unblocking the messaging SIG.

@pyohannes
Copy link
Contributor

Links may be constructed after span start by recording an event with a span link property using the AddEvent API with an optional link context Span.AddEvent("name", WithLinkContext(context), WithAttributes(attributes...)). When the AddEvent API receives a Link context it will create a Link object, otherwise it will create an Event object.

Although I prefer a solution with an AddLink method, which keeps the API and the data model consistent (links and events as separate entities in both), this would be acceptable to me. However, just hinting at some possible problems:

  • This looks decent in Go, but might not look as nice in other languages. E. g. C++, where AddEvent accepts an event name, and then a list of attributes.
  • It's not clear to me where the conversion to a Link object happens: before information crosses the API/SDK boundary, or after (so to speak, whether the conversion happens in the OTel API, or in the OTel SDK).
  • Given that the link context in the event will likely be modeled via attributes with reserved names, this introduces a dependency to some kind of semantic conventions into the SDK (or, depending on the answers to the question above, even the API). Not a blocker, but I think something that should be avoided when possible.
  • From an API design perspective, I feel this is going to confuse users. I at least would be a little surprised by events as API entities sometimes mapping to events and sometimes to links in the data model.

@MSNev
Copy link
Contributor

MSNev commented Aug 22, 2023

For reference any changes to the span "Event" should conform with the direction of the general "event" as currently being defined for Log.

Where an Event should contain

In terms of "where" the payload will be defined this is an open question around event.data as proposed in #2926 or whether this should use Body and a span event should be extended to contain a Body as per #3406

@jmacd
Copy link
Contributor Author

jmacd commented Sep 28, 2023

The group has reached a consensus that adding methods is OK. See #3678.

@jmacd jmacd closed this as completed Sep 28, 2023
carlosalberto added a commit that referenced this issue Oct 10, 2023
Fixes #454 

Related to #3337 

As the Messaging SIG merged its last OTEP 222, we will be adding operations
that require Links after Span creation, taking a direct route with `AddLink()`,
albeit without any of the new members suggested in #3337, e.g. `timestamp` (to be
discussed in a separate issue).

```
AddLink(spanContext, attributes /* optional */)
```
carlosalberto added a commit to carlosalberto/opentelemetry-specification that referenced this issue Oct 31, 2024
Fixes open-telemetry#454 

Related to open-telemetry#3337 

As the Messaging SIG merged its last OTEP 222, we will be adding operations
that require Links after Span creation, taking a direct route with `AddLink()`,
albeit without any of the new members suggested in open-telemetry#3337, e.g. `timestamp` (to be
discussed in a separate issue).

```
AddLink(spanContext, attributes /* optional */)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue spec:trace Related to the specification/trace directory
Projects
None yet
Development

No branches or pull requests

10 participants