Do not add AddLink(), use option to AddEvent() instead #3337

jmacd · 2023-03-23T15:03:01Z

Recently there has been a lot of debate over how to support adding links after spans start.

I have written up my case against adding AddLink() an in favor of extending the AddEvent() API.

The core of my argument is that the proposal to create AddLink() is based on a missing data model, and that when we fill in the data model we'll see that for every link created at start there is implicitly a link in the reverse direction created in the other span. One span's link is another span's link too, and when the other span receives a new link we ought to call it an event. I am trying to avoid a situation where the user has to add a link to both sides of a span reference.

Part of #1929.
Part of #454.
Part of #3240.

AddLink() is a mistake

Problem statement

AddLink() is a conceptual Span API meant to allow the addition of a Link to a Span after it starts. The AddLink() API was removed from OpenTelemetry Tracing early in its history because the interaction between user-level AddLink() and the SDK-level Sampler API was poorly understood.

Although OpenTelemetry has not written down its data model for Links, we all agree Span Links are special to OpenTelmetry because they form edges in a trace graph. It is important that Links appear as a separate kind of data in our Span data model because edges are a first-class part of tracing data.

Span Events are ill-defined in OpenTelemetry tracing. OpenTelemetry has not defined a data model for Tracing that explains what a Span Event is, instead it has defined an API that stands in as a logging API.

To Summarize this problem, we have not defined a data model for Span Events and we have not defined an API model for Span Links. To solve the API problem, we need to define a data model for Spans.

Proposed data model statements

The following definitions, added to a new data model document, will help us address this problem. These would be retroactive definitions, but there should be no conflicts with existing understanding.

Span Link: Data Model

A Span Link is a reference between two Spans that defines an edge in the trace graph. There are two Spans associated with every link, one called the Referer and one called the Referent. There is an implied or explicit event whenever a Span Link is formed.

In the existing specification, Spans Links are implicitly Referer links and they can only be created when the Span starts (i.e., the Start event creates the link). Note that when a Span Link is presented to the Tracer when the Span starts, that information can be used to make the sampling decision.

In the existing specification, Referent links are implicit features. We know there is a Referent Span for every Referer Span Link, but the only way to record a Referent Span Link is to sample the Referer Span. In other words, to capture the referent link you must sample the referer Span. This explains why we had to remove AddLink() in the early specification, because the Sampler has no mechanism to record referent spans links otherwise.

In proposed data model specification for tracing, Links will have new fields added to support AddLink() functionality. Necessary Sampler API changes are described below.

  message Link {
    // The existing fields

    bytes trace_id = 1;
    bytes span_id = 2;
    string trace_state = 3;
    repeated opentelemetry.proto.common.v1.KeyValue attributes = 4;
    uint32 dropped_attributes_count = 5;
	
	// New fields to represent links added after span start.  When these
	// are set to default values, the span link was created with the span.

	string link_event_name = 6;  // To add an event after span start, it must have a name.
	fixed64 time_unix_nano = 7;  // To add an event after span start, it must have a time.
	LinkKind link_kind = 8;      // New field to distinguish which kind of link event.
  }
  
  enum LinkKind {
	  LINK_KIND_REFERER = 0,  // The original span link was a referer link
	  LINK_KIND_REFERENT = 1, // Referent links are always added after span start
  }

Note that whatever APIs are used to create Referer Span Links (see below), the system can be configured to also create Referent Span Links.

For example, if the referent span is alive at the time a link is created, the SDK could be configured to record a referent Link; otherwise, this implicit Referent link can can be established downstream, at the consumer. When a span link is created to a span after the referent span has ended, only the downstream consumer is capable of adding in the Referent link.

Span Event: Data Model

The existing Span Event concept is not well defined in the data model. Here is a proposed definition for the Event that appears in the Span data model.

Span Events are a projection of selected Log events that refer to the Span.

There are no changes to the existing Event structure in this proposal, only a new data model definition.

Note that whereas Span Link data objects contain one explicit and one implicit span reference (i.e., the referer and referent), Span Events contain just one implicit span reference (i.e., the associated span).

Suppose a user has an existing logging API and is capable of including a span context in a log event. The Span Event constructed from the log event is said to be a "projection" because it may be one of several copies. The log event that mentions two span IDs may project an Event into both span data objects.

Note that a log event that mentions two span IDs is not the same as a span Link, because Links are first class parts of the data model. To add a span Link requires a using the Tracer or Span APIs, depending on whether the link is created at the same time as the span or not.

Span Link: API model

In this proposal, there are two APIs for creating links.

Links may be constructed at span start using Tracer.Start(WithLink(context, attributes...)). While this constructs explicit Referer links, it also creates implicit Referent links in the referred-to spans. The SDK can be configured to record Referent links in live spans, as described above, in which case a span Start with N associated links to live spans actually constructs 2*N Link objects, that is N referer links in one span and N referent links in the other spans.

Links may be constructed after span start by recording an event with a span link property using the AddEvent API with an optional link context Span.AddEvent("name", WithLinkContext(context), WithAttributes(attributes...)). When the AddEvent API receives a Link context it will create a Link object, otherwise it will create an Event object.

Span Event: API model

As described above, the span AddEvent API would be used to record either events or links, since after-Start links are identical to events with the addition of a span context.

The docuemntation for AddEvent would be updated to indicate that it is a mechanism for reporting log events associated with the span, including Link events. AddEvent would explain that while it is one way of creating Events and Links in the resulting span data, there are other mechanisms for creating Links and Events. For example, you can call AddEvent() on the Referer span to create a Link in the Referent span, or you can use an OpenTelemetry-recognized logging API to project an Event into the Span.

Sampler fixes

The original reason that AddLink() was removed from the Tracing API was that it interfered with Sampling decisions. While it is unavoidable that Links added after a span starts cannot influence a head-sampling decision, the problem with sampling and links is worse than that.

When a sampler has decided to sampler a span AND that span is the referent span of links created by other spans, there is no way to encode the Referent links in the span unless the other spans are also sampled.

While Referer links in the current specification can only be created when a span starts, Referent links can only be created after a span starts. Since we have no way to encode a link added after a span starts in the span data, it is not possible for the sampled span to encode complete information about itself. We need a way for the Span data to incorporate Referent links to support Sampling in the presence of links, due to cases where the Referer span is not sampled, so we need a data model for links added after a span starts.

The text was updated successfully, but these errors were encountered:

carlosalberto · 2023-03-28T18:30:46Z

Some of the discussion points (briefly) touched during today's SIG call (to be taken as initial feedback, given the involved complexity here):

In Logs, trace/span ids are first class members and get special handling, rather than being part of everything-else in a dictionary, which could happen if we go with the Event model.
There's overall fear of ending up with a world representation undeer which everything is an event, e.g. Span.end() or start().
There's fear about quickly overload the API (as a general concern).

@open-telemetry/specs-approvers please review this.

jmacd · 2023-03-28T19:21:37Z

in a dictionary

This is mis-representation. There is no dictionary in my proposal -- in the section heading "Span Link: Data Model " the Link structure is extended with features of the event for use when an event includes a link.

jack-berg · 2023-04-04T14:28:32Z

A Span Link is a reference between two Spans that defines an edge in the trace graph. T

How does a span link edge differ from an edge that implicitly exists between a child and parent span? In what ways are they the same?

otherwise, this implicit Referent link can can be established downstream, at the consumer. When a span link is created to a span after the referent span has ended, only the downstream consumer is capable of adding in the Referent link.

This seems like it suffers the same problem the current design suffers from: "to capture the referent link you must sample the referer Span".

Span.AddEvent("name", WithLinkContext(context), WithAttributes(attributes...))

Would this also create referent links on the referred-to spans?

Also, what is the purpose of the link name? What name would be given to links created at span start, which doesn't include the name argument?

bogdandrutu · 2023-04-04T15:21:30Z

I do understand the comment that the data model is not defined which is used very extensively in this proposal, but I don't see the "clear" correlation between that and using the same API for two different things in the data model.

There are lots of unclear things in the link data model that need to be clarified:

What is the purpose of the "link event name"? Just to make it look like event?
Do you enforce backward propagation in case a link is between spans from different processes? If not you should clarify that one edge may miss, and what is the expected behavior.
The relationship names/meaning do not carry clear information about dependency between the spans (depending on, follows, etc.), is that intentional? If yes, are you planning to add a new concept that defines that?

Also as a last comment, your intention to better define links is very good and appreciated, but again I fail to understand how that makes link == events, having a separate API is only to signal that they are different concepts, the APIs can look similar. Is your motivation that changing golang interfaces is breaking change?

jmacd · 2023-04-04T15:59:36Z

Bogdan's feedback in the Spec SIG today was:

Do we agree that the API should support some way to add a link, potentially this could allow SDKs to decide between AddEvent(WithLink(...)) formulation vs. the AddLink(...) formulation?

weyert · 2023-04-04T23:39:17Z

If I understand this proposal correctly the use case of using this would be for example, create traces of CI pipeline executions?

Oberon00 · 2023-04-05T11:09:53Z

@weyert No, I think this is completely unrelated.

Oberon00 · 2023-04-05T11:14:06Z

@jmacd

Do we agree that the API should support some way to add a link, potentially this could allow SDKs to decide between AddEvent(WithLink(...)) formulation vs. the AddLink(...) formulation?

I think we should not leave this decision to the SDKs but decide in the spec.

There is one other option I want to bring up, I think it wasn't considered yet:

Do not introduce completely new APIs for (late) links but instead make Link (or SpanContext) an allowed attribute type.

This would allow adding links inside events (event attributes), spans (span attributes) and potentially elsewhere. Since we also have array attributes, you could also have an array of links.

carlosalberto · 2023-08-17T18:27:47Z

Today this was briefly, generally discussed in the Messaging SIG today, as the OTEP 220 was merged and now we need to support links after Span creation. Let's schedule discussing this on the next Spec call? I'd like to get this resolved one way or another, as otherwise we will have to introduce temporary workarounds while incorporating the mentioned OTEP.

So questions of my very own @jmacd (not representing the group here)

The proto additions seems ok and helpful to me, although I don't know about event name specifically (timestamp and link type are definitely useful). In general these three new fields could be a nice to have thing, albeit they are not super required by what the group has defined so far.
Regarding the API (using AddEvent() with an optional Link), I feel there's the risk of confusing users, as they will never know what they may get (an event sometimes, other times a Link). I feel it would be clearer for users to add a Span.AddLink() operation taking the additional parameters instead (timestamp and link type with default values even). Going the Span.AddLink() route instead would also help us, if desired, to actually postpone the addition of the new proto fields, while unblocking the messaging SIG.

pyohannes · 2023-08-21T12:48:04Z

Links may be constructed after span start by recording an event with a span link property using the AddEvent API with an optional link context Span.AddEvent("name", WithLinkContext(context), WithAttributes(attributes...)). When the AddEvent API receives a Link context it will create a Link object, otherwise it will create an Event object.

Although I prefer a solution with an AddLink method, which keeps the API and the data model consistent (links and events as separate entities in both), this would be acceptable to me. However, just hinting at some possible problems:

This looks decent in Go, but might not look as nice in other languages. E. g. C++, where AddEvent accepts an event name, and then a list of attributes.
It's not clear to me where the conversion to a Link object happens: before information crosses the API/SDK boundary, or after (so to speak, whether the conversion happens in the OTel API, or in the OTel SDK).
Given that the link context in the event will likely be modeled via attributes with reserved names, this introduces a dependency to some kind of semantic conventions into the SDK (or, depending on the answers to the question above, even the API). Not a blocker, but I think something that should be avoided when possible.
From an API design perspective, I feel this is going to confuse users. I at least would be a little surprised by events as API entities sometimes mapping to events and sometimes to links in the data model.

MSNev · 2023-08-22T15:33:57Z

For reference any changes to the span "Event" should conform with the direction of the general "event" as currently being defined for Log.

Where an Event should contain

a unique name currently defined as domain + name combination, but may be collapsed to a single event.name (Decide if event.domain separately from event.name is necessary #2994)
It may contain a defined payload (based on the domain/name) where the semantics of the unique name defines the expected payload content and when the name is "unknown" it should be treated as a custom event
- Define the shape of a event by encapsulating the payload in event.data. #2926
- Semantic conventions for browser events semantic-conventions#30

In terms of "where" the payload will be defined this is an open question around event.data as proposed in #2926 or whether this should use Body and a span event should be extended to contain a Body as per #3406

jmacd · 2023-09-28T16:19:56Z

The group has reached a consensus that adding methods is OK. See #3678.

Fixes #454 Related to #3337 As the Messaging SIG merged its last OTEP 222, we will be adding operations that require Links after Span creation, taking a direct route with `AddLink()`, albeit without any of the new members suggested in #3337, e.g. `timestamp` (to be discussed in a separate issue). ``` AddLink(spanContext, attributes /* optional */) ```

Fixes open-telemetry#454 Related to open-telemetry#3337 As the Messaging SIG merged its last OTEP 222, we will be adding operations that require Links after Span creation, taking a direct route with `AddLink()`, albeit without any of the new members suggested in open-telemetry#3337, e.g. `timestamp` (to be discussed in a separate issue). ``` AddLink(spanContext, attributes /* optional */) ```

jmacd added the spec:trace Related to the specification/trace directory label Mar 23, 2023

github-actions bot assigned reyang Mar 23, 2023

arminru added the area:api Cross language API specification issue label Mar 28, 2023

Oberon00 mentioned this issue Jun 15, 2023

Message handler open-telemetry/opentelemetry-java-instrumentation#8331

Draft

carlosalberto mentioned this issue Aug 29, 2023

Add a new AddLink() operation to Span. #3678

Merged

jmacd closed this as completed Sep 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do not add AddLink(), use option to AddEvent() instead #3337

Do not add AddLink(), use option to AddEvent() instead #3337

jmacd commented Mar 23, 2023

carlosalberto commented Mar 28, 2023

jmacd commented Mar 28, 2023

jack-berg commented Apr 4, 2023

bogdandrutu commented Apr 4, 2023

jmacd commented Apr 4, 2023

weyert commented Apr 4, 2023

Oberon00 commented Apr 5, 2023

Oberon00 commented Apr 5, 2023 •

edited

Loading

carlosalberto commented Aug 17, 2023

pyohannes commented Aug 21, 2023

MSNev commented Aug 22, 2023

jmacd commented Sep 28, 2023

Do not add AddLink(), use option to AddEvent() instead #3337

Do not add AddLink(), use option to AddEvent() instead #3337

Comments

jmacd commented Mar 23, 2023

AddLink() is a mistake

Problem statement

Proposed data model statements

Span Link: Data Model

Span Event: Data Model

Span Link: API model

Span Event: API model

Sampler fixes

carlosalberto commented Mar 28, 2023

jmacd commented Mar 28, 2023

jack-berg commented Apr 4, 2023

bogdandrutu commented Apr 4, 2023

jmacd commented Apr 4, 2023

weyert commented Apr 4, 2023

Oberon00 commented Apr 5, 2023

Oberon00 commented Apr 5, 2023 • edited Loading

carlosalberto commented Aug 17, 2023

pyohannes commented Aug 21, 2023

MSNev commented Aug 22, 2023

jmacd commented Sep 28, 2023

Oberon00 commented Apr 5, 2023 •

edited

Loading