Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RecordLink for Span. #3240

Closed
wants to merge 1 commit into from

Conversation

carlosalberto
Copy link
Contributor

@carlosalberto carlosalberto commented Feb 21, 2023

Alternative to #3186

A new optional RecordLink(SpanContext, Attributes) operation is added, very similar to RecordException, which means we use AddEvent behind the scenes, storing the actual data as Attributes:

  • Event name is link.
  • trace and span ids are stored as hex strings.
  • Trace State remains a string.
  • The Attributes associated to a new Link are passed through to the newly created Event.

This effectively means these are soft Links. Other details:

  • As this is an optional operation, not all languages would have to add this (C++, Go, etc). However, this will require instrumentation authors to manually set the Attributes passed to AddEvent as shown above.
  • (At least) OTLP consumers will have to process standard Links plus check any event with the link name and containing the traceid, spanid and tracestate triplet.

@carlosalberto carlosalberto requested review from a team February 21, 2023 16:44
@carlosalberto carlosalberto changed the title Add a RecordLink for Span. Add RecordLink for Span. Feb 21, 2023
@arminru arminru added area:api Cross language API specification issue area:sdk Related to the SDK spec:trace Related to the specification/trace directory labels Feb 21, 2023
@@ -0,0 +1,45 @@
# Semantic Conventions for Links
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this new file to the TOC in README.md

@@ -691,6 +692,26 @@ Note: `RecordException` may be seen as a variant of `AddEvent` with
additional exception-specific parameters and all other parameters being optional
(because they have defaults from the exception semantic convention).

#### Record Link
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably too early to spark a naming discussion but I think span.RecordLink(ctx, attrs) would be very misleading and should rather be called RecordLinkAs(Span)Event or the like, since the outcome will not be a Span Link as one might expect from the name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we go down the path of recording links as events after span starts, we'd eventually replace original links with events too. If that happens, we'd regret naming it as RecordLinkAs(Span)Event.

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am strongly against this approach. A Link is a dedicated entity in the data model that is designed to capture causality between spans. Reusing another entity from the data model (Event) for the same purpose is a very bad design.

@tsloughter
Copy link
Member

I agree with @yurishkuro.

There seem to be 2 use cases, one that is due to limitations of particular instrumentations (so don't conceptionally need the link added after start) and those like #454 that describe non-causal relationships.

In the latter example I'd argue those shouldn't be Links in the first place. In which case maybe a definition for adding span contexts as Events to a Span makes sense, just not "Links".

So Links can stay as they are, and Events are used for non-causal relationships -- "at time T this span did something based on a message created during Span X".

@yurishkuro
Copy link
Member

"at time T this span did something based on a message created during Span X".

why isn't this causal / happens-before?

@tsloughter
Copy link
Member

@yurishkuro just basing it on the example given in #454 and generally considering the idea that a span could cause an Event instead of a Span. I can't think of an example that describes the real world data/events that would need to be modeled like this.

@lmolkova
Copy link
Contributor

lmolkova commented Feb 21, 2023

I have no strong opinion on the link/event terminology. FWIW limiting links to causal-only relationships leaves us without a way to represent non-causal relationships and perhaps we should be looking into broadening links definition.

Anyway, representing the thing that describes that two spans are related as event seems optimal:

  • events (at least in scope of Event API) can happen outside of span lifetime, i.e. if I discover that two spans are related after they're both over, I don't need to create an artificial span just to record relationships between them. (this was a real case in some Azure service)
  • they have timestamp, i.e. if I receive multiple messages in scope of one operation, I can record when each of them was received
  • they don't affect sampling decisions
  • they are flexible and don't imply specific relationships between spans.

[retrieved](../api.md#retrieving-the-traceid-and-spanid)
by `SpanContext`.
examples: 'af9d5aa4-a685-4c5f-a22b-444f80b3cc28'
- id: tracestate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trace-flags? otherwise we don't know if it was recorded on the other service

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds useful, although I think you could make the same argument for our normal parent-child relationships (maybe we should actually add a parent_flags to the OTLP span?)

@MrAlias
Copy link
Contributor

MrAlias commented Feb 23, 2023

I am strongly against this approach. A Link is a dedicated entity in the data model that is designed to capture causality between spans. Reusing another entity from the data model (Event) for the same purpose is a very bad design.

Can you link to the data model definition?

@pyohannes
Copy link
Contributor

Can you link to the data model definition?

I guess one could see OTLP as a data model definition? There links are a different entity than events.

One of the main drawbacks of this approach is, that it forces telemetry consumers (who want to properly support links) to look at two different places for them: as links and as events. Proper support for links created after span creation would then require modifications in a variety of backends and exporters.

@yurishkuro
Copy link
Member

yurishkuro commented Feb 23, 2023

Can you link to the data model definition?

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#specifying-links
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/overview.md#links-between-spans

These two define the semantics and the data model of the links. The definition is not as strong as Lamport's happens-before relationship, it only says "causally related", but the point is that we have a distinct model entity defined for that, so as @pyohannes said above, using a different entity like Event, which is defined semantically to mean a very different thing, is a poor design choice, and a poor user experience (both production and consumption).

@lmolkova
Copy link
Contributor

lmolkova commented Feb 24, 2023

One of the main drawbacks of this approach is, that it forces telemetry consumers (who want to properly support links) to look at two different places for them: as links and as events. Proper support for links created after span creation would then require modifications in a variety of backends and exporters.

we have a precedent unifying events and logs. If we do links-are-represented-as-events in OTLP, the original links would probably go away.

using a different entity like Event, which is defined semantically to mean a very different thing, is a poor design choice, and a poor user experience (both production and consumption).

some people say that everything (including span) is semantically an event. In any case, user experience and over-the-wire data representation are not strictly related and I'd love to see some explanation on what's bad about user experience.

@yurishkuro
Copy link
Member

yurishkuro commented Feb 24, 2023

I'd love to see some explanation on what's bad about user experience.

Cognitive overhead, ambiguity. Production: do I record this as link or event? Consumption: do I read this from links or events? Production: I am capturing a relationship between two spans, what is the timestamp doing here?

some people say that everything (including span) is semantically an event

we can also say that everything is a sequence of bytes, why bother with semantics of the logical data model at all?

@lmolkova
Copy link
Contributor

lmolkova commented Feb 24, 2023

Production: do I record this as link or event?

You record it as link since that's what on the API and in the documentation without bothering much about over-the-wire format.

Production: I am capturing a relationship between two spans, what is the timestamp doing here?

Counter: When I'm receiving multiple messages with a timeout and some of them are prefetched and other come at different points in time, how do I record when the message came without creating two things - link AND event?

Timestamp is there to say when the relationship was discovered and should also help answer questions like: why this link was not considered when I made sampling decision.

Consumption: do I read this from links or events?

Users don't read it from over-the-wire format but from their backends. Cognitive load of a backend developer getting list of links from the span and transforming them is no different than extracting exceptions from events.

Counter: as a user, when I want to see if two spans are related and my backend stores links inside spans, I need to write a query like span.links.filter(l.trace_id == another_trace_id), how is this a good experience?

Having events that exist outside of span lifetime would at least require them to be stored separately improving the experience instead.

@lmolkova
Copy link
Contributor

lmolkova commented Feb 24, 2023

we can also say that everything is a sequence of bytes, why bother with semantics of the logical data model at all?

what I'm saying is that event is a base type for different things and I argue that links is one of these things

@yurishkuro
Copy link
Member

@lmolkova you argue that relationships can be recorded via events. I don't dispute that. A link is just a structured representation of the relationship that can be just as well encoded via semantic conventions. What I don't agree with is having two representations of links in the data model, as this PR is proposing. I can get behind the proposal to deprecate the concept of Links in favor of span events with specific semantic convention for capturing span/trace reference.

@github-actions
Copy link

github-actions bot commented Mar 4, 2023

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Mar 4, 2023
@MrAlias MrAlias removed the Stale label Mar 7, 2023
@MrAlias
Copy link
Contributor

MrAlias commented Mar 7, 2023

I'm also having a hard time understanding why a link cannot be added to an event. It sounds like a preference to not add a field to the event for links or update the definition of an event to contain similar information. Is there any compatibility issues with this?

Given this is being looked at as an alternative to recording a link in other ways that all have explicit limitations (sampling, API compatibility, logical inclusion of trace IDs) this solution seems to be the only one without explicit limitation, instead there is a subjective limitation.

@Oberon00
Copy link
Member

Oberon00 commented Mar 8, 2023

Long-term, didn't we want to deprecate the Span events APIs in favor of the global events /logs?

@pyohannes pyohannes linked an issue Mar 10, 2023 that may be closed by this pull request
@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Mar 16, 2023
@lmolkova
Copy link
Contributor

One more reason why link is a log record + remote trace context is #2176 (comment)

Essentially, link as it's defined today

  message Link {
    // A unique identifier of a trace that this linked span is part of. The ID is a
    // 16-byte array.
    bytes trace_id = 1;

    // A unique identifier for the linked span. The ID is an 8-byte array.
    bytes span_id = 2;

    // The trace_state associated with the link.
    string trace_state = 3;

    // attributes is a collection of attribute key/value pairs on the link.
    // Attribute keys MUST be unique (it is not allowed to have more than one
    // attribute with the same key).
    repeated opentelemetry.proto.common.v1.KeyValue attributes = 4;

    // dropped_attributes_count is the number of dropped attributes. If the value is 0,
    // then no attributes were dropped.
    uint32 dropped_attributes_count = 5;
  }

mixes two concepts together: remote context and attributes.

I think limiting link to

  message Link {
    // A unique identifier of a trace that this linked span is part of. The ID is a
    // 16-byte array.
    bytes trace_id = 1;

    // A unique identifier for the linked span. The ID is an 8-byte array.
    bytes span_id = 2;

    // flags?

    // The trace_state associated with the link.
    string trace_state = 3;
  }

and adding link property (or list of them?) to LogRecord would preserve two different concepts.

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Mar 29, 2023
@github-actions
Copy link

github-actions bot commented Apr 6, 2023

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Apr 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue area:sdk Related to the SDK spec:trace Related to the specification/trace directory Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Please (re)-allow recording links after Span creation time
9 participants