Conversation

aepfli
Member

@aepfli aepfli commented Jun 5, 2025

In many systems, flag changes are propagated asynchronously to the consumers (not during evaluations). These propagations can be instrumented with their own spans and traces to show how flag changes are distributed throughout the system.

But in the end, as an end user, I am curious which propagation event is actually linked to my evaluation. It is possible to determine this to some degree via the version, but spans and traces offer deeper insight. If we standardize this behaviour, we can create out-of-the-box OpenTelemetry configuration-changed listeners to make this feature usable with all our providers.

This PR

  • adds this new feature

Related Issues

Relates: open-feature/flagd#1595

Notes

Follow-up Tasks

How to test

@aepfli
Member Author

aepfli commented Jun 5, 2025

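Java event listener example

    import dev.openfeature.sdk.EventDetails;
    import io.opentelemetry.api.GlobalOpenTelemetry;
    import io.opentelemetry.api.trace.Span;
    import io.opentelemetry.api.trace.SpanBuilder;
    import io.opentelemetry.api.trace.SpanContext;
    import io.opentelemetry.api.trace.TraceFlags;
    import io.opentelemetry.api.trace.TraceState;
    import io.opentelemetry.api.trace.Tracer;
    import io.opentelemetry.context.Context;

    // LOG is assumed to be an SLF4J logger declared on the enclosing class.
    private static void onChange(EventDetails eventDetails) {
        LOG.info("Provider configuration changed: {}", eventDetails.getEventMetadata());
        if (eventDetails.getEventMetadata() == null) {
            return;
        }

        // The provider is expected to copy these IDs from the propagation payload.
        String propagationTraceId = eventDetails.getEventMetadata().getString("propagationTraceId");
        String propagationSpanId = eventDetails.getEventMetadata().getString("propagationSpanId");
        if (propagationTraceId == null || propagationSpanId == null) {
            return;
        }

        // Rebuild the remote span context of the propagation from the two IDs.
        SpanContext parentContext =
                SpanContext.createFromRemoteParent(propagationTraceId,
                        propagationSpanId,
                        TraceFlags.getSampled(),
                        TraceState.getDefault());
        if (!parentContext.isValid()) {
            // Malformed IDs produce an invalid context; skip instead of creating an unrelated span.
            return;
        }

        Tracer t = GlobalOpenTelemetry.getTracer("demo");
        SpanBuilder sb = t.spanBuilder("flag updates");
        // Make the propagation span the parent of the span created here.
        sb.setParent(Context.current().with(Span.wrap(parentContext)));

        Span span = sb.startSpan();

        span.addEvent("someEvent");

        span.end();
    }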

Member

@lukas-reining lukas-reining left a comment

I like the idea @aepfli!
From reading this I am not 100% sure how it would work exactly, so I left some questions.
The example answers some of the questions but from the spec I do not fully understand how it is meant to work.


### Propagation Metadata

Feature Flags are propagated through different systems with different methods. Often this updates have an asynchronous nature to the evaluation and do not correlate directly (eg. cached values or in-process evaluations). For distributed systems it is important to reflect how changes are populate to all systems, and how those correlate with evaluations. In a simple manner the version could be used to achieve this, but offers additional and more complex solution to correlate the data. Instead we are defining two additional metadata properties `propagationTraceId` and `propagationSpanId` which can be used to link evaluation spans to propagation spans.
Member

Suggested change
Feature Flags are propagated through different systems with different methods. Often this updates have an asynchronous nature to the evaluation and do not correlate directly (eg. cached values or in-process evaluations). For distributed systems it is important to reflect how changes are populate to all systems, and how those correlate with evaluations. In a simple manner the version could be used to achieve this, but offers additional and more complex solution to correlate the data. Instead we are defining two additional metadata properties `propagationTraceId` and `propagationSpanId` which can be used to link evaluation spans to propagation spans.
Feature Flags are propagated through different systems with different methods. Often these updates have an asynchronous nature to the evaluation and do not correlate directly to it (eg. cached values or in-process evaluations). For distributed systems it is important to reflect how changes in flag configurations are propagated to all systems, and how those correlate with evaluations. In a simple manner the version could be used to achieve this, but offers additional and more complex solution to correlate the data. Instead we are defining two additional metadata properties `propagationTraceId` and `propagationSpanId` which can be used to link evaluation spans to propagation spans.


Member

Instead we

I would not use first person here.


Member

propagationTraceId and propagationSpanId

I have some things that are not fully clear to me from reading this:

How are these defined?
Do we typically fill these with the OTEL values? Which span do we use then? Trace would probably be the root one?
How are we setting them?

Member Author

The propagationTraceId and propagationSpanId are flexible. E.g. in flagd we would create a trace and span for each gRPC event and add this information to the payload, because gRPC streams don't offer a way to send headers for individual events; the IDs have to be part of the payload. Other connection methods might propagate them automatically where possible. But with the persistence in the metadata, we can link the propagating span to the evaluating span (which might be two totally different occasions/traces).

I am not the best at writing specs; this is my first attempt. I am happy to explain my thoughts, and maybe that will help resolve the confusion.
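
To make the payload-based propagation described above a bit more concrete, here is a minimal sketch in plain Java. Everything in it is an assumption for illustration: `PropagationPayload`, `withPropagationIds` and `randomHex` are hypothetical names, not flagd or OpenFeature APIs, and the IDs are random stand-ins. A real sender would most likely copy the trace ID and span ID from the span it created for the propagation event.

```java
import java.security.SecureRandom;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sender-side helper; not part of flagd or the OpenFeature SDK. */
final class PropagationPayload {
    private static final SecureRandom RANDOM = new SecureRandom();

    private PropagationPayload() {}

    /** Returns a lowercase hex string encoding the given number of random bytes. */
    static String randomHex(int bytes) {
        byte[] buf = new byte[bytes];
        RANDOM.nextBytes(buf);
        StringBuilder sb = new StringBuilder(bytes * 2);
        for (byte b : buf) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /**
     * Copies the metadata of one flag-sync message and adds the two proposed keys.
     * ASSUMPTION: random IDs stand in for the IDs of a real propagation span.
     */
    static Map<String, String> withPropagationIds(Map<String, String> metadata) {
        Map<String, String> out = new LinkedHashMap<>(metadata);
        out.put("propagationTraceId", randomHex(16)); // 16 bytes -> 32 hex chars
        out.put("propagationSpanId", randomHex(8));   // 8 bytes  -> 16 hex chars
        return out;
    }
}
```

Because the IDs travel inside the message metadata rather than in stream headers, each event on a long-lived stream can carry its own pair, and a provider can copy them unchanged into the event metadata that a listener receives.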

Member

But with the persistence in the metadata, we can link the propagating span to the evaluating span (which might be two totally different occasions/traces).

I get that concept.

What I am not sure about is how we define all these things exactly.
E.g. for the propagating span I can imagine multiple definitions, one of them being "the span of the incoming HTTP request".
This span ID will, for example, be different between all services, while the traceparent ID in OTel might be the same across all of the services. This part is not shown in your example.

Maybe it is good enough to add 1 or 2 good examples for a good value for the ids.
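
One possible shape for such example values, assuming the IDs follow the W3C Trace Context format that OpenTelemetry uses (a 32-character lowercase hex trace ID and a 16-character lowercase hex span ID, neither all zeros), is sketched below. The PR does not state that this format is required, so treat that as an assumption. `PropagationIds` and its methods are hypothetical helpers, and the sample values are the ones used in the W3C Trace Context examples.

```java
import java.util.regex.Pattern;

/** Hypothetical validation helper; not part of the OpenFeature SDK or the proposed spec text. */
final class PropagationIds {
    // W3C Trace Context: trace-id is 32 lowercase hex chars, parent-id (span ID) is 16.
    private static final Pattern TRACE_ID = Pattern.compile("[0-9a-f]{32}");
    private static final Pattern SPAN_ID = Pattern.compile("[0-9a-f]{16}");

    private PropagationIds() {}

    private static boolean notAllZero(String id) {
        return id.chars().anyMatch(c -> c != '0');
    }

    static boolean isValidTraceId(String id) {
        return id != null && TRACE_ID.matcher(id).matches() && notAllZero(id);
    }

    static boolean isValidSpanId(String id) {
        return id != null && SPAN_ID.matcher(id).matches() && notAllZero(id);
    }

    /** Formats the pair as a version-00 traceparent value, e.g. for logging or debugging. */
    static String toTraceparent(String traceId, String spanId, boolean sampled) {
        if (!isValidTraceId(traceId) || !isValidSpanId(spanId)) {
            throw new IllegalArgumentException("invalid trace or span id");
        }
        return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    }
}
```

With this sketch, `propagationTraceId = "4bf92f3577b34da6a3ce929d0e0e4736"` and `propagationSpanId = "00f067aa0ba902b7"` would both be accepted, while an all-zero ID or an uppercase one would be rejected.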


Member

In a simple manner the version could be used to achieve this, but offers additional and more complex solution to correlate the data.

Do you mean the version that we defined on the OTEL semconv?
What do you mean by "offers additional and more complex solution"?

Member Author

In appendix D, we specify that flag metadata can contain a version https://openfeature.dev/specification/appendix-d#flag-metadata - theoretically, we could use the version field too, to link the propagating span/trace, to the evaluation span/trace.

Member

In appendix D, we specify that flag metadata can contain a version

Yeah, I mean, that I would love this to be a bit more clear that this version is used.

theoretically, we could use the version field too, to link the propagating span/trace, to the evaluation span/trace.

Okay, but what do you mean by "but offers additional and more complex solution to correlate the data"?

Member Author

@aepfli aepfli Jun 25, 2025

We could use this version to correlate, but this proposal adds further attributes, which contain the trace ID and span ID of the propagating cause.

E.g. in flagd in-process, we have two non-connected steps for evaluations.

  1. Flag configurations are distributed asynchronously. When flagd detects a change in the flag source, it sends out a new configuration, which will have a version attribute in the metadata.
  2. When I do evaluations, I can create a new span. If I want to correlate this with the span of the propagation, I need to check for the version attribute, and I don't have a direct link. OpenTelemetry offers span links to represent such a correlation. With the trace and span information, we can create this link and get a holistic picture of how the propagation span and the evaluation span are connected.
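
The two steps above could be wired together roughly as follows. This is only a sketch under stated assumptions: `PropagationTracker` and `PropagationRef` are hypothetical names, the event metadata is modelled as a plain `Map<String, String>` instead of the SDK's metadata type, and no OpenTelemetry calls are made so that the example stays self-contained.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical glue between configuration-changed events and evaluations. */
final class PropagationTracker {

    /** Trace ID and span ID of the propagation that delivered the current configuration. */
    static final class PropagationRef {
        final String traceId;
        final String spanId;

        PropagationRef(String traceId, String spanId) {
            this.traceId = traceId;
            this.spanId = spanId;
        }
    }

    private final AtomicReference<PropagationRef> latest = new AtomicReference<>();

    /** Step 1: called asynchronously whenever a new flag configuration arrives. */
    void onConfigurationChanged(Map<String, String> eventMetadata) {
        String traceId = eventMetadata == null ? null : eventMetadata.get("propagationTraceId");
        String spanId = eventMetadata == null ? null : eventMetadata.get("propagationSpanId");
        if (traceId == null || spanId == null) {
            // Design choice: forget the old IDs so that evaluations of the new
            // configuration are not linked to a stale propagation.
            latest.set(null);
            return;
        }
        latest.set(new PropagationRef(traceId, spanId));
    }

    /** Step 2: called during an evaluation to find the propagation to link to, if any. */
    Optional<PropagationRef> linkForEvaluation() {
        return Optional.ofNullable(latest.get());
    }
}
```

In an OpenTelemetry setup, the returned IDs could be turned into a `SpanContext` with `SpanContext.createFromRemoteParent`, as in the listener example earlier in this thread, and attached to the evaluation span with `SpanBuilder.addLink` rather than being set as its parent.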

Contributor

@dyladan dyladan left a comment

It looks like this PR is attempting to recreate span links. Any reason you chose to go this route rather than using that mechanism? https://opentelemetry.io/docs/concepts/signals/traces/#span-links

@aepfli
Member Author

aepfli commented Sep 25, 2025

Yes, I want to use span links, but somehow I need to propagate this information through the system. A gRPC stream does have headers, but they are only set upon initialization. So if I want to track a gRPC message as its own trace, or ideally from the start of the propagation to the end, I need to somehow pass the span/trace information along with the message. I have not found another solution for that, but I am also not that familiar with OpenTelemetry, and my research did not provide more insight.
