Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicitly allow fail-fast for invalid enum values #2766

Closed
wants to merge 12 commits into from

Conversation

pellared
Copy link
Member

@pellared pellared commented Aug 31, 2022

Why

For the auto-instrumentation scenario, we find that crashing the application in case of an invalid configuration is better. The reason is that it gives the users immediate feedback and forces them to fix the issue. Otherwise, it is likely that will find it when it is too late. For example, they want to analyze the telemetry when something strange is going on with the application; it would occur that someone typed jeager instead of jaeger for OTEL_TRACES_EXPORTER, and because of that, they will miss spans in their backend.

This is our preference in .NET Auto-Instrumentation. Reference:

What

Added or cause the application to fail on initialization possibility when the user provides an enum value the SDK does not recognize.

AFAIK this is how the .NET SDK is working. See: https://github.com/open-telemetry/opentelemetry-dotnet/blob/a37198c6d0f1814f8f5a995413649436657a2e2b/src/OpenTelemetry.Exporter.OpenTelemetryProtocol/OtlpExporterOptions.cs#L78

Changes

Make the enum value description compatible with the basic error handling principle:

2. The API or SDK MAY _fail fast_ and cause the application to fail on initialization, e.g. because of a bad user config or environment, but MUST NOT cause the application to fail later at run time, e.g. due to dynamic config settings received from the Collector.

I think it is a backward compatible change as the old behavior is still acceptable per spec.

FYI @open-telemetry/dotnet-instrumentation-approvers

@pellared pellared requested review from a team August 31, 2022 10:58
@pellared
Copy link
Member Author

PTAL @open-telemetry/dotnet-maintainers

@pellared
Copy link
Member Author

The markdown-link-check build failures are not related to this PR.

@pellared pellared changed the title Allow fail-fast for invalid enum values Explicitly allow fail-fast for invalid enum values Aug 31, 2022
@nrcventura
Copy link
Member

I think this ability to fail fast is particularly important for the .NET auto-instrumentation project, because it is the best way for us to provide something to our users to let them know that the configuration is bad and needs to be fixed. Alternative options would likely lead to potentially unexpected behavior, such as no telemetry data being available. With the .net auto-instrumentation project developers may not even have access to the warnings that could be generated from an invalid enum, depending on how their how responsibilities are separated (separate observability team vs direct control of observability).

Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how would this proposal work - imagine the environment variable is used by some exporter, and that exporter is enabled at a later stage (e.g. the application is using some dynamic setting to allow exporters to be loaded and initialized), the wrong environment variable will cause the application to crash.

@reyang reyang added area:sdk Related to the SDK area:configuration Related to configuring the SDK labels Sep 2, 2022
@pellared
Copy link
Member Author

pellared commented Sep 2, 2022

@reyang PTAL 8b28ee8

I made it explict that it cannot fail later at runtime; stolen from:

2. The API or SDK MAY _fail fast_ and cause the application to fail on initialization, e.g. because of a bad user config or environment, but MUST NOT cause the application to fail later at run time, e.g. due to dynamic config settings received from the Collector.

@pellared pellared requested a review from reyang September 2, 2022 16:42
Co-authored-by: Cijo Thomas <cithomas@microsoft.com>
@pellared pellared requested review from cijothomas and reyang and removed request for reyang and cijothomas September 2, 2022 17:40
@pellared
Copy link
Member Author

PTAL @open-telemetry/technical-committee @open-telemetry/specs-approvers

@pellared pellared requested a review from scheler September 15, 2022 16:04
@tigrannajaryan
Copy link
Member

I think this change results in more desirable behavior however it is not clear to me if this is an allowed breaking change or no. The document is marked as "Mixed" and there is no status in the "Enum value" section. For "Mixed" document do we implicitly assume unmarked sections to be "Stable" or "Experimental"?
I will approve if we believe this is an allowed change.

@pellared
Copy link
Member Author

pellared commented Sep 20, 2022

it is not clear to me if this is an allowed breaking change or no

I think it is a backward compatible change as the old functionality is still acceptable per spec. Personally, I see it more as clarification (or even fix) as this meets the basic error handling guidelines.

The document is marked as "Mixed" and there is no status in the "Enum value" section. For "Mixed" document do we implicitly assume unmarked sections to be "Stable" or "Experimental"?

This is a subject for a separate PR/discussion. We can mark it as stable in a following PR or create issue(s) to define what is missing to mark it as stable.

The whole "Special configuration types" section is marked as "Stable". Then I would say that "Enum value" section is "Stable" as well.

@pellared
Copy link
Member Author

I updated the description to describe some background.

@tigrannajaryan
Copy link
Member

The whole "Special configuration types" section is marked as "Stable". Then I would say that "Enum value" section is "Stable" as well.

I think this means this PR makes a breaking change to a "Stable" section of the spec. An application that previously would not fail can now fail after updating to a newer SDK that implements this change. That is not allowed.

Now, there is one argument that can be made to still allow this change: .NET SDK already works as this PR describes - it fails fast. Question: what do other language SDKs do currently? Do they fail fast or ignore the invalid values?

If all/most other SDKs also fail fast then I think we can say that the spec has a bug and what is written does not match the reality.

@pellared
Copy link
Member Author

An application that previously would not fail can now fail after updating to a newer SDK that implements this change.

That would be indeed a breaking change but in the SDK 😉 But I do not have a strong opinion here.

@carlosalberto
Copy link
Contributor

As discussed in yesterday's Spec call, one alternative is to fail fast when using a debug mode flag (maybe set through an environment variable). See #2817

Would this help at all for your considered case @pellared ?

@pellared
Copy link
Member Author

As discussed in yesterday's Spec call, one alternative is to fail fast when using a debug mode flag (maybe set through an environment variable). See #2817

Would this help at all for your considered case @pellared ?

@carlosalberto
Kind of yes. Thanks a lot 😉 I also answered in the new issue.

I have also a different question. Is "auto-instrumentation" seen as "SDK"? Maybe it is reasonable that some behavior in auto-instrumentation is different? E.g. fail-fast becasue the initializtion is done before the application is even started. I cannot find anything in the spec that would clarify it.

@reyang @cijothomas I created open-telemetry/opentelemetry-dotnet#3690

@@ -21,7 +21,9 @@ If an SDK chooses to support an integer-valued environment variable, it SHOULD s
### Enum value

For variables which accept a known value out of a set, i.e., an enum value, SDK implementations MAY support additional values not listed here.
For variables accepting an enum value, if the user provides a value the SDK does not recognize, the SDK MUST generate a warning and gracefully ignore the setting.
For variables accepting an enum value, if the user provides a value the SDK does not recognize,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why this behavior is only under Enum values, shouldn't there be a common error behavior for all env values? What if integer var is set to blah?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it as well, but I want to avoid scope creep in this PR. Even right now it is not clear how to proceed

For variables accepting an enum value, if the user provides a value the SDK does not recognize, the SDK MUST generate a warning and gracefully ignore the setting.
For variables accepting an enum value, if the user provides a value the SDK does not recognize,
the SDK MUST generate a warning and gracefully ignore the setting,
or _fail fast_ and cause the application to fail at startup, but MUST NOT cause the application to fail later at run time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this change is necessary - #2817 (comment)

@github-actions
Copy link

github-actions bot commented Oct 4, 2022

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Oct 4, 2022
@pellared pellared closed this Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:configuration Related to configuring the SDK area:sdk Related to the SDK Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants