Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define API compatibility story #115

Conversation

tigrannajaryan
Copy link
Member

@tigrannajaryan tigrannajaryan commented Jun 13, 2019

We need to define the API compatibility story, particularly:

  • How OpenTelemetry API evolves over time, what changes are allowed and how they should be made, how breaking and non-breaking changes are handled.
  • How we ensure third party libraries that are instrumented using old OpenTelemetry versions remain usable after the API changes.
  • How we ensure interoperability of application code and of multiple third party libraries that are instrumented using different OpenTelemetry versions.

Go toy example: https://github.com/tigrannajaryan/otelapi-compat
Java toy example: https://github.com/tigrannajaryan/otelapi-java-compat

Implements #109

specification/compatibility.md Outdated Show resolved Hide resolved
Any application or third party library that uses OpenTelemetry API of a particular version can safely be packaged with OpenTelemetry SDK implementation provided that these 2 conditions are met:

1. The full version number of the SDK is not lower than the full version number of API used by application or third party library.
2. The major version number of the SDK is the same as major version number of API or major version number of the SDK is higher by 1.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: do we want to provide stronger guarantees across more than one major version number? Or perhaps we don't want to provide compatibility guarantees across major versions at all?
I felt that one major version apart is a good approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to tightly couple API version with SDK version? We may need to do major rev of SDK because some of its SPI interfaces change, while the API remains unchanged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not tightly couple API and SDK versions. The rationale is that you cannot expect all third party libraries that a developer may use in an application to all have been instrumented against the same OpenTelemetry API. Third-party developers move at their own pace, you usually move at your own pace on your own app. There has to be a way for mixed-version usage. At the minimum code linked to APIs across minor versions should be interoperable, but I think that is not good enough and 1 major version apart is the minimum guarantee that is necessary for real-life usage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on leaving this as a TODO for the time being so we can have a more full-fledged discussion without blocking the PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, the current wording is still "tight coupling" to me, as it implies correlation between API and SDK versions. To take example of OpenTracing: Python API is v2, Jaeger SDK is v4. Granted that in OpenTelemetry the API and SDK are more closely related, but there are still potential major versions of SDK (e.g. changing from JS to Typescript as the original source) that do not affect major rev of the API.

This is separate from the point that a version of SDK must support some V and V-1 of the API.

specification/compatibility.md Outdated Show resolved Hide resolved

The above is specified using Java's convention of reverse domain name notation. Other languages will use their equivalent convention for namespaces, e.g. for Go the package name for version 1.x.x of the API will be go.opencensus.io, for version 2.x.x of the API it will be go.opencensus.io/v2.

## Interoperability of Mixed-version Code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how this would work. This might work for libs that are used at the leaves of the call graph, but what if we have server middleware storing v2 span in context, and http client lib middleware that expects v1 span?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on this? I am not sure I fully understand the example that you brought.

Copy link
Member

@yurishkuro yurishkuro Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in this picture:
image

  • Handler instrumentation is v2 and stores io.opentelemetry.v2.Span in the Context
  • Client instrumentation is v1 and tries to read io.opentelemetry.v1.Span from the Context, which is not going to work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the picture, it is clear now.

I do not see this as a problem that is induced by the decision to support multiple API versions in parallel in OpenTelemetry and let me explain why I think so.

Based on your description I am going to assume that Context is a container that has a pair of GetSpan/SetSpan methods that allow to store and retrieve arbitrary untyped value of Span (an Object in Java's parlance or interface{} in Go's).

In this situation I believe the problem is a result of incorrect assumptions of Client which reads an untyped data type from Context and treats it as if it has a certain type when in fact it can be of any type.

If the Context is a container of untyped data types where I can store 2 unrelated data types (io.opentelemetry.v2.Span and io.opentelemetry.v1.Span) then users of this container (namely Handler and Client) cannot expect that the untyped container will magically solve their interoperability problem.

If Handler and Client want to interchange data the responsibility is on them to define a data exchange contract. I fail to see how it can be the responsibility of generic Context class which is merely a carrier of untyped data.

Now, if we assume the opposite, i.e. that Context is a strongly typed container and supports GetSpan/SetSpan strongly typed methods then this problem cannot exist because Context itself must be versioned, so we will have io.opentelemetry.v1.Context and io.opentelemetry.v2.Context, each having a pair of GetSpan/SetSpan methods that work only with corresponding data type. In that case the SDK can ensure interoperability of v1.Context and v2.Context types by converting v1.Span into v2.Span internally (but I am not sure it should try to, that's a separate question worth thinking about).

I may be wrong, happy to hear your thoughts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I created an example of what OpenTelemetry API V2 implementation can look like, so that it provides a shim for API V1 and how exactly it can be used in the scenario that you described: with a middleware that uses API V1 and an app client that uses API V2.

It shows how the shim ensures the interoperability by mapping the API V1 Span methods to API V2 Span methods as needed.

Check this out: https://github.com/tigrannajaryan/otelapi-compat

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this work with go modules? The v2 API needs to include a v1 package whose implementation is different from the real (old) v1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(OK, now I can reply to the comment, strange, so copying my reply here).

There are probably different ways to achieve this in Go. One approach is to have an additional indirection level from API to SDK. This follows from the the requirement to be able to swap out SDK implementations laid out here: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/library-guidelines.md

If you have an indirection layer in API v1 which allows registering implementation at runtime, then the code in SDK v2 will register its own implementation of API v1 at startup.

I am not sure if there is a way to perform this swapping in Go at compile time, at least I cannot see it yet.

For Java this is easier to do since you can just provide a different jar which implements the exact same API in a different, so there is no need to have runtime swapping, it can happen at compile time.

Just to reiterate, this is not a requirement that comes from this PR, it is something that has been agreed to be part of the spec in the past.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can modify my example source code to illustrate what I mean if the description is not clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurishkuro I modified my toy example at https://github.com/tigrannajaryan/otelapi-compat/ to demonstrate how API and SDK can be decoupled and how v2 SDK can implement both v1 and v2 APIs and make it all work on mixed-code app without touching third-party library code.

specification/compatibility.md Outdated Show resolved Hide resolved
specification/compatibility.md Outdated Show resolved Hide resolved
specification/compatibility.md Outdated Show resolved Hide resolved
Any application or third party library that uses OpenTelemetry API of a particular version can safely be packaged with OpenTelemetry SDK implementation provided that these 2 conditions are met:

1. The full version number of the SDK is not lower than the full version number of API used by application or third party library.
2. The major version number of the SDK is the same as major version number of API or major version number of the SDK is higher by 1.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on leaving this as a TODO for the time being so we can have a more full-fledged discussion without blocking the PR?

specification/compatibility.md Outdated Show resolved Hide resolved
@tigrannajaryan
Copy link
Member Author

Everyone, I would like to have a discussion about API compatibility in the next week's specification SIG meeting. I believe it is important to define this early, otherwise compatibility issues will bite us back in the future. Please join the meeting on Tue, June 18, 8am PT.

Agenda and Zoom link: https://docs.google.com/document/d/1-bCYkN-DWJq4jw1ybaDZYYmx-WAe6HnwfWbkm8d57v8/edit#heading=h.v6lhw87olr5l

1. The full version number of the SDK is not lower than the full version number of API used by application or third party library.
2. The major version number of the SDK is the same as major version number of API or major version number of the SDK is higher by 1.

For example, an application that uses API 1.5.10 can be packaged with SDK 1.5.12 or SDK 2.0.0, but it cannot be packaged with SDK 1.0.0 (too old) or with SDK 3.0.0 (too new).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would 3.0 be too new for 1.5 but 2.0 would be fine? They could both break things in equally bad ways right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more old versions you decide to support the more maintenance work it requires. I believe supporting one previous major version is a good balance, it gives people who depend on OpenTelemtry enough time to upgrade, yet allows us to have manageable amount of old / deprecated functionality that we need to maintain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside the approach we end up taking, I do agree on @tigrannajaryan approach: one previous major version to support.

@tigrannajaryan
Copy link
Member Author

Reviewers, I think I addressed all comments. Please have a look again. If there are no other questions I'd like to merge this and address any outstanding issues separately (I filed a couple as a result of discussions we had on this PR).


New functionality in minor version increases may be added in the form of new classes, methods or functions in a way that does not affect existing functionality.

## Compatibility of SDK Implementations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be separating API and SDK specs.


The above is specified using Java's convention of reverse domain name notation. Other languages will use their equivalent convention for namespaces, e.g. for Go the package name for version 1.x.x of the API will be go.opencensus.io, for version 2.x.x of the API it will be go.opencensus.io/v2.

## Interoperability of Mixed-version Code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this work with go modules? The v2 API needs to include a v1 package whose implementation is different from the real (old) v1.

@tigrannajaryan
Copy link
Member Author

How would this work with go modules? The v2 API needs to include a v1 package whose implementation is different from the real (old) v1.

@yurishkuro For some reason I cannot reply to this comment directly, so replying here.

There are probably different ways to achieve this in Go. One approach is to have an additional indirection level from API to SDK. This follows from the the requirement to be able to swap out SDK implementations laid out here: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/library-guidelines.md

If you have an indirection layer in API v1 which allows registering implementation at runtime, then the code in SDK v2 will register its own implementation of API v1 at startup.

I am not sure if there is a way to perform this swapping in Go at compile time, at least I cannot see it yet.

For Java this is easier to do since you can just provide a different jar which implements the exact same API in a different, so there is no need to have runtime swapping, it can happen at compile time.

Just to reiterate, this is not a requirement that comes from this PR, it is something that has been agreed to be part of the spec in the past.

@cstockton
Copy link

@tigrannajaryan I appreciate the work you are doing here, really, I don't want to detract from the fact you are writing documentation on the current design. Thank you.

That said maybe you can shed some light on why we are defining this at a software integration level. This is going to require massive engineering efforts and is already influencing "plugin" based design in the client libraries like open-telemetry/opentelemetry-go#19. This specification you are writing is essentially going to devolve into an aggregation of man ld* best practices, e.g. foo.so.X. Consequentially OpenTelemetry will struggle to add new features because ABI is as brittle as it comes across every potential architecture and system then a network transport.

I am trying to follow these issues and conversations but I just can't seem to find what the objections to defining the "Tracer Transport Protocol" (the name is for illustration) whatever that may be. If we can take a small bit of time to do this, it lowers the work across the board and sets a solid foundation to build on. Most importantly it doesn't stop OpenTelemetry at that point to writing monolithic SDK's which integrate many vendors and platforms exactly as they are today. But it does provide a standard specification for users that disagree with such a design from having to integrate with them.

What are you thoughts on:

  1. pause momentarily
  2. define TTP
  3. implementations of TTP, use them to inform changes in 2
  4. unpause - Resume exact plans for SDK's, which push traces over TTP

If wants to support the SDK's they can, just as they do under the current design. However they now have the option to add a flag on their existing server endpoints to accept those trace formats --ttp-address, allowing them to leverage their existing libraries in the language they have the most resources in and avoiding the potential bureaucracy of "getting permission" from OpenTelemetry. I can write a client to talk to said vendor and cut out the middleman if I want to. OpenTelemetry may provide enough value for me to choose to use it for a given language, but I have a choice. The model just scales better and will result in best-in-class solutions to each problem. Maybe the most reliable kafka ingest server lands at <$USER>/ttp2kafka and is written in rust. A highly reliable / configurable tpp2es ships to Elasticsearch with TLS support written in nodejs. Cool, they are just servers and we only need a single reliable client in each language.

Small note: I would be more than happy to contribute to helping define the specification if I knew it wouldn't be dismissed due to we can always do that later.

@tigrannajaryan
Copy link
Member Author

Reviewers, please comment.

I'd like to move forward. If this is proposal is acceptable in principle let's merge it and we can improve it in additional PRs.

If it is not acceptable let's reject it, I am OK with that too.

I strongly believe that we need to have a well-defined compatibility story (not necessarily exactly the story suggested in this PR). The story needs to be publicly available. People who evaluate OpenTelemetry for use in their applications or libraries will look into our compatibility guarantees as part of their evaluation. Not having clearly articulated compatibility specification is not helping us.

@tigrannajaryan
Copy link
Member Author

tigrannajaryan commented Jul 18, 2019

@yurishkuro as discussed I have put together the Java version of the API/SDK version compatibility toy demo: https://github.com/tigrannajaryan/otelapi-java-compat

It is less elaborate than the Go version (e.g. Context propagation is not implemented) but I believe the idea is clear anyway.

Please let me know if you have any other concerns.

@SergeyKanzhelev
Copy link
Member

@tigrannajaryan can you move this forward? Or we can abandon this PR for now

@tigrannajaryan
Copy link
Member Author

@SergeyKanzhelev let me think a bit about how this will work for our recently agreed version labeling and I will post an update to this PR or ask for postponing it.

@SergeyKanzhelev
Copy link
Member

@tigrannajaryan I'll close this PR for now than. It shouldn't be a problem to re-open it.

@tigrannajaryan
Copy link
Member Author

ok, thanks @SergeyKanzhelev

TuckTuckFloof pushed a commit to TuckTuckFloof/opentelemetry-specification that referenced this pull request Oct 15, 2020
rockb1017 pushed a commit to rockb1017/opentelemetry-specification that referenced this pull request Nov 18, 2021
…n-telemetry#115)

* Update CODEOWNERS to add docs team as reviewers for docs changes

As discussed in Slack, this PR aims at standardizing docs team as reviewers for all changes concerning documentation in our repos.

* Update specification/templates/.github/CODEOWNERS

Co-authored-by: Robert Pająk <pellared@hotmail.com>

Co-authored-by: Robert Pająk <pellared@hotmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants