Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move yaml schema definition to semantic conventions #916

Closed
lmolkova opened this issue Apr 10, 2024 · 9 comments · Fixed by open-telemetry/weaver#307
Closed

Move yaml schema definition to semantic conventions #916

lmolkova opened this issue Apr 10, 2024 · 9 comments · Fixed by open-telemetry/weaver#307
Assignees

Comments

@lmolkova
Copy link
Contributor

lmolkova commented Apr 10, 2024

Description

Schema definition for semconv yaml currently lives in the build tools repo along with the documentation that describes yaml format.

They should be part of semantic conventions and can be shared between different tooling (weaver/build-tools) and easily discoverable by semconv authors.

Additions/removals/changes should be tracked in the semconv repo as they affect all conventions and should be discussed/reviewed by wider community.

Additional context

We'll need to find a good process to update semconv yaml schema: new schema features would still need to be implemented in tooling repos before they can be brought in to semconv.

@lmolkova
Copy link
Contributor Author

lmolkova commented Apr 10, 2024

We'll need to find a good process to update semconv yaml schema

The ways to achieve it could be:

Option 1. Separate vCurrent and vNext schemas

  • keep two separate files - semconv.schema.json (vCurrent) and semconv.schema.next.json in the semconv repo
  • update semconv.schema.json with new tooling release that adds some new features

Cons: hard to keep track of difference between vNext schema and what tooling actually supports.
We don't want vNext schema to list features that are not coming in the next tooling version

Option 2. Keep vCurrent in semconv and vNext in tooling (only weaver)

  • when new tooling is released, semconv.schema.json in semconv is updated to one in the tooling repo

Cons: schema changes/proposals are less discoverable in the weaver.

Option 2.5. Same as p2, but start with a discussion in the semconv

  • Draft PR1 or design discussion issue in semconv repo with proposed schema changes
  • Once there is a consensus on the issue/draft PR, PR2 to tooling to implement new feature and update schema vNext.
  • After PR2 is merged and new tooling is released, schema is updated to vNext during release process

Cons: slightly more difficult process. Might need a formal way to accept schema changes

@jsuereth
Copy link
Contributor

Is there an option where:

  • We associate our usage of schema to a previous tag of semconv repository
  • We allow updates to semconv.schema.json and specification of it, but we don't allow that change to be used in YAML until a release of semconv repository.

@lmolkova
Copy link
Contributor Author

@jsuereth I believe all of the options do this (with a different file, not tag, but the same).
The key question is how to keep imaginary vnext in sync with actual implementation.

I.e. if we got 10 contributions to schema in semconv on top of tag A, but implemented 2 of them in weaver, it'd be a problem when we're going to release tag A + 1

@jsuereth
Copy link
Contributor

What I'm saying is that we can't bump the "allowed version" for YAML internally until weaver has adopted/implemented "vnext"

So there would always be a lag between when we release and when you can start moving YAML to the new version

@lmolkova
Copy link
Contributor Author

lmolkova commented Apr 10, 2024

so we'll have to do soft-backpressure and reject new schema features once we want to bump.

What I'm saying that we can apply this soft-backpressure right away and not even merge schema changes until weaver implements them.

[Update] I guess we can start without extra process and see how it goes

@pyohannes
Copy link
Contributor

keep two separate files - semconv.schema.json (vCurrent) and semconv.schema.next.json in the semconv repo

In my opinion the main point of the schema file to describe the actual schema of the YAML files that go along with it. I don't see the point of having a semconv.schema.next.json which conflicts with the format currently used in YAML specification files.

We'll need to find a good process to update semconv yaml schema

Versioning schema files could mitigate this. Although JSON schema doesn't support versioning out-of-the-box, it would be added like this:

{
   "$schema": "http://json-schema.org/draft-07/schema#",
   "description": "YAML schema for semantic convention generator, use for example with VS Code.",
   "additionalProperties": false,
   "type"       : "object",
   "required"   : ["version"],
   "properties" : {
     "version" : { "type":"string", "const":"1-1-0" },
     ...
   }
}

Given smaller (backward compatible) schema changes, a single version of Weaver could possibly support multiple different versions of the schema. Given bigger changes to the schema that might not be feasible, but given the versioning Weaver could at least immediately know whether the schema version it supports matches the schema version in the semconv repository.

A given version of the semantic convention repository would then contain the schema that corresponds to the YAML files in this version. A given version of Weaver would know about the schema version(s) this version of Weaver supports.

As related parts are spread out across two repositories this will always cause some coordination effort, overhead, and resulting lag. However, I think there would be a benefit of making dependencies as explicit and clear as possible.

@jsuereth
Copy link
Contributor

Yeah - I like this.

@lquerel
Copy link
Contributor

lquerel commented Apr 12, 2024

The proposal to have a check of the JSON schema version against the versions supported by Weaver seems good to me. In terms of process, a possible approach could be to generate the JSON schema from the Rust structures representing the semantic conventions, in order to limit discrepancies and ensure that changes are directly integrated into Weaver. Schemars is a Rust library that allows this generation. I have never tested it, but in my opinion, it would be worth trying.

@MadVikingGod
Copy link
Contributor

I'm very much in favor of this move, and I think that validation of the model vs its spec can and should be independent of build-tools or weaver. There are purpose-built tools for this, ajv and a github action using it.

How does this change how we can modify the semconv without the tools updating? I think there are two paths to this:

  1. Modify both tools to allow for unknown fields. This validation is now done within the semconv, so they don't need to check it. It should also be easy to access unknown fields within the code/markdown generation.
  2. Allow tooling to have fields that aren't in the spec. This lets tooling experiment with features before they are part of the semconv.

There is a tangential question of whether the weaver data shape should be used to generate the spec. I think this shouldn't be an automatic thing. While the tooling is welcome and can help, especially in the initial pass, if we tie weaver like this, we will have problems where we need a change to weaver to update the semconv, which we need to start using in weaver. If these are truly independent, we can make changes to Weaver and use this to generate, but it's not a prerequisite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

7 participants