
Implement weaver registry infer command #1138

Open
ArthurSens wants to merge 15 commits into open-telemetry:main from ArthurSens:weaver-registry-infer

Conversation

@ArthurSens
Member

TLDR

Implements weaver registry infer command that generates a semantic convention registry YAML file by inferring the schema from incoming OTLP telemetry data.

Description

This PR adds a new weaver registry infer subcommand that starts a gRPC server to receive OTLP messages (traces, metrics, logs) and automatically infers a semantic convention schema from the observed telemetry. The command processes incoming data, deduplicates attributes across signals, and collects up to 5 unique example values per attribute to help document the inferred schema.
The inferred schema is written to a single registry.yaml file in the specified output directory (default: ./inferred-registry/). The output follows the standard semantic convention format with separate groups for resources, spans, metrics, and events. Resource attributes are currently accumulated into a single resource group; entity-based grouping (via OTLP EntityRef) is not yet supported but documented for future implementation.

Testing

Tested by using weaver registry emit to send OTLP telemetry to the infer command's gRPC endpoint. The generated registry.yaml file was verified to contain the expected groups (resources, spans, metrics, events) with properly inferred attribute types and example values.
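
For illustration, the generated registry.yaml could look roughly like this. The ids, briefs, and example values below are hypothetical; only the overall group layout follows the standard semantic convention format described above:

```yaml
groups:
  - id: resource.inferred_resource   # hypothetical id
    type: resource
    brief: Attributes observed on incoming resources
    attributes:
      - id: service.name
        type: string
        brief: ""
        examples: ["checkout-service"]
  - id: metric.http_server_request_count   # hypothetical id
    type: metric
    metric_name: http.server.request.count
    instrument: counter
    unit: "1"
    brief: Inferred from observed telemetry
```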

@github-advanced-security bot left a comment

clippy found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@ArthurSens
Member Author

ArthurSens commented Jan 14, 2026

Opening as a draft first, manually tested and it seemed to work :)

Some questions I have:

  1. Should we build v2 schemas instead of v1?
  2. I've created an object called YamlGroup to serialize the YAML file because I couldn't find another object that already does this. Don't we have something like that already? Could we re-use the objects that deserialize YAML to also do the serialization somehow?
  3. Is the code organized correctly? I'm still struggling to understand when code should go in a separate crate and when it should be in the CLI module.
  4. Do we want to implement entity inference already? I'm not sure how stable Entities are.

@codecov

codecov bot commented Jan 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.0%. Comparing base (dfe4670) to head (03d0ad0).

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #1138   +/-   ##
=====================================
  Coverage   80.0%   80.0%           
=====================================
  Files        109     109           
  Lines       8528    8528           
=====================================
  Hits        6823    6823           
  Misses      1705    1705           

☔ View full report in Codecov by Sentry.

let entry = self
    .spans
    .entry(span.name.clone())
    .or_insert_with(|| AccumulatedSpan::new(span.name.clone(), span.kind.clone()));
Member Author

Some extra thoughts here: When the same span name is received multiple times with different kind values, only the first kind is preserved.

Not sure what to do to be honest, should I use more than just span/metric/event/resource name as the identifier? Maybe use all the fields that identify a particular telemetry type?

Contributor

There currently is no identifier for a span so anything done here is guessing. Live-check cannot do span comparisons at the moment because of this too. It just looks at the attributes within.
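
One of the options floated above is to widen the map key to include more identifying fields. A minimal sketch, with hypothetical simplified stand-ins for the PR's AccumulatedSpan/SampleSpan types, keying on (name, kind) so spans that share a name but differ in kind become separate entries instead of the first kind silently winning:

```rust
use std::collections::HashMap;

// Hypothetical simplified stand-in for the PR's AccumulatedSpan.
#[derive(Debug, PartialEq)]
struct AccumulatedSpan {
    name: String,
    kind: String,
}

// Key the accumulator on (name, kind) rather than name alone.
fn accumulate(spans: &mut HashMap<(String, String), AccumulatedSpan>, name: &str, kind: &str) {
    let _ = spans
        .entry((name.to_owned(), kind.to_owned()))
        .or_insert_with(|| AccumulatedSpan {
            name: name.to_owned(),
            kind: kind.to_owned(),
        });
}
```

This trades one inferred group per span name for one per (name, kind) pair, which may or may not be what the registry should express.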

}

fn add_metric(&mut self, metric: SampleMetric) {
    let instrument = match &metric.instrument {


What does it mean? Sorry, maybe I'm not familiar with the concept of metric.instrument. Is this the metric type?

Contributor

Kind of, it's gauge, updowncounter, histogram...

Member Author

Being a bit more specific for the discussion we had in our call @nicolastakashi, summaries are indeed deprecated in the OTLP proto, and Weaver is very explicit about not supporting it:

Some(Data::Summary(_)) => SampleInstrument::Unsupported("Summary".to_owned()),

attributes,
});

// Span events as separate event groups
Contributor
@jerbly jerbly Jan 17, 2026

Interesting... are span_events and log_events the same thing in semconv? In live-check I just check the attributes for span_events, whereas logs with event_name are checked against event definitions. @jsuereth / @lmolkova ?

Member Author

Just some comments from the SIG meeting today:

both span and log events can be mapped to events. Eventually, span events will be deprecated and we can remove this functionality from infer and add a warning in live check if they are present

@jerbly
Contributor

jerbly commented Jan 17, 2026

Opening as a draft first, manually tested and it seemed to work :)

Some questions I have:

  1. Should we build v2 schemas instead of v1?
  2. I've created an object called YamlGroup to serialize the YAML file because I couldn't find another object that already does this. Don't we have something like that already? Could we re-use the objects that deserialize YAML to also do the serialization somehow?
  3. Is the code organized correctly? I'm still struggling to understand when code should go in a separate crate and when it should be in the CLI module.
  4. Do we want to implement entity inference already? I'm not sure how stable Entities are.

My answers:

  1. IMO, we should make v2.
  2. We should use the weaver_semconv crate to build the structure and then Serialize that to YAML.
  3. As it stands it's OK. See below for further thoughts that might change this...
  4. I'm not sure either; this has also not been done in live-check. @jsuereth to comment.

Overall I'm wondering what the intent of this command is. What you have made takes samples and aggregates them into entirely new definitions, I guess to use as a starting point model?

What I had in mind would have been more embedded in live-check, maybe a --infer option to live-check. You would then be comparing samples with an existing model, the otel semconv model by default. The inference would then be to create a new model that depends on and extends the model you're comparing with. This would make a definition with imports, refs and extends.

Also, live-check would be highlighting any items which would be troublesome to make an inference for: e.g. an attribute named MyAttr would fail policy checks around naming conventions (should be some_namespace.my_attr for example).

@ArthurSens
Member Author

Overall I'm wondering what the intent of this command is. What you have made takes samples and aggregates them into entirely new definitions, I guess to use as a starting point model?

Exactly. While giving talks about Weaver last year, a very common question was: "I have thousands of metrics already, I don't want to manually rewrite what I have into a schema. Is there anything to make this easier?". That's the problem I'm trying to solve here. As long as there's an appropriate receiver in the collector, you can send data in any format, translate to OTLP, send it to Weaver Infer and you'll have your OTel Schema available. It's up to you to do further modifications to the schema as needed. With a schema available, code generation could build dashboards, could generate instrumentation code that helps migrate from one SDK to another, etc etc.

To be honest, I'm even envisioning a combined functionality of weaver serve+infer, where inferred schemas could be modified through the UI before the user "commits" them to the registry.

The inference would then be to create a new model that depends on and extends the model you're comparing with. This would make a definition with imports, refs and extends.

Interesting! This hasn't crossed my mind at all before. Could you elaborate a bit on the use case for this? What are the problems you wanted to solve?

@jerbly
Contributor

jerbly commented Jan 20, 2026

Interesting! This hasn't crossed my mind at all before. Could you elaborate a bit on the use case for this? What are the problems you wanted to solve?

If you run live-check today with an empty registry it will produce an output with every sample and, where possible, it will tell you every attribute and signal is missing in the live_check_result for that sample. You could imagine taking the json report from this live-check and producing an inferred registry like you've done with your code.

Now extend this concept. Rather than starting with an empty registry, start with the OTel semconv registry. The output report can now be interpreted to infer either modifications to the registry, or extensions to it in a child registry.

At my company we have a company-registry which is dependent on the OTel registry. We often find attributes and signals we want to express that fit in the OTel namespaces for example aws. Let's say my application emits aws.s3.bucket and aws.new.attr. I don't want to define aws.s3.bucket again since it's already in the OTel registry, I just want to modify my company registry to add aws.new.attr.

As another example, you produced a registry in your PR: prometheus/prometheus#17868 - moving forward, you could run the live-check inference again with this registry loaded and infer modifications to it alongside live-check telling you what's missing or invalid.

@ArthurSens
Member Author

If you run live-check today with an empty registry it will produce an output with every sample and, where possible, it will tell you every attribute and signal is missing in the live_check_result for that sample. You could imagine taking the json report from this live-check and producing an inferred registry like you've done with your code.

So with your idea, if we add a --infer flag to live-check, instead of a json output we would get the YAML file as done in this PR so far?

I can work with that :)

Now extend this concept. Rather than starting with an empty registry, start with the OTel semconv registry. The output report can now be interpreted to infer either modifications to the registry, or extensions to it in a child registry.

Hmmm, I think I understand some parts but others I'm still feeling a bit lost.

  • The output report could be interpreted as extensions in a child registry: We can infer this information if the OTLP message includes Samples that were not present before, is that correct?
  • The output report could be interpreted as modifications to the registry: This is the part where I'm not understanding how we could tell. If our registry has a sample called metric.X, and the OTLP message doesn't include this Sample but includes metric.Y, how do I know the difference between a Sample that was renamed and a Sample that was removed completely while a new, unrelated one was added?

@jerbly
Copy link
Contributor

jerbly commented Jan 21, 2026

So with your idea, if we add a --infer flag to live-check, instead of a json output we would get the YAML file as done in this PR so far?

I can work with that :)

No, I'm doing a bad job trying to explain this I think.

  • The output report could be interpreted as extensions in a child registry: We can infer this information if the OTLP message includes Samples that were not present before, is that correct?

I'm thinking the command could be: weaver registry live-check -r https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.38.0.zip[model] --infer new - this would collect samples and compare them with the otel registry. Let's say one of the samples is for metric.X with attributes: server.address and server.port. metric.X is not found in the otel registry but server.address and server.port are. The inferred output would be a new registry defining metric.X with references to server.address and server.port. Since we ran --infer new, weaver would also create a registry_manifest.yaml declaring the dependency on https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.38.0.zip[model].
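
To make this concrete, the inferred child registry for the metric.X example might contain something like the following. This is a hypothetical sketch: the ids, instrument, and unit are assumed, and only the attribute `ref` mechanism (resolving against the OTel registry dependency) is the point being illustrated:

```yaml
groups:
  - id: metric.metric_x            # hypothetical id
    type: metric
    metric_name: metric.X
    instrument: counter            # assumed; would be inferred from the samples
    unit: "1"
    brief: Inferred from observed telemetry
    attributes:
      - ref: server.address        # resolved against the OTel registry
      - ref: server.port
```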

  • The output report could be interpreted as modifications to the registry: This is the part where I'm not understaning how we could tell. If our registry has a sample called metric.X, the OTLP message doesn't include this Sample but includes metric.Y... How do I know the difference between a Sample that was renamed or a Sample that was removed completely and a new unrelated one was added?

This use case could be a later phase.
In this case, the command could be: weaver registry live-check -r my_model_dir --infer modify - in this case, new registry files are created but suffixed with _inferred. Those registry files are a copy of the original with modifications made to them with any changes inferred from the live-check result. For example, let's say we used the registry generated in the example above. We receive a sample of metric.X with the server attributes but now also the attribute error.type. The registry is modified to add this attribute to the metric. This would retain any non-inferrable fields in the original registry e.g. brief, note, annotations.

I think we would need options to determine if weaver should add or overwrite when it finds differences. And, if you want weaver to remove definitions if they were not received in the samples.

--infer modify is quite a bit more complicated and I'm not sure it's worth it. But --infer new, where we're making a dependent child registry I think is important and inline with our multi-registry philosophy.

@ArthurSens ArthurSens marked this pull request as ready for review January 21, 2026 21:08
@ArthurSens ArthurSens requested a review from a team as a code owner January 21, 2026 21:08
@ArthurSens
Member Author

ArthurSens commented Jan 21, 2026

Ok, I think I've addressed all comments that are addressable, given what we discussed in the SIG meeting today.

I'm intentionally leaving some things undone to keep the scope of the PR small and easier to review:

  • I'm generating v1 schemas instead of v2 -- Not sure if the plan is to allow both, as generate does, or if I should replace v1 with v2 entirely in the future.
  • Functionality to compare the incoming OTLP messages with already existing registries, so inferred schemas use extends and/or imports directives instead of duplicating an entire semantic convention.

But please let me know if any of the above should be worked on in this PR, and if there's anything else you'd like to see here.

Comment on lines 193 to 199
let attr_entry = entry
    .attributes
    .entry(attr.name.clone())
    .or_insert_with(|| {
        AccumulatedAttribute::new(attr.name.clone(), attr.r#type.clone())
    });
attr_entry.add_example(&attr.value);
Contributor

Let's say I have messy real world telemetry. I may receive attr=42 and then in another sample attr="hello". This would make an attr of type int with examples [42,"hello"].

How should mismatched types be handled?

Member Author

Good catch, I didn't think of this. I'm updating the PR to ignore attribute values that differ from the original value received.

This will probably create some race conditions, but not sure what else could be done here 🤔
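
The "first type received wins" strategy described here can be sketched as follows, with a hypothetical simplified AttrValue enum standing in for the real attribute value types; later values whose type differs from the first observed one are ignored, and at most 5 examples are kept:

```rust
// Hypothetical simplified stand-in for an attribute value.
#[derive(Debug, Clone, PartialEq)]
enum AttrValue {
    Int(i64),
    Str(String),
    Bool(bool),
}

struct AccumulatedAttribute {
    examples: Vec<AttrValue>,
}

impl AccumulatedAttribute {
    fn add_example(&mut self, v: &AttrValue) {
        // The first value's type wins; mismatched later values are dropped.
        let same_type = match (self.examples.first(), v) {
            (None, _) => true,
            (Some(AttrValue::Int(_)), AttrValue::Int(_)) => true,
            (Some(AttrValue::Str(_)), AttrValue::Str(_)) => true,
            (Some(AttrValue::Bool(_)), AttrValue::Bool(_)) => true,
            _ => false,
        };
        if same_type && self.examples.len() < 5 {
            self.examples.push(v.clone());
        }
    }
}
```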

#[derive(Debug, Clone)]
struct AccumulatedAttribute {
    name: String,
    attr_type: Option<PrimitiveOrArrayTypeSpec>,
Contributor

Does this need to be optional? Attributes must have a type.

#[derive(Debug, Clone)]
struct AccumulatedMetric {
    name: String,
    instrument: Option<InstrumentSpec>,
Contributor

Should this be optional?

fn add_metric(&mut self, metric: SampleMetric) {
    let instrument = match &metric.instrument {
        SampleInstrument::Supported(i) => Some(i.clone()),
        SampleInstrument::Unsupported(_) => None,
Contributor

If we receive an unsupported instrument, we can't infer a semconv for it by definition. So we should reject the sample as not inferable.

Member Author

Ok gotcha, that makes sense. That allows removing the Option from instrument as well then

Comment on lines 455 to 458
let attr_type = attr
.attr_type
.clone()
.unwrap_or(PrimitiveOrArrayTypeSpec::String);
Contributor

As mentioned in an earlier comment. IMO we should not need to have an Optional type and therefore this goes away. I'm guessing this optionality comes from Sample* where we allow missing type or value. This makes sense for live-check, we can just compare an attribute name on its own for validity. It doesn't make sense for infer, this data is mandatory to make a semconv definition.

I would recommend removing the Options where data is mandatory and rejecting samples with None types rather than carrying the Option all through the code.

Comment on lines 476 to 479
/// Convert a vector of JSON values to the appropriate Examples type.
///
/// Uses serde to automatically match the JSON values to the correct Examples variant.
fn json_values_to_examples(values: &[Value]) -> Option<Examples> {
Contributor
@jerbly jerbly Jan 30, 2026

I mentioned this above too. All examples must be the same type and must match the type of the attribute. If you've determined int they must all be int. (Of course one strategy to handle mismatching types is to change the type to Any and then you can have Any examples).

You could build the Examples as they arrive in the samples rather than this two stage process. Perhaps a function add_example(&value, &examples) -> Result<Examples, Error> where it will make a new Examples given the current examples and value (perhaps changing a single example to an array).
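
The single-pass idea sketched, under the assumption of a much-reduced Examples stand-in (the real weaver Examples enum has more variants): a function that takes the current accumulated examples plus a new value and returns the updated Examples, promoting a single example to the array form on the second value:

```rust
// Hypothetical reduced stand-in: only the Int/Ints variants are modeled.
#[derive(Debug, Clone, PartialEq)]
enum Examples {
    Int(i64),
    Ints(Vec<i64>),
}

// Fold one incoming value into the accumulated examples.
fn add_example(value: i64, examples: Option<Examples>) -> Examples {
    match examples {
        None => Examples::Int(value),
        // Second value observed: promote the single example to an array.
        Some(Examples::Int(first)) => Examples::Ints(vec![first, value]),
        Some(Examples::Ints(mut v)) => {
            v.push(value);
            Examples::Ints(v)
        }
    }
}
```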

}

fn sanitize_id(name: &str) -> String {
    name.replace(['/', ' ', '-', '.'], "_")
Contributor

Do we want to convert . to _? The namespace separator in OTel is .

Member Author

whoops, force of habit 😅


fn sanitize_id(name: &str) -> String {
    name.replace(['/', ' ', '-', '.'], "_")
        .to_lowercase()
Contributor
@jerbly jerbly Jan 30, 2026

Perhaps there's a way to use the convert_case crate to snake_case first and then deal with invalid chars. This would convert HelloWorld to hello_world too which is preferable to helloworld.
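
A std-only sketch of this suggestion (without pulling in the convert_case crate): insert an underscore at lower-to-upper boundaries before lowercasing, so "HelloWorld" becomes "hello_world" rather than "helloworld". This is an illustration, not the PR's implementation:

```rust
fn sanitize_id(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut prev_lower = false;
    for c in name.chars() {
        if matches!(c, '/' | ' ' | '-' | '.') {
            // Separator characters become underscores.
            out.push('_');
            prev_lower = false;
        } else if c.is_uppercase() {
            // CamelCase boundary: lower followed by upper gets an underscore.
            if prev_lower {
                out.push('_');
            }
            out.extend(c.to_lowercase());
            prev_lower = false;
        } else {
            out.push(c);
            prev_lower = c.is_lowercase() || c.is_ascii_digit();
        }
    }
    out
}
```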


/// Accumulated attribute with examples
#[derive(Debug, Clone)]
struct AccumulatedAttribute {
Contributor

Could we accumulate right into the semconv structs and avoid these additional structs and the two stage process?

Contributor
@jerbly jerbly left a comment

I have made a few comments:

  • some are to tidy the code which you can treat as nits
  • handling type mismatch and missing essential data I think needs to be addressed
  • optimizing with a single pass to accumulate and translate could be fun, not essential

I think, if we're not supporting v2 in this PR that's ok (it's marked experimental) but we should quickly move on to that in a follow-up. I'd also suggest, in the next PR, we move the main conversion code out to either one of the existing crates or a new one.

FYI. I've been asked for this infer tool a few times now so it's great to see it coming together. Thanks!

@ArthurSens
Member Author

  • optimizing with a single pass to accumulate and translate could be fun, not essential

I think I made it work for attributes at least, but I'm struggling a bit to make it work for metrics, spans and events. The hashmap is useful for quick lookups, and I'm not sure how to do the deduplication without the hashmaps 😬

I think, if we're not supporting v2 in this PR that's ok (it's marked experimental) but we should quickly move on to that in a follow-up. I'd also suggest, in the next PR, we move the main conversion code out to either one of the existing crates or a new one.

Happy to tackle both!

Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>

Serde should be able to handle the YAML serialization
@ArthurSens ArthurSens force-pushed the weaver-registry-infer branch from 9823e80 to 94161ee Compare February 4, 2026 19:13
Contributor
@jerbly jerbly left a comment

Overall: Looks good for a first pass.

Perhaps when adding the v2 support there can be some refactoring to make this more idiomatic Rust. The conversion logic between Sample* types and Accumulated*/AttributeSpec types could use Rust's conversion traits:

  • From<&SampleAttribute> for AttributeSpec - Replace attribute_spec_from_sample() with a From impl
  • From<&AccumulatedSpan> for GroupSpec (and similar for Metric/Event) - Replace the inline conversion in to_semconv_spec()

Maybe add an Accumulate trait - Something like:

trait Accumulate {
    fn accumulate(&self, acc: &mut AccumulatedSamples);
}

Implement for SampleResource, SampleSpan, SampleMetric, etc. This would let add_sample become simply sample.accumulate(self).

But, this is a great addition to weaver, let's get the first iteration in.
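
A minimal illustration of the From-impl suggestion, using hypothetical simplified stand-ins for weaver's SampleAttribute and AttributeSpec (the real types carry many more fields):

```rust
// Hypothetical simplified stand-ins for the weaver types.
struct SampleAttribute {
    name: String,
    value: String,
}

struct AttributeSpec {
    id: String,
    examples: Vec<String>,
}

// Conversion trait in place of a free function like attribute_spec_from_sample().
impl From<&SampleAttribute> for AttributeSpec {
    fn from(s: &SampleAttribute) -> Self {
        AttributeSpec {
            id: s.name.clone(),
            examples: vec![s.value.clone()],
        }
    }
}
```

With this in place, call sites can use `let spec: AttributeSpec = (&sample).into();` rather than a named conversion helper.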

Comment on lines +523 to +534
if let Some(resource) = resource_log.resource {
    let mut sample_resource = SampleResource {
        attributes: Vec::new(),
        live_check_result: None,
    };
    for attribute in resource.attributes {
        sample_resource
            .attributes
            .push(sample_attribute_from_key_value(&attribute));
    }
    accumulator.add_sample(Sample::Resource(sample_resource));
}
Contributor

nit: this resource accumulation block is repeated for each signal. Maybe we can be more DRY here?
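
One way to factor the repeated block, sketched with hypothetical simplified stand-ins for the weaver types: a single helper that builds the SampleResource from any signal's resource attributes, so each signal handler calls it instead of repeating the loop:

```rust
// Hypothetical simplified stand-ins for the weaver types.
#[derive(Debug, PartialEq)]
struct SampleAttribute {
    name: String,
}

#[derive(Debug, PartialEq)]
struct SampleResource {
    attributes: Vec<SampleAttribute>,
}

// Shared helper: each signal (traces, metrics, logs) feeds its resource
// attributes through here instead of duplicating the accumulation block.
fn sample_resource_from_attrs<'a, I>(attrs: I) -> SampleResource
where
    I: IntoIterator<Item = &'a str>,
{
    SampleResource {
        attributes: attrs
            .into_iter()
            .map(|name| SampleAttribute { name: name.to_owned() })
            .collect(),
    }
}
```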
