Implement weaver registry infer command#1138
Implement weaver registry infer command#1138ArthurSens wants to merge 15 commits intoopen-telemetry:mainfrom
weaver registry infer command#1138Conversation
There was a problem hiding this comment.
clippy found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
|
Opening as a draft first, manually tested and it seemed to work :) Some question that I have:
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1138 +/- ##
=====================================
Coverage 80.0% 80.0%
=====================================
Files 109 109
Lines 8528 8528
=====================================
Hits 6823 6823
Misses 1705 1705 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| let entry = self | ||
| .spans | ||
| .entry(span.name.clone()) | ||
| .or_insert_with(|| AccumulatedSpan::new(span.name.clone(), span.kind.clone())); |
There was a problem hiding this comment.
Some extra thoughts here: When the same span name is received multiple times with different kind values, only the first kind is preserved.
Not sure what to do to be honest, should I use more than just span/metric/event/resource name as the identifier? Maybe use all the fields that identify a particular telemetry type?
There was a problem hiding this comment.
There currently is no identifier for a span so anything done here is guessing. Live-check cannot do span comparisons at the moment because of this too. It just looks at the attributes within.
| } | ||
|
|
||
| fn add_metric(&mut self, metric: SampleMetric) { | ||
| let instrument = match &metric.instrument { |
There was a problem hiding this comment.
What does it means, sorry maybe I'm not familiar with the concept of metric.instrument, is this the metric type?
There was a problem hiding this comment.
Kind of, it's gauge, updowncounter, histogram...
There was a problem hiding this comment.
Being a bit more specific for the discussion we had in our call @nicolastakashi, summaries are indeed deprecated in the OTLP proto, and Weaver is very explicit about not supporting it:
weaver/src/registry/otlp/conversion.rs
Line 128 in b55473e
| attributes, | ||
| }); | ||
|
|
||
| // Span events as separate event groups |
There was a problem hiding this comment.
Just some comments from the SIG meeting today:
both span and log events can be mapped to events. Eventually, span events will be deprecated and we can remove this functionality from infer and add a warning in live check if they are present
Overall I'm wondering what the intent of this command is. What you have made takes samples and aggregates them into entirely new definitions, I guess to use as a starting point model? What I had in mind would have been more embedded in live-check, maybe a Also, live-check would be highlighting any items which would be troublesome to make an inference for: e.g. an attribute named |
Exactly. While giving talks about Weaver last year, a very common question was: "I have thousands of metrics already, I don't want to manually rewrite what I have into a schema. Is there anything to make this easier?". That's the problem I'm trying to solve here. As long as there's an appropriate receiver in the collector, you can send data in any format, translate to OTLP, send it to Weaver Infer and you'll have your OTel Schema available. It's up to you to do further modifications to the schema as needed. With a schema available, code generation could build dashboards, could generate instrumentation code that helps migrate from one SDK to another, etc etc. To be honest, I'm even envisioning a combined functionality of weaver serve+infer, where inferred schemas could be modified through the UI before the user "commits" them to the registry.
Interesting! This hasn't crossed my mind at all before. Could you elaborate a bit on the use case for this? What are the problems you wanted to solve? |
If you run live-check today with an empty registry it will produce an output with every sample and, where possible, it will tell you every attribute and signal is missing in the live_check_result for that sample. You could imagine taking the json report from this live-check and producing an inferred registry like you've done with your code. Now extend this concept. Rather than starting with an empty registry, start with the OTel semconv registry. The output report can now be interpreted to infer either modifications to the registry, or extensions to it in a child registry. At my company we have a company-registry which is dependent on the OTel registry. We often find attributes and signals we want to express that fit in the OTel namespaces for example As another example, you produced a registry in your PR: prometheus/prometheus#17868 - moving forward, you could run the live-check inference again with this registry loaded and infer modifications to it alongside live-check telling you what's missing or invalid. |
So with your idea, if we add a I can work with that :)
Hmmm, I think I understand some parts but others I'm still feeling a bit lost.
|
No, I'm doing a bad job trying to explain this I think.
I'm thinking the command could be:
This use case could be a later phase. I think we would need options to determine if weaver should add or overwrite when it finds differences. And, if you want weaver to remove definitions if they were not received in the samples.
|
|
Ok, I think I've addressed all comments that are addressable, given what we discussed in the SIG meeting today. I'm intentionally letting some things undone to keep the scope of the PR small and easier to review:
But please let me know if any of the above should be worked on in this PR, and if there's anything else you'd like to see here |
src/registry/infer.rs
Outdated
| let attr_entry = entry | ||
| .attributes | ||
| .entry(attr.name.clone()) | ||
| .or_insert_with(|| { | ||
| AccumulatedAttribute::new(attr.name.clone(), attr.r#type.clone()) | ||
| }); | ||
| attr_entry.add_example(&attr.value); |
There was a problem hiding this comment.
Let's say I have messy real world telemetry. I may receive attr=42 and then in another sample attr="hello". This would make an attr of type int with examples [42,"hello"].
How should mismatched types be handled?
There was a problem hiding this comment.
Good catch, I didn't think of this. I'm updating the PR to ignore attribute values that differ from the original value received.
This will probably create some race conditions, but not sure what else could be done here 🤔
src/registry/infer.rs
Outdated
| #[derive(Debug, Clone)] | ||
| struct AccumulatedAttribute { | ||
| name: String, | ||
| attr_type: Option<PrimitiveOrArrayTypeSpec>, |
There was a problem hiding this comment.
Does this need to be optional? Attributes must have a type.
src/registry/infer.rs
Outdated
| #[derive(Debug, Clone)] | ||
| struct AccumulatedMetric { | ||
| name: String, | ||
| instrument: Option<InstrumentSpec>, |
src/registry/infer.rs
Outdated
| fn add_metric(&mut self, metric: SampleMetric) { | ||
| let instrument = match &metric.instrument { | ||
| SampleInstrument::Supported(i) => Some(i.clone()), | ||
| SampleInstrument::Unsupported(_) => None, |
There was a problem hiding this comment.
If we receive and unsupported instrument, we can't infer a semconv for it by definition. So we should reject the sample as not inferable.
There was a problem hiding this comment.
Ok gotcha, that makes sense. That allows removing the Option from instrument as well then
src/registry/infer.rs
Outdated
| let attr_type = attr | ||
| .attr_type | ||
| .clone() | ||
| .unwrap_or(PrimitiveOrArrayTypeSpec::String); |
There was a problem hiding this comment.
As mentioned in an earlier comment. IMO we should not need to have an Optional type and therefore this goes away. I'm guessing this optionality comes from Sample* where we allow missing type or value. This makes sense for live-check, we can just compare an attribute name on its own for validity. It doesn't make sense for infer, this data is mandatory to make a semconv definition.
I would recommend removing the Options where data is mandatory and rejecting samples with None types rather than carrying the Option all through the code.
src/registry/infer.rs
Outdated
| /// Convert a vector of JSON values to the appropriate Examples type. | ||
| /// | ||
| /// Uses serde to automatically match the JSON values to the correct Examples variant. | ||
| fn json_values_to_examples(values: &[Value]) -> Option<Examples> { |
There was a problem hiding this comment.
I mentioned this above too. All examples must be the same type and must match the type of the attribute. If you've determined int they must all be int. (Of course one strategy to handle mismatching types is to change the type to Any and then you can have Any examples).
You could build the Examples as they arrive in the samples rather than this two stage process. Perhaps a function add_example(&value, &examples) -> Result<Examples, Error> where it will make a new Examples given the current examples and value (perhaps changing a single example to an array).
src/registry/infer.rs
Outdated
| } | ||
|
|
||
| fn sanitize_id(name: &str) -> String { | ||
| name.replace(['/', ' ', '-', '.'], "_") |
There was a problem hiding this comment.
Do we want to convert . to _? The namespace separator in OTel is .
There was a problem hiding this comment.
whoops, force of habit 😅
src/registry/infer.rs
Outdated
|
|
||
| fn sanitize_id(name: &str) -> String { | ||
| name.replace(['/', ' ', '-', '.'], "_") | ||
| .to_lowercase() |
There was a problem hiding this comment.
Perhaps there's a way to use the convert_case crate to snake_case first and then deal with invalid chars. This would convert HelloWorld to hello_world too which is preferable to helloworld.
src/registry/infer.rs
Outdated
|
|
||
| /// Accumulated attribute with examples | ||
| #[derive(Debug, Clone)] | ||
| struct AccumulatedAttribute { |
There was a problem hiding this comment.
Could we accumulate right into the semconv structs and avoid these additional structs and the two stage process?
jerbly
left a comment
There was a problem hiding this comment.
I have made a few comments:
- some are to tidy the code which you can treat as nits
- handling type mismatch and missing essential data I think needs to be addressed
- optimizing with a single pass to accumulate and translate could be fun, not essential
I think, if we're not supporting v2 in this PR that's ok (it's marked experimental) but we should quickly move on to that in a follow-up. I'd also suggest, in the next PR, we move the main conversion code out to either one of the existing crates or a new one.
FYI. I've been asked for this infer tool a few times now so it's great to see it coming together. Thanks!
I think I made it work for attributes at least, but I'm struggling a bit to make it work for metrics, spans and events. The hashmap is useful for quick lookups, and I'm not sure how to do the deduplication without the hashmaps 😬
Happy to tackle both! |
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Serde should be able to handle the YAML serialization Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
9823e80 to
94161ee
Compare
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
jerbly
left a comment
There was a problem hiding this comment.
Overall: Looks good for a first pass.
Perhaps when adding the v2 support there can be some refactoring to make this more idiomatic Rust. The conversion logic between Sample* types and Accumulated*/AttributeSpec types could use Rust's conversion traits:
From<&SampleAttribute> for AttributeSpec- Replaceattribute_spec_from_sample()with aFromimplFrom<&AccumulatedSpan> for GroupSpec(and similar for Metric/Event) - Replace the inline conversion into_semconv_spec()
Maybe add an Accumulate trait - Something like:
trait Accumulate {
fn accumulate(&self, acc: &mut AccumulatedSamples);
}Implement for SampleResource, SampleSpan, SampleMetric, etc. This would let add_sample become simply sample.accumulate(self).
But, this is a great addition to weaver, let's get the first iteration in.
| if let Some(resource) = resource_log.resource { | ||
| let mut sample_resource = SampleResource { | ||
| attributes: Vec::new(), | ||
| live_check_result: None, | ||
| }; | ||
| for attribute in resource.attributes { | ||
| sample_resource | ||
| .attributes | ||
| .push(sample_attribute_from_key_value(&attribute)); | ||
| } | ||
| accumulator.add_sample(Sample::Resource(sample_resource)); | ||
| } |
There was a problem hiding this comment.
nit: this resource accumulation block is repeated for each signal. Maybe we can be more DRY here?
TLDR
Implements weaver registry infer command that generates a semantic convention registry YAML file by inferring the schema from incoming OTLP telemetry data.
Description
This PR adds a new weaver registry infer subcommand that starts a gRPC server to receive OTLP messages (traces, metrics, logs) and automatically infers a semantic convention schema from the observed telemetry. The command processes incoming data, deduplicates attributes across signals, and collects up to 5 unique example values per attribute to help document the inferred schema.
The inferred schema is written to a single registry.yaml file in the specified output directory (default: ./inferred-registry/). The output follows the standard semantic convention format with separate groups for resources, spans, metrics, and events. Resource attributes are currently accumulated into a single resource group; entity-based grouping (via OTLP EntityRef) is not yet supported but documented for future implementation.
Testing
Tested by using weaver registry emit to send OTLP telemetry to the infer command's gRPC endpoint. The generated registry.yaml file was verified to contain the expected groups (resources, spans, metrics, events) with properly inferred attribute types and example values.