-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add template converter #8507
Add template converter #8507
Conversation
Codecov ReportAttention:
... and 1 file with indirect coverage changes 📢 Thoughts on this report? Let us know!. |
adc4593
to
15d3ac7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this feature interact with the existing ${...}
syntax? For example, is this supported? If not, what kind of error do we output?
receivers:
template/my_contrived_example:
my_env_var_name: ENV_VAR_NAME
processors:
batch:
exporters:
otlp:
endpoint: localhost:4317
service:
pipelines:
logs:
receivers: [template/my_filelog_template]
processors: [batch]
exporters: [otlp]
templates:
receivers:
my_contrived_example: |
receivers:
contrived_receiver:
setting: ${env:{{ .my_env_var_name }}}
I am not claiming that this should be supported (sounds like a contrived edge case better addressed through different means), just that we should error out with a clear error if not supported and document the limitations
I actually think this would potentially be quite useful. In my mind, templates should be expanded first, but from there the content should be a valid config as we would otherwise define it. That said, I misunderstood where expansion is handled, thinking it was within the expand converter. It appears we have a sort of hardcoded psuedo-converter block here, and then immediately call the other converters. I think if we move the expansion code into a converter, we could handle this correctly. Currently, it looks like we currently get an ugly error like this: If we really don't care to support this, it's simple enough to add some clearer context to the error message: if err != nil {
if strings.HasPrefix(k, "templates::") {
return nil, fmt.Errorf("cannot expand value in template %q: %w", k, err)
}
return nil, err
} I'll play around with my idea and provide an update as soon as possible. |
Looking into this more, I think this is going to be quite difficult to support. First of all, I think it's desirable that If template expansion happens after Intuitively, I think it might be possible to recursively alternate between the two types of expansions, but I can't quite see how to manage this in a reasonable way. It might have to be error driven. For now, I think it probably just makes sense to prohibit the use of |
@open-telemetry/collector-maintainers, any thoughts on this feature? |
} | ||
|
||
if len(cfg.Receivers) == 0 { | ||
return nil, errors.New("must have at least one receiver") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation behind requiring a receiver in a template? In my opinion it would be valuable to have templates further down in a pipeline. Two examples I can think of are:
- A template that includes config for a processor or set of processors that transform data in a certain fashion.
- Vendors could offer templates that preconfigure processors and exporters that simplify the configuration required for exporting to their backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fully agree with the other use cases as well. I would describe your example (1) as a "processor template", and (2) as an "exporter template", and this PR as supporting "receiver templates".
I think it's really important to look at templates from two distinct points of view:
- A novice user who benefits from not knowing the details of the template.
- An experienced user who captures and shares domain knowledge by creating a template.
For the novice user, the purpose of a template is to be an abstraction. In order to do this effectively, I think a template needs to feel as much as possible like a regular component, meaning it has both a class (i.e. receiver, processor, exporter) and type (e.g. filelog, my_template), and its use in the configuration is almost exactly like any other component. If they can think of it as a receiver, then it's much clearer where to define an instance (in the receivers
section) and where to use it (in a pipeline's receivers
section).
For the experienced users who create templates, and for us as maintainers, there are other reasons to constrain each template to a particular class. There are several natural differences which I believe would be very difficult to manage otherwise:
- Requirements for what is optional and required within the template. (A receiver template must contain a receiver because otherwise using the template in place of a receiver would have no effect.)
- Rules for how a template instance is defined (in the
receivers
section) and how it must be used in the service (in a pipeline'sreceivers
section). - Logic for how the template is expanded and how it hooks into the global component graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a really insightful answer, thanks for the details. I really like the idea of ascribing a class to templates, I agree with all your points on how it would simplify both the validation and usage of a template. I'm seeing now that classifying templates is already present in the existing implementation, I've left a few comments/questions around the implementation side of that.
// Add a dedicated forward connector for this template instance. | ||
// This will consume data from the partial pipelines and emit to | ||
// any top-level pipelines that used the template as a receiver. | ||
connectorID := id.withPrefix("forward") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have the ability to validate that the forward connector is available here? I realize it's outside the responsibility of a converter to validate the config this way, but I worry about the case where a user of a custom distribution that doesn't include the forward connector attempts to use a template and receives an error saying that the forward connector isn't a known component when they never specified it in any of their configs.
Alternatively, could we check if it's available somewhere in otelcol
if a template is provided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good question. I haven't found a good way to do this. Any more specific ideas?
Generally I agree it would be nice to have a better error message but maybe documentation would be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will have to just be documentation for now. Any code aware of the available components only deals with a resolved config that has already had the templates
section removed, so there's no reliable way to determine if templates were used.
|
||
// Special case where the template only contains receivers. In this case, | ||
// we can just substitute the rendered receivers in place of the template ID. | ||
if len(cfg.Processors) == 0 && len(cfg.Pipelines) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because I haven't seen it mentioned anywhere else, how do we want to handle connectors? Does it make sense to declare a connector template, or a receiver template that converts the input signal to another signal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a particular design in mind for connector templates, or for using connectors in other classes of templates. My intuition is that it may make sense eventually but I think someone will have to propose a detailed set of rules for how this would work. I'd rather leave this for later unless we think there's a risk of designing ourselves into a corner by not solving it now.
Purely as speculation, I think would could eventually allow connectors within receiver templates by introducing a special key (maybe just forward
) which MUST be used at least once as an exporter and MAY NOT be used as a receiver in the template. Basically, we'd relax the constraints on what can be used within a receiver template and simply require that the user indicate how data flows out of it. (Again, I'd rather leave this for later, but I'm including this to illustrate that it should be possible to extend the framework if necessary.)
templates:
receivers:
my_receiver_template: |
receivers:
jaeger: ...
connectors:
spanmetrics: ...
pipelines:
traces:
receivers: [ jaeger ]
exporters: [ spanmetrics ] # current proposal does not allow exporters but we could relax the rule
metrics:
receivers: [ spanmetrics ]
exporters: [ forward ] # "forward" is automatically defined and must be used at least once as an exporter
|
||
tmpl, ok := c.receiverTemplates[id.tmplType] | ||
if !ok { | ||
return fmt.Errorf("template type %q not found", id.tmplType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how we could do this without the logic getting ugly, but I think we should check that a user doesn't attempt to use a template of one class in the config section for another class. With what's here, if I tried to use a processor template in the receivers section, it would tell me that the template wasn't found even if I defined it, which I think would be a confusing error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'll be easy enough to based on the structure I've proposed for defining templates:
templates:
receivers:
my_receiver_template: ...
Since we know the context in which the user is referencing the template, and we know the class of template based on where it was defined, it should be very easy to cross-reference.
That said, I don't think there is anything to do about it until we add support for an additional class of templates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have worded that better. I'm wondering if since multiple component classes may be involved in a single receiver class template, it may make sense to validate that users don't misunderstand the template class and attempt to use a receiver template as anything but a receiver in the top-level config or within a pipeline. The error messages don't offer much guidance:
- Attempting to use a receiver template in the config's
processors
section:
error decoding 'processors': unknown type: "template" for id: "template/my_filelog_template"
- Attempting to use a receiver template in the
processors
list in a pipeline:
service::pipelines::logs: references processor "template/my_filelog_template" which is not configured
I suppose the behavior I think would be the most user-friendly here would be to let the user know they're using a receiver template in the wrong place if the template is used in a non-receiver location and, in the future, if a template with the same type and name hasn't been defined for another class. It's possible this could be handled elsewhere in the Collector's config parsing, but I think the template converter will have the most context here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it seems like these are the same errors one would get if they try to define or use any receiver in a processors
section. e.g.
error decoding 'processors': unknown type: "filelog" for id: "filelog/foo"
service::pipelines::logs: references processor "filelog/foo" which is not configured
In my opinion this isn't really a special case that requires different handling. It really only becomes confusing if we get away from the notion that receiver templates are receivers. If that's happening a lot, then I think we've failed to create an abstraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, we should avoid adding special handling here where possible. It does get a bit weird here though since the config unmarshaler doesn't know about templates. As a novice user, it wouldn't be clear to me why template
isn't a valid value in the full error message for using the receiver template in the top-level processors
section:
unknown type: "template" for id: "template/my_filelog_template" (valid values: [batch memory_limiter])
All that said, I'm remembering this is behind a feature gate, so I'm okay with handling it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It occurred to me while playing around with this that we have an implicit dependency on otelcol.Config
here. Everything up to the point where we're running converters on a confmap is only dealing with maps without any assumptions about their structure, but to render the templated components we need to understand the structure of a config. Should we consider adding a new stage of config processing that processes the config after we have a semantic understanding of the config and expand templates there, or is this an acceptable trade-off?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really interesting idea and I spent some time trying to make it work. Unfortunately, it appears to lead us to a chicken/egg scenario.
It's certainly much easier and more concise to work with the otelcol.Config
struct directly. However, the problem is that by the time we have this struct, we've already missed our chance to properly unmarshal any components which were contained in templates.
Basically, this is what happens when converting each component from confmap.Conf
to component.Config
:
- Look up the type and make sure there is a factory for it. (I had to add a carve out for
if type == "template"
to make this work.) - Use the factory to instantiate a default config.
- Unmarshal the
confmap.Conf
into the new instance of the default config.
After doing this for each component, we eventually have an otelcol.Config
, but now if we expand the templates we end up adding new components which were not unmarshaled in the same way.
Technically, we could do steps 1-3 again for only these components but it looks like a substantial overhaul to separate this functionality from the unmarshaler. Or, if we had a way to convert otelcol.Config
back to confmap.Conf
, we could just unmarshal the whole thing again. Neither of these seem like good options and I'm not currently seeing any other ideas on how to approach this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, we could do steps 1-3 again for only these components but it looks like a substantial overhaul to separate this functionality from the unmarshaler.
That was more or less the approach I had in mind. I agree it would probably require a substantial overhaul, but I thought about it some more after reading your summary and I still think it is possibly worth the effort. I've attempted to draft a solution that would solve all our issues with minimal downsides, but let me know if there's some fatal flaw that I've missed.
I think the problems with implementing templating within our current unmarshaling facilities are that they expect a one-to-one mapping between component configs and factories, and that all component unmarshaling is done within a section of the config instead of in the scope of the whole config.
We would probably still want to take advantage of configunmarshaler.Configs
to unmarshal individual components, but would want to process the whole config after that to do template expansion. I would see a "config processor" being something similar to a converter that implements the following interface. A list of these would be registered with otelcol.ConfigProvider
and would be used during config unmarshaling.
type ConfigProcessor interface {
// Prefix for configs like "template/"
Prefix() string
// Config type for component configs that need
// processing like `template/my_receiver`
ComponentConfig() any
// Key and type for the processor's config (e.g. "templates" and `templatesConfig`)
ProcessorConfig() (string, any)
// Handles configs starting with the processor's prefix
// given the available factories and its config.
// All instances of the ComponentConfig type should be
// replaced by component.Config when this function returns.
Process(pc *PartialConfig, factories Factories, config any) error
}
We could then split config unmarshaling into stages:
First, unmarshal the config as much as possible, considering during unmarshaling that there may be some keys that need further processing. So we could unmarshal components, but any config keys prepended with template/
would need to be handled by a processor. I think this first step would produce a struct like:
type PartialConfig struct {
Receivers: map[string]any // Factory or types returned by ConfigProcessor.ComponentConfig()
[...]
Service: PartialService // Some type that enables accessing/creating pipelines for the case of templating
}
We would then run the Process
method of each ConfigProcessor
on the PartialConfig
, with the expectation that after all processors have run, we're left with map[string]component.Config
for each component class. I think the template converter would run in a similar fashion to how it runs now with this: it would remove the component template from the config, then add all components and add them to the pipelines.
I didn't specify a mechanism here, but at least for templates we would also want to provide a common function to take a confmap.Conf
and return a component.Config
. Component config unmarshaling would still happen at two different steps, but would at least be done consistently.
I think this design would also provide a straightforward path for open-telemetry/opentelemetry-collector-contrib#18509, which would probably work similar to templates but be slightly simpler.
I see a few downsides to this route:
- We currently only have two use-cases in mind, which I'm not sure is enough to say that this design will serve all our needs going forward or will be used enough to justify the additional complexity.
- Overall I think this will be cleaner and for templating will enable things like checking whether the forward connector is available. However, at least for our current use-cases, this doesn't enable anything that we can't already do with a Converter.
- We would need to meddle with
service.Config
(or produce it from another struct), for which the whole point is to be directly unmarshaled from aconfmap.Conf
. We'd likely have to refactor something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you're proposing probably makes sense. Unfortunately, I'm finding myself too short on time to spearhead this overhaul. If you have a clear idea of what should change and have the time, I can make time in the next few weeks to review PRs. Either way, I will hopefully pick this back up later in November.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this ConfigProcessor
interface, it would not be enough for templates that provide pipelines of receivers and processors, right?
In #8372, the example includes a template with receivers and processors, and is expected that the processors are expanded as part of the pipeline of the receiver.
Following the approach of the ConfigProcessor
, Process
would need to receive the whole config, or some interface to append components beyond the PartialConfig
.
10283de
to
443372d
Compare
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
I was reviewing the PR as being something that it would really help both new and advanced users simplify their configurations. While I really like the concept of the “receiver” template for its potential to abstract numerous details for the user, I am curious about how the structure of a “processor” or “exporter” template would look like. The current receiver template can expand processors and exporters even if they are not defined in its configuration, as they are later used as if they were a receiver in the pipeline (forward connector). This supports the rationale for having different template classes. My main concern would be on finding a common pattern between the template classes, I think it would be very helpful to have processors and exporters templates as well, but, would they be able to define other components too? How would the pipelines be constructed in these cases? Although I still think it is worth researching this approach, I was thinking of another alternative that uses the template as a straightforward 1:1 component expander, no additional logic over the component's usage. The template would define various components, but only those referenced by the user would be expanded. In this scenario, the template would work the same way for receivers, processors and exporters. I would rather call it raw template converter, as it would not add any logic (no dependency on the forward connector) over the components pipelines, it merely templates components. Example configuration: receivers:
template/my_filelog_template:
my_foo_file: foo.log
my_bar_file: bar.log
template/my_hostmetrics_template:
processors:
template/my_filelog_template:
template/my_hostmetrics_template:
exporters:
template/my_filelog_template:
domain: localhost
template/my_otlp_template:
apm_endpoint: apm.endpoint
apm_token: hello
service:
pipelines:
logs/my_logs_from_template_components:
receivers: [template/my_filelog_template]
processors: [template/my_filelog_template]
exporters: [template/my_filelog_template]
metrics/hostmetrics_otlp_templates:
receivers: [template/my_hostmetrics_template]
processors: [template/my_hostmetrics_template]
exporters: [template/my_otlp_template]
templates:
my_filelog_template:
receivers: |
filelog/foo:
include: {{ .my_foo_file }}
filelog/bar:
include: {{ .my_bar_file }}
processors: |
batch/1:
batch/2:
exporters: |
otlp:
endpoint: {{ .domain }}:4317
compression: none
my_hostmetrics_template:
receivers: |
hostmetrics:
collection_interval: 20s
scrapers:
cpu:
{{- if .ExtendedCPU }}
metrics:
system.cpu.utilization:
enabled: true
system.cpu.physical.count:
enabled: true
system.cpu.logical.count:
enabled: true
system.cpu.frequency:
enabled: true
{{- end }}
load:
processes:
processors: |
metricstransform:
resourcedetection:
detectors: ["env", "system"]
system:
hostname_sources: ["os"]
resource_attributes:
host.id:
enabled: true
my_otlp_template:
exporters: |
debug:
verbosity: detailed
otlp/apm:
endpoint: {{ .apm_endpoint }}
compression: none
headers:
Authorization: "Bearer {{ .apm_token }}" Effective config receivers:
filelog/bar:
include: bar.log
filelog/foo:
include: foo.log
hostmetrics:
collection_interval: 20s
scrapers:
cpu:
load:
processes:
processors:
batch/1:
batch/2:
metricstransform:
resourcedetection:
detectors: ["env", "system"]
system:
hostname_sources: ["os"]
resource_attributes:
host.id:
enabled: true
exporters:
debug:
verbosity: detailed
otlp/apm:
endpoint: apm.endpoint
compression: none
headers:
Authorization: "Bearer hello"
otlp:
endpoint: localhost:4317
compression: none
service:
pipelines:
logs/my_logs_from_template_components:
receivers: [filelog/bar, filelog/foo]
processors: [batch/1, batch/2]
exporters: [otlp]
metrics/hostmetrics_otlp_templates:
receivers: [hostmetrics]
processors: [metricstransform, resourcedetection]
exporters: [debug, otlp/apm] Note that not even a prefix is added to the components, the components name would rely on the template creators (we could use the component name if defined). What do you think about this idea? Is there any related research? Although, I like the freedom that it gives to the end user about the component's usage (e.g. use a subset of components defined in the template) and templates features, I am also afraid it could add another layer of complexity to the configuration. |
Resolves #8372
Alternative to #8504
This approach uses only a template converter. This means that any provider may provide templates as part of the raw configuration.
In the simplest case, this looks like this, where the config and a template are defined in the same file:
Note that this it is not really the intention of templates. Typically, the
templates
section would be provided outside of the main config.Additional notes:
templates
section is deleted by the converter after expanding any templates, similar to how environment variable expansion deletes the original reference to the env var. So although it appears to be part of the config, it is should be considered a precursor to the actual collector configuration.