-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplifying stateful behavior #35013
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
any thoughts over this @djaglowski @atoulme ? |
I think this is an issue which could be considered by the core maintainers. I will note that at one point the filelog receiver would automatically use a storage extension if exactly one were present, but this caused problems, e.g. adding a second storage extension would invalidate the config for an unrelated component. |
I see. @open-telemetry/collector-maintainers can you take a look here and share your thoughts?
Would you happen to remember when this happened and around which collector version it was? I'll look it up. |
@open-telemetry/collector-maintainers @open-telemetry/collector-contrib-maintainers can you take a look here and share your thoughts? |
Would having something like the template provider described here open-telemetry/opentelemetry-collector#8372 help? I would prefer to go with a more generic solution like this rather than a one-off thing if possible |
@mx-psi Looking at the issue, it would certainly help. Is that being actively worked upon? |
I don't think so (Dan can correct me if I am wrong :)) |
I'm not actively working on the template provider/converter because there was no clear path forward (that I could understand). |
IMO, the fundamental problem here is that users are required to learn the technical implementation details of particular receivers to be able to use them in the optimal way. This makes successfully using the collector harder than it needs to be. In the case of filelog, by default it is going to re-ingest the contents of every file it is able to read whenever the collector restarts. Users discover this by observing duplicated log lines, and search for a way to stop it. This leads them to discovering that log collection requires check pointing, discovering the storage extension, deciding the best place to store the check points for every deployment, and then configuring it for every filelog receiver they have. It would be vastly simpler if filelog were able to persist state by default as it did before and people were able to opt out of it when necessary and they were willing to accept the consequences. This is the "works optimally by default" configuration choice. I also suspect that filelog is not or will not be the only receiver that suffers from this problem, log collection is not the only use case that benefits from being stateful by default. I'm not sure templates help this, you might end up with many templates that include a filelog receiver wanting to depend on a not yet declared storage extension or duplicating storage extensions. It also wouldn't help users who don't use templates (if they one day exist as proposed). The official Helm chart does a reasonable job of trying to simplify this via the storeCheckpoints option, but if you don't start there or aren't on Kubernetes this doesn't help you. |
I agree it would be nice if state were persisted by default and that templates do not solve the problem. Ultimately though, in order to add such a capability, someone needs to articulate a design that convincingly accounts for assumptions that must be made. In my opinion, implicit behaviors can easily make things worse for users if the rely on poor assumptions and/or introduce tricky edge cases. This is not a comprehensive list but I think we'd need to pin down the following:
Regarding the filelog receiver in particular, I must correct a couple things:
This is not the default behavior. It depends on
To be clear, persistence was never enabled by default. It would automatically use a storage extension if exactly one was configured, but this caused other problems. e.g. Adding a storage extension for the sake of another component would break the filelog reciever without touching it. |
Thanks for correcting this, I am still building up context on the entire story here. Just to be explicit on what the tradeoff with For this issue I am starting from these two principles:
You could solve 1 by having an implicit default storage extension that exists at a default directory for each operating system. This does nothing for 2 though, and the questions below clearly highlight the usability problems with this. Particularly the case where a user removes their manually linked storage extension and implicitly falls back to the default one is not very nice.
I think we could get away from these problems by having a higher level configuration concept for whether the collector should have persistent storage by default. Let's use the fault tolerant log collection example configuration as a starting point: receivers:
filelog:
include: [/var/log/busybox/simple.log]
storage: file_storage/filelogreceiver
extensions:
file_storage/filelogreceiver:
directory: /var/lib/otelcol/file_storage/receiver
file_storage/otlpoutput:
directory: /var/lib/otelcol/file_storage/output
service:
extensions: [file_storage/filelogreceiver, file_storage/otlpoutput]
pipelines:
logs:
receivers: [filelog]
exporters: [otlp/custom]
processors: []
exporters:
otlp/custom:
endpoint: http://0.0.0.0:4242
sending_queue:
storage: file_storage/otlpoutput If the concept of storage was lifted out of the extension as a higher level property of the collector, this would simplify to: receivers:
filelog:
include: [/var/log/busybox/simple.log]
# --- NEW ---
storage:
enabled: true
path: /var/lib/otelcol/file_storage
create_directory: true
# -----------
service:
pipelines:
logs:
receivers: [filelog]
exporters: [otlp/custom]
processors: []
exporters:
otlp/custom:
endpoint: http://0.0.0.0:4242 Doing this removes the need for users to know which receivers, processors, and exporters benefit from storage. They only need to decide whether to support persistent storage or not. Coming back to @djaglowski's questions:
This proposal is an explicit declaration that the collector should have storage available for every component that benefits from it. There is an explicit definition of the location, and it can easily be overridden or turned off.
Yes, especially if
Either
There would be a standard default directory per OS (TBD), and the simplest behavior is to attempt to create the directory if it does not exist, with the ability to turn this behavior off. The underlying building block could still be the If there is a better place for this concept to live in the configuration, feel free to suggest it. I am not attached to the particular configuration location or name, more the simplification of how to make stateful components work correctly out of the box. At an even higher level, I would like it if the collector only duplicated or lost logs if a user explicitly opted into that instead of being surprised by it. |
@cmacknz I find your approach very compelling. I agree that internally it should use I’d love to get feedback from the @open-telemetry/collector-maintainers and @open-telemetry/collector-contrib-maintainers on this.
I think it should function without requiring an explicit declaration for |
Thinking about this a bit more, a less drastic change to the configuration would be model the same functional behavior as a "default storage" extension or perhaps a "directory storage" extension. If present it would give components that naturally want to store state on disk a place to put it automatically (subject to reasonable ways to avoid having them collide). That would make it opt in and give users a way to get out of having to manually wire up storage extensions everywhere when it is used. |
Component(s)
extension/storage, extension/storage/filestorage, receiver/filelog
Is your feature request related to a problem? Please describe.
Currently, the collector operates in a stateless mode by default, with stateful components storing offsets in memory. However, stateful components should persist their state during shutdown if a storage extension is available.
At present, enabling stateful behavior involves a somewhat lengthy process:
filestorage
entry to theextensions
stanza in the configuration.filestorage
underservice::extensions
.storage: file_storage/xyz
to individual components.It would be beneficial to simplify this process by introducing a feature gate or a single configuration option. With this approach, users could enable stateful mode with a single setting, and the necessary steps would be handled automatically, achieving the same effect as the manual steps described.
Describe the solution you'd like
Here’s one way to tackle the enhancement:
filestorage
extension.healtcheck
,pprof
,memorylimitter
, etc.This is a high-level overview, and I expect it will require multiple pull requests to implement.
Describe alternatives you've considered
Please share if you have any thoughts!
Additional context
My solution would require open-telemetry/opentelemetry-collector#11046 to be merged or a similar workaround to be implemented to combine lists in config.
The text was updated successfully, but these errors were encountered: