HowTo: Telemetry Logging to File when running as windows service #5300

Closed

Mario-Hofstaetter opened this issue Apr 29, 2022 · 8 comments · Fixed by #9726
Labels
bug Something isn't working os:windows Windows specific issues

@Mario-Hofstaetter

Describe the bug

This is not a traditional bug report, but rather documentation of how to solve the described issue.

If maintainers would like a PR that expands the docs with some of this information (not sure which parts), I could try to contribute.
If not, this can also be considered a rant and be closed. Maybe it helps somebody who lands here via Google.

I spent several HOURS finding a solution for something that should be trivial to do.

The Situation:

  • Running otelcol.exe on Windows as a Windows service (supported natively by the binary)
  • Otelcol spams the Windows Application event log with messages (is this behavior documented anywhere?)
    • The event log entries are somewhat broken (see screenshots below)
  • It does not seem possible to "bypass" otelcol's own telemetry logs into an exporter and send them off directly (e.g. using Loki)

The Goal:

  • Write otelcol's log messages to files (to tail and ship to a central logging server)
    • This is necessary to observe issues with otelcol itself
  • Use JSON as the log format
  • Use log rotation so the file cannot grow huge and fill up the hard disk
    • Log rotation is a must-have feature in production ⚡

SCREENSHOTS (omitted): buggy Windows Application event log entries from otelcol, shown next to the log messages they actually correspond to.

The journey:

The docs at https://opentelemetry.io/docs/collector/configuration/#service describe the telemetry logs section and link to https://github.com/open-telemetry/opentelemetry-collector/blob/7666eb04c30e5cfd750db9969fe507562598f0ae/config/service.go

	// Encoding sets the logger's encoding.
	// Example values are "json", "console".
	Encoding string `mapstructure:"encoding"`

	// OutputPaths is a list of URLs or file paths to write logging output to.
	// The URLs could only be with "file" schema or without schema.
	// The URLs with "file" schema must be an absolute path.
	// The URLs without schema are treated as local file paths.
	// "stdout" and "stderr" are interpreted as os.Stdout and os.Stderr.
	// see details at Open in zap/writer.go.
	// (default = ["stderr"])
	OutputPaths []string `mapstructure:"output_paths"`

The Windows Event Log is not mentioned, however.

encoding: json worked when testing in a console. But stdout / stderr are not available in a Windows service and cannot be redirected either (afaik), so I tried using a file path:

  telemetry:
    logs:
      level: info
      encoding: json
      output_paths:       ["otelcol.log.json"]

I then realized that using relative paths is not a good idea, because Windows services run with C:\Windows\System32 as their working directory (doh), so your logs end up there.

Since our installation directory is fixed, I then tried using an absolute file path for output_paths, which was unsuccessful.

After some googling I found this: uber-go/zap#621
For nearly 4 years, logging to absolute file paths on Windows has been broken in zap ¯\_(ツ)_/¯
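
For illustration (not part of the original attempt), a config along these lines fails on affected zap versions, apparently because the drive letter of an absolute Windows path gets parsed as a URL scheme:

  telemetry:
    logs:
      level: info
      encoding: json
      # hypothetical absolute path; zap rejects it because "C:" looks like a URL scheme
      output_paths: ['C:\Program Files\OtelCollector\otelcol.log.json']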

I nearly managed a hack-around by using this relative path to escape from System32:

output_paths: ["../../Program Files/OtelCollector/otelcol.log.json"]

which created the log file in the correct directory, but no logs were ever written to it. I could not figure out why.

Also, the otelcol source does not mention any log file size limit / log rotation (?), so this approach is unsuitable anyway.

Workaround: service wrapper

Logging to a file natively seemed impossible, so the next attempt was to use stdout and redirect it to a file using a third-party tool.

https://nssm.cc/ is no longer maintained and looks dubious, so I tried https://github.com/winsw/winsw instead.

When starting otelcol as a child process of winsw running as a service, it crashed with the following message:

The service process could not connect to the service controller

After searching the GitHub repo, I found this:

The process may fail to start in a Windows Docker container with the following error: The service process could not connect to the service controller. In this case the NO_WINDOWS_SERVICE=1 environment variable should be set to force the collector to be started as if it were running in an interactive terminal, without attempting to run as a Windows service.

That was the final puzzle piece. Setting this environment variable in the winsw config kept otelcol from crashing.

Logs from stdout were then picked up by winsw and written to a file, enabling file rotation through winsw; a sketch of such a winsw config follows below.

(The files are then tailed by Grafana Promtail and sent to Loki, since the OTel logging signal was still considered unstable.)
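
A winsw configuration roughly like the following should achieve this. This is only a sketch: the service id, paths, and rotation thresholds are placeholders, and the element names should be checked against the winsw documentation for the version in use.

  <service>
    <id>otelcol</id>
    <name>OpenTelemetry Collector</name>
    <description>OpenTelemetry Collector wrapped by winsw to get file logging with rotation</description>
    <executable>C:\Program Files\OtelCollector\otelcol.exe</executable>
    <arguments>--config "C:\Program Files\OtelCollector\config.yaml"</arguments>
    <!-- prevent otelcol from trying to register with the Windows service controller itself -->
    <env name="NO_WINDOWS_SERVICE" value="1"/>
    <!-- winsw captures stdout/stderr and writes rotated log files to this directory -->
    <logpath>C:\Program Files\OtelCollector\logs</logpath>
    <log mode="roll-by-size">
      <sizeThreshold>10240</sizeThreshold>
      <keepFiles>8</keepFiles>
    </log>
  </service>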

What did you expect to see?

An easier way to log otelcol log-messages to a file.

What version did you use?
Version: (e.g., v0.48.0)

Environment
OS: Windows 10 / Windows Server 2019

@Mario-Hofstaetter added the bug label Apr 29, 2022
@jasase commented Dec 9, 2022

Still having the same problems in v0.66.0

@jpkrohling (Member)

As far as I know, we don't have maintainers using Windows. This is unlikely to move forward without help from people running the collector on Windows.

@cocowalla

Still an issue in v0.92.0:

  • No content written to file specified with output_paths (file is locked by the OTel process, but nothing written)
  • Windows "Application" Event Log spammed with Incorrect function events - there seems to be one event written for every line that should have gone to the file instead
  • No way to disable writing to Windows Event Log

Really wish the collector could get some Windows love 💔

@jpkrohling (Member) commented Jan 25, 2024

As far as I know, we are still in the same position: people who care about OTel Collector on Windows should join us and help move this forward.

@cocowalla

> As far as I know, we are still in the same position: people who care about OTel Collector on Windows should join us and help move this forward.

I'd help if I could, but I've never used Go before, and wouldn't even know how to start with something like debugging why logs aren't being written 🤷. Maybe I'll get some time to tinker one of these days...

@pjanotti (Contributor) commented Feb 2, 2024

Hi @cocowalla - I do work with both golang and Windows and can take a look at this, but I would appreciate someone double-checking the configurations and doing extra testing. You should be able to get rid of the Incorrect function message by setting up an event provider for the collector service, as described in this comment.
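
(For readers following along: one common way to register such an event source is shown below. The source name here is an assumption - it must match the name the collector service uses when writing events, so adjust it as needed.)

  # Run from an elevated Windows PowerShell prompt.
  # Registers an event source so the Application log can resolve the collector's messages.
  New-EventLog -LogName Application -Source "OpenTelemetry Collector"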

@jpkrohling you can assign this one to me.

@cocowalla

@pjanotti Great! I did actually have a go the other day, and got it to build easily enough, but it's the "contrib" version I need and could never get changes I made in the "main" version to apply in the "contrib" version when I built it. 🤷‍♂️

Setting up an event source did indeed get rid of the Incorrect function messages, so thanks for that! Also, happy to check/test whatever you need 👍

@pjanotti (Contributor) commented Feb 3, 2024

Setting output_paths doesn't work as intended because the zap core set up to redirect the logs to the Event Log doesn't forward the log messages to the default/core logger at:

return windowsEventLogCore{core, elog, zapcore.NewConsoleEncoder(encoderConfig)}

This redirection to the Event Log happens much earlier than the code that loads the configuration and sets up the telemetry, so disabling the Event Log requires us to be a bit careful. For quite some time the default behavior of the collector has been to log to the Event Log when running on Windows as a service, and although there are a few issues complaining about that, I think many users are fine (and quiet) with that default.

At first glance the best option seems to be adding a new telemetry configuration option to control the log levels sent to the Event Log (so one could avoid the more verbose messages) and a way to disable it completely. This way we preserve the current behavior but allow those who want to turn the Event Log usage on Windows down or off to do so.

@cocowalla the changes should be done only on "core" (the current repo); the contrib version will be updated after a core release that includes the changes.

Before moving ahead with a change proposal I'm planning to review the issues related to usage of the Event Log by the collector.

dmitryax pushed a commit that referenced this issue Mar 6, 2024
**Description:**
Adding a workflow to fix #6455; this will also be needed when fixing #5300

Fixes #6455

**Link to tracking Issue:**
#6455
dmitryax pushed a commit that referenced this issue Mar 15, 2024

**Description:**
Fixes #5300 

With this change the service telemetry section is respected by the
collector when running as a Windows service. The log level can be used to
control the verbosity of the events logged, and the logger can be
redirected to a file by specifying an output path in the service
telemetry config. By default `stdout` and `stderr` are redirected to the
event log when running as a Windows service to keep the current
behavior.

The code change itself was made with a focus on not breaking the public
APIs and not reading the config more than once. That said, it is probably
something to be refactored when the public APIs can be touched again.

**Link to tracking Issue:**
#5300

**Testing:**
The test is an integration test that depends on the actual executable.
It checks for event publication and file output.
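
For readers landing here from search: per the PR description above, collector versions that include this change respect the service telemetry settings when running as a Windows service, so a config roughly like the following can send the collector's own logs to a file. The path is a placeholder, and absolute Windows paths with drive letters may still hit the zap path-parsing quirk discussed earlier (uber-go/zap#621), so verify this on your version. As noted earlier in the thread, the collector does not appear to provide built-in log rotation, so rotation still needs to be handled externally.

  telemetry:
    logs:
      level: info
      encoding: json
      # placeholder path - confirm that your collector/zap version accepts drive-letter paths
      output_paths: ['C:\Program Files\OtelCollector\otelcol.log.json']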