Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/stanza] Windows input operator should resubscribe when remote host restarts #35175

Merged

Conversation

kuiperda
Copy link
Contributor

@kuiperda kuiperda commented Sep 13, 2024

Description:

While collecting Windows Event Logs from a remote host, if that host restarts or shuts off, the collector begins logging errors and stops collecting WE logs, even after the remote host is back up. Restarting the collector fixes this.

To fix this without restarting the collector, the stanza input operator needs to resubscribe if the remote host restarts.

Error log:

"Failed to read events from subscription","error":"The handle is invalid."

Link to tracking Issue:

Testing:

Manually tested with a build of the collector, and confirmed logs continue being collected after the remote host restarts.

Documentation:

N/A, this change is expected behavior

@pjanotti
Copy link
Contributor

/label os:windows

@pjanotti
Copy link
Contributor

/label Run Windows

Not sure if this works...

@atoulme atoulme added the Run Windows Enable running windows test on a PR label Sep 13, 2024
Copy link
Contributor

@VihasMakwana VihasMakwana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check for a remote subscription here

subscription := NewLocalSubscription()
if i.isRemote() {
subscription = NewRemoteSubscription(i.remote.Server)
}

I think we should also do same while resubscribing. What do you say?

@kuiperda kuiperda force-pushed the windowseventlog/fix-msrpc-reconnect branch from 961aa95 to 40184e9 Compare September 16, 2024 14:22
@kuiperda kuiperda marked this pull request as ready for review September 16, 2024 20:31
@kuiperda kuiperda requested a review from a team September 16, 2024 20:31
@VihasMakwana
Copy link
Contributor

Apart from my last comment, LGTM

Copy link
Contributor

@pjanotti pjanotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a remote subscription is created its handle is invalid (0). The subscription needs to be opened before using it to read remote events.

pkg/stanza/operator/input/windows/input.go Outdated Show resolved Hide resolved
@kuiperda kuiperda requested a review from a team as a code owner September 19, 2024 19:23
@kuiperda kuiperda force-pushed the windowseventlog/fix-msrpc-reconnect branch from 1f718d1 to 844a6a4 Compare September 19, 2024 20:26
Copy link
Contributor

@VihasMakwana VihasMakwana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

pkg/stanza/operator/input/windows/input.go Outdated Show resolved Hide resolved
@kuiperda kuiperda force-pushed the windowseventlog/fix-msrpc-reconnect branch from cb26e30 to 8595e6f Compare September 20, 2024 14:54
@kuiperda kuiperda force-pushed the windowseventlog/fix-msrpc-reconnect branch from 8595e6f to 4ac1c32 Compare September 23, 2024 12:20
@kuiperda
Copy link
Contributor Author

@atoulme this can be merged

@djaglowski djaglowski merged commit 2fb1c78 into open-telemetry:main Sep 24, 2024
172 checks passed
@github-actions github-actions bot added this to the next release milestone Sep 24, 2024
djaglowski pushed a commit that referenced this pull request Oct 3, 2024
… successfully (#35520)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

If `Input::read` returns 0, the operator will attempt to open the
subscription again. This happens after a resubscribe *or* whenever there
are no event logs to read, which is often. As a result, this error gets
logged constantly:
```
"Failed to re-open subscription for remote server"..."error: subscription handle is already open"
```
To fix this, the handle should only attempt to re-open when the
resubscribe happens.

This is a followup to #35175 which fixed that resubscribe issue for the
user that raised it, but they have also been confused by this error
being logged.

**Link to tracking Issue:** <Issue number if applicable> **N/A**

**Testing:** <Describe what testing was performed and which tests were
added.> Manually verified log collection is successful before and after
remote host reconnects, and that the noisy error is silenced except
during resubscribe, where it is relevant.

**Documentation:** <Describe the documentation added.> **N/A**
jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this pull request Oct 4, 2024
…st restarts (open-telemetry#35175)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

While collecting Windows Event Logs from a remote host, if that host
restarts or shuts off, the collector begins logging errors and stops
collecting WE logs, even after the remote host is back up. Restarting
the collector fixes this.

To fix this without restarting the collector, the stanza input operator
needs to resubscribe if the remote host restarts.

Error log:
```
"Failed to read events from subscription","error":"The handle is invalid."
```


**Link to tracking Issue:** <Issue number if applicable>

**Testing:** <Describe what testing was performed and which tests were
added.>

Manually tested with a build of the collector, and confirmed logs
continue being collected after the remote host restarts.

**Documentation:** <Describe the documentation added.>

N/A, this change is expected behavior
jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this pull request Oct 4, 2024
… successfully (open-telemetry#35520)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

If `Input::read` returns 0, the operator will attempt to open the
subscription again. This happens after a resubscribe *or* whenever there
are no event logs to read, which is often. As a result, this error gets
logged constantly:
```
"Failed to re-open subscription for remote server"..."error: subscription handle is already open"
```
To fix this, the handle should only attempt to re-open when the
resubscribe happens.

This is a followup to open-telemetry#35175 which fixed that resubscribe issue for the
user that raised it, but they have also been confused by this error
being logged.

**Link to tracking Issue:** <Issue number if applicable> **N/A**

**Testing:** <Describe what testing was performed and which tests were
added.> Manually verified log collection is successful before and after
remote host reconnects, and that the noisy error is silenced except
during resubscribe, where it is relevant.

**Documentation:** <Describe the documentation added.> **N/A**
Eromosele-SM pushed a commit to sematext/opentelemetry-collector-contrib that referenced this pull request Oct 9, 2024
… successfully (open-telemetry#35520)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

If `Input::read` returns 0, the operator will attempt to open the
subscription again. This happens after a resubscribe *or* whenever there
are no event logs to read, which is often. As a result, this error gets
logged constantly:
```
"Failed to re-open subscription for remote server"..."error: subscription handle is already open"
```
To fix this, the handle should only attempt to re-open when the
resubscribe happens.

This is a followup to open-telemetry#35175 which fixed that resubscribe issue for the
user that raised it, but they have also been confused by this error
being logged.

**Link to tracking Issue:** <Issue number if applicable> **N/A**

**Testing:** <Describe what testing was performed and which tests were
added.> Manually verified log collection is successful before and after
remote host reconnects, and that the noisy error is silenced except
during resubscribe, where it is relevant.

**Documentation:** <Describe the documentation added.> **N/A**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg/stanza Run Windows Enable running windows test on a PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants