[pkg/stanza] Container parser not playing well with downstream operator that experiences an error #34295

Closed
TylerHelmuth opened this issue Jul 29, 2024 · 11 comments

@TylerHelmuth
Member

Component(s)

pkg/stanza

What happened?

Description

I'm seeing strange behavior when I chain the container parser with a regex parser that sometimes fails (because the pattern is not always present in the log). When the regex_parser fails, log collection stops entirely. I believe this is because the container parser currently breaks out of the crioConsumer goroutine when Write returns an error:

func (p *Parser) crioConsumer(ctx context.Context) {
	entriesChan := p.crioLogEmitter.OutChannel()
	p.criConsumers.Add(1)
	defer p.criConsumers.Done()
	for entries := range entriesChan {
		for _, e := range entries {
			err := p.Write(ctx, e)
			if err != nil {
				p.Logger().Error("failed to write entry", zap.Error(err))
				return
			}
		}
	}
}

Steps to Reproduce

Use a filelogreceiver config such as

filelog:
        exclude: []
        include:
        - /var/log/pods/*/*/*.log
        include_file_name: false
        include_file_path: true
        operators:
        - id: container-parser
          max_log_size: 102400
          type: container
        - type: regex_parser
          regexp: '\slevel=(?P<severity_field>\w+)\s'
          on_error: send_quiet
          severity:
            parse_from: attributes.severity_field
        retry_on_failure:
          enabled: true
        start_at: end

Send through a log that does not match the regex parser and then send through a log that does match.
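For example (hypothetical sample lines; any CRI-formatted entries with and without a level= token would do):

2024-07-29T12:00:00.000000000Z stdout F starting up, no severity token here
2024-07-29T12:00:01.000000000Z stdout F ts=2024-07-29T12:00:01Z level=info msg="ready"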

Expected Result

Both logs are sent, with the second having a severity set.

Actual Result

Neither log is sent.

Collector version

v0.105.0

Environment information

No response

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response


Pinging code owners for pkg/stanza: @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself.


Pinging code owners for receiver/filelog: @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth
Member Author

Is the solution to remove that return statement?
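Something along these lines, i.e. just log and keep draining the channel (a sketch, not a tested patch):

for entries := range entriesChan {
	for _, e := range entries {
		if err := p.Write(ctx, e); err != nil {
			// Log the failure but keep consuming, so one failing downstream
			// operator does not stop collection for subsequent entries.
			p.Logger().Error("failed to write entry", zap.Error(err))
		}
	}
}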

@ChrsMark
Member

ChrsMark commented Jul 30, 2024

Yeap, I just verified that. After the first error, the "failed" log is exported, but because of this return no other logs are consumed. That's definitely problematic. Removing the mentioned return statement sounds correct and seems to solve the issue.

This return was recently added, and it seems a bit weird to have processing/parsing stop/hang when one downstream operator has "failed". That doesn't look like the intended behavior for a filelog pipeline.

This makes me think of d31bc2e#diff-1818c0cdb6108c178ca89cf5ee768fba0a0de451e9b6431464429d6bf0d0a823R60 as well. @djaglowski do you think it's correct to return immediately if a downstream operator has failed 🤔? If the purpose of #33847 was to ensure that errors are logged (based on #33783), then we should only log and continue without returning.
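In other words, something roughly like this for that loop as well (a sketch; the receiver and variable names in the actual code may differ):

for _, e := range entries {
	if err := op.Process(ctx, e); err != nil {
		// Just log and move on to the next entry instead of returning out of the loop.
		op.Logger().Error("failed to process entry", zap.Error(err))
	}
}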

@djaglowski
Member

@ChrsMark I agree with your assessment. I think the loops were overlooked in both cases.

@ChrsMark
Member

I can prepare a patch for these and look around for any other places that might hit something similar. I will try to cover these with tests as well.

djaglowski pushed a commit that referenced this issue Jul 31, 2024
**Description:**
As discussed at #34295, error handling can be problematic in the stanza package after the recent changes in #33847. Specifically:
1. When [`Write`](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/operator/helper/writer.go#L50) is called in a for loop, a returned error should not break the loop. It should just be logged.
2. When [`Process`](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/d78d7bb37d0c84da04ac8a83356eb669ecd75a95/pkg/stanza/operator/operator.go#L40) is called in a for loop, a returned error should not break the loop. It should just be logged.

**Link to tracking Issue:** #34295

**Testing:** Added

**Documentation:** ~

---------

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
@gbxavier

I observed the same behavior using the container parser followed by a JSON parser. The following log line makes the collector stop consuming logs after it fails to parse the log line as JSON.

2024-05-17T09:29:45.145553464Z stderr F ts=2024-05-17T09:29:45.145Z caller=node_exporter.go:193 level=info msg="Starting node_exporter" version="(version=1.8.0, branch=HEAD, revision=cadb1d1190ad95c66b951758f01ff4c94e55e6ce)"
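
The operator chain is roughly the following (simplified sketch, not the exact config):

filelog:
  include:
  - /var/log/pods/*/*/*.log
  include_file_path: true
  operators:
  - type: container
  - type: json_parser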

Thanks a lot for reporting and fixing this! Would you happen to have a timeline for when it will be released (and available for use)?

@ChrsMark
Member

ChrsMark commented Aug 1, 2024

Hey @gbxavier, based on the Release Schedule this should be available in v0.107.0 (around 2024-08-12).

@ChrsMark
Member

ChrsMark commented Aug 1, 2024

@TylerHelmuth @djaglowski shall we close this one, or is there anything else still pending?

@djaglowski
Member

Resolved by #34342
