
fix: Hide this error message when ctx timeout occurs in s3.getObject #6360

Merged
9 commits merged into grafana:main on Nov 9, 2022

Conversation

@liguozhong (Contributor) commented on Jun 10, 2022

What this PR does / why we need it:

level=error ts=2022-01-24T10:57:05.273404635Z caller=batch.go:699 msg="error fetching chunks" err="failed to get s3 object: context canceled"
level=error ts=2022-01-24T10:57:05.273434964Z caller=batch.go:699 msg="error fetching chunks" err="failed to get s3 object: context canceled"
level=warn ts=2022-01-24T10:57:05.273457211Z caller=grpc_logging.go:55 method=/logproto.Querier/Query duration=16.524312361s err="failed to get s3 object: RequestCanceled: request context canceled\ncaused by: context canceled" msg="gRPC\n"
level=error ts=2022-01-24T10:57:05.273486368Z caller=batch.go:699 msg="error fetching chunks" err="failed to get s3 object: context canceled"
level=error ts=2022-01-24T10:57:05.273786914Z caller=batch.go:699 msg="error fetching chunks" err="failed to get s3 object: RequestCanceled: request context canceled\ncaused by: context canceled"
level=error ts=2022-01-24T10:57:05.274246531Z caller=batch.go:699 msg="error fetching chunks" err="failed to get s3 object: RequestCanceled: request context canceled\ncaused by: context canceled"
level=warn ts=2022-01-24T10:57:05.274385302Z caller=grpc_logging.go:55 method=/logproto.Querier/Query duration=29.542213769s err="failed to get s3 object: RequestCanceled: request context canceled\ncaused by: context canceled" msg="gRPC\n"

When a LogQL query times out, do not print these error logs.
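
For context, the fix amounts to treating context cancellation as an expected outcome rather than an error worth logging. A minimal, self-contained sketch of that idea is shown below; fetchObject, the timings, and the logging are simplified placeholders, not Loki's actual storage client code.

```go
package main

import (
	"context"
	"errors"
	"log"
	"time"
)

// fetchObject stands in for the S3 GetObject call; it is a placeholder,
// not Loki's real object-storage client.
func fetchObject(ctx context.Context) error {
	select {
	case <-time.After(2 * time.Second):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	// Simulate a query timeout much shorter than the fetch.
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	if err := fetchObject(ctx); err != nil {
		// If the request context was canceled or timed out, the error is
		// expected noise for this query, so skip the error log.
		if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
			return
		}
		log.Printf("error fetching chunks: %v", err)
	}
}
```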

Which issue(s) this PR fixes:
Fixes #5221

Special notes for your reviewer:

Checklist

  • Documentation added
  • Tests updated
  • Is this an important fix or new feature? Add an entry in the CHANGELOG.md.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md

@liguozhong requested a review from a team as a code owner on June 10, 2022 07:47
@grafanabot (Collaborator) commented:

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
-            storage	-0.3%
+           chunkenc	0%
+              logql	0%
+               loki	0.7%

@chaudum (Contributor) left a comment:

Good catch! I see what the fix does, but could you provide some information in the PR description on how the bug manifests in operations?
Probably also worth a changelog entry, right?

Comment on lines 720 to 723
if ctx.Err() != nil {
    errChan <- nil
    return
}
A reviewer (Contributor) commented:

Is ctx.Err() only non-nil when err != nil? Otherwise, wouldn't it make sense to move this check out of the if err != nil {} block?

@liguozhong (Contributor, Author) replied:

Following your suggestion, the PR code is now clearer.
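
For illustration only, the hoisted check discussed in this thread would look roughly like the sketch below; fetchChunks, errChan, and the timings are hypothetical placeholders rather than the actual batch.go code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fetchChunks is a hypothetical stand-in for the per-goroutine fetch in
// batch.go, not Loki's actual implementation.
func fetchChunks(ctx context.Context) error {
	select {
	case <-time.After(time.Second):
		return nil
	case <-ctx.Done():
		return fmt.Errorf("failed to get s3 object: %w", ctx.Err())
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	errChan := make(chan error, 1)
	go func() {
		err := fetchChunks(ctx)
		// The hoisted check: once the request context is done, report "no
		// error" so the caller does not log a redundant cancellation message.
		if ctx.Err() != nil {
			errChan <- nil
			return
		}
		errChan <- err
	}()

	if err := <-errChan; err != nil {
		fmt.Println("error fetching chunks:", err)
	} else {
		fmt.Println("no error reported (success or canceled context)")
	}
}
```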

@grafanabot (Collaborator) commented:

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
-            storage	-0.2%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@trevorwhitney (Collaborator) left a comment:

Seconding @chaudum: a brief description of the problem this solves would help me as well. My current understanding is that since we retry, we don't want to show intermittent timeouts and only care about the last error. Is that correct?
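
If that reading is correct, the resulting behaviour would resemble the hedged sketch below: intermittent failures are retried without logging, and only a final error that is not a cancellation gets reported. tryFetch and the retry loop are hypothetical, not Loki's implementation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// tryFetch is a hypothetical stand-in for one fetch attempt.
func tryFetch(ctx context.Context, attempt int) error {
	if attempt < 2 {
		return errors.New("intermittent timeout")
	}
	return ctx.Err() // nil unless the overall query context is done
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	var lastErr error
	for attempt := 0; attempt < 3; attempt++ {
		lastErr = tryFetch(ctx, attempt)
		if lastErr == nil {
			break
		}
		// Intermittent failures are retried silently; nothing is logged here.
	}
	// Only the final outcome is reported, and cancellation is not treated
	// as an error worth logging.
	if lastErr != nil &&
		!errors.Is(lastErr, context.Canceled) &&
		!errors.Is(lastErr, context.DeadlineExceeded) {
		fmt.Println("error fetching chunks:", lastErr)
	}
}
```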

The stale bot commented on Aug 13, 2022:

Hi! This issue has been automatically marked as stale because it has not had any
activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project.
A stalebot can be very useful in closing issues in a number of cases; the most common
is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort closed issues that carry the stale label by thumbs-up count.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely
    to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond to, organize, and prioritize all issues, but it can be a challenging task;
our sincere apologies if you find yourself at the mercy of the stalebot.

The stale bot added the stale label (a stale issue or PR that will automatically be closed) on Aug 13, 2022.
@MichelHollands (Contributor) commented on Nov 7, 2022:

@liguozhong Can you rebase this fix and add some description to the PR?

The stale bot removed the stale label on Nov 7, 2022.
The pull-request-size bot added the size/M label and removed the size/XS label on Nov 7, 2022.
@liguozhong (Contributor, Author) replied:

> @liguozhong Can you rebase this fix and add some description to the PR?

done.

@grafanabot (Collaborator) commented:

./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
-            storage	-0.2%
+           chunkenc	0%
+              logql	0%
+               loki	0%

The pull-request-size bot added the size/S label and removed the size/M label on Nov 7, 2022.
The pull-request-size bot added the size/XS label and removed the size/S label on Nov 7, 2022.
@grafanabot (Collaborator) commented:

./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0.1%
+               iter	0%
-            storage	-0.2%
+           chunkenc	0%
+              logql	0%
+               loki	0%


@MichelHollands (Contributor) left a comment:

LGTM

@MichelHollands merged commit 0458dc8 into grafana:main on Nov 9, 2022.
Abuelodelanada pushed a commit to canonical/loki that referenced this pull request Dec 1, 2022
Successfully merging this pull request may close these issues:

  • error fetching chunks & failed to get s3 object