Removed timeout excessive logging in case of index is idle in replication #1114

Merged
merged 5 commits into opensearch-project:main on Sep 4, 2023

Conversation

@mohitamg (Contributor) commented Sep 4, 2023

Description

Relevant exceptions (5xx) are retried in the common module, and during this time the whole stack trace is logged at the WARN level.
For certain operations this can create excessive logging and noise.
Removed this log.warn message.
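For context, a minimal sketch of the pattern described above: an operation retried with exponential backoff where every failed attempt logs the throwable at WARN. The names and signature here are illustrative assumptions, not the plugin's actual suspendExecuteWithRetries; the point is that passing the exception to the logger prints a full stack trace on every retry, which floods the follower's logs when the leader index is idle and every poll for changes times out.

```kotlin
import kotlinx.coroutines.delay
import org.apache.logging.log4j.LogManager

private val log = LogManager.getLogger("replication-retry-sketch")

// Illustrative retry-with-exponential-backoff helper (hypothetical names, not the
// plugin's actual API): each failed attempt is logged at WARN with the throwable
// attached, so log4j prints the whole stack trace on every retry.
suspend fun <T> retryWithBackoff(
    maxAttempts: Int = 5,
    initialBackoffMillis: Long = 10_000,
    operation: suspend () -> T
): T {
    var currentBackoff = initialBackoffMillis
    var lastException: Exception? = null
    repeat(maxAttempts) {
        try {
            return operation()
        } catch (e: Exception) {
            lastException = e
            // Passing `e` as the last argument is what emits the full stack trace.
            log.warn("Encountered a failure. Retrying in ${currentBackoff / 1000} seconds", e)
            delay(currentBackoff)
            currentBackoff *= 2
        }
    }
    throw lastException ?: IllegalStateException("Retries exhausted")
}
```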

Issues Resolved

Resolved #267

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on the Developer Certificate of Origin and on signing off your commits, please check here.

…tion

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
…tion

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
codecov bot commented Sep 4, 2023

Codecov Report

Merging #1114 (54f7164) into main (b6d1b56) will increase coverage by 0.81%.
The diff coverage is 100.00%.

❗ Current head 54f7164 differs from pull request most recent head 72c05b3. Consider uploading reports for the commit 72c05b3 to get more accurate results

@@             Coverage Diff              @@
##               main    #1114      +/-   ##
============================================
+ Coverage     74.31%   75.13%   +0.81%     
  Complexity     1021     1021              
============================================
  Files           141      141              
  Lines          4762     4761       -1     
  Branches        521      521              
============================================
+ Hits           3539     3577      +38     
+ Misses          883      850      -33     
+ Partials        340      334       -6     
Files Changed                                          | Coverage         | Δ
...tlin/org/opensearch/replication/util/Extensions.kt | 63.33% <100.00%> | -0.41% ⬇️

... and 6 files with indirect coverage changes

@@ -137,10 +137,6 @@ suspend fun <Req: ActionRequest, Resp: ActionResponse> Client.suspendExecuteWith
throw ReplicationException(e, RestStatus.TOO_MANY_REQUESTS)
}
}
log.warn(
"Encountered a failure while executing in $req. Retrying in ${currentBackoff / 1000} seconds" +

Member commented:

Instead of deleting the complete log entry, should we explore keeping it and removing only the stack trace?
We could just print the error message.

Collaborator commented:

+1, this message comes in handy when troubleshooting failures.

Contributor Author (@mohitamg) commented:

Removed the stack trace. I tried getting the exception details via retryException.cause, retryException.message, and retryException.toString(), but all of them returned empty, hence the exception is named directly in the log message.
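Since cause, message, and toString() came back empty here, one hypothetical way to still get a compact identifier for the log line is to fall back to the exception's class name. This is an illustrative helper, not code from this PR:

```kotlin
// Hypothetical helper: prefer the exception message when it is non-blank,
// otherwise fall back to the class's simple name (e.g. "OpenSearchTimeoutException")
// so the WARN line still identifies what failed without a stack trace.
fun describeFailure(e: Throwable): String =
    e.message?.takeIf { it.isNotBlank() } ?: e.javaClass.simpleName
```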

Contributor Author (@mohitamg) commented Sep 4, 2023:

[2023-09-04T13:54:45,047][WARN ][o.o.r.t.s.ShardReplicationTask] [followCluster-1] [follower-01][0] Encountered a failure while executing changes. Retrying in 10 seconds.OpenSearchTimeoutException can be ignored!!
[2023-09-04T13:56:55,095][WARN ][o.o.r.t.s.ShardReplicationTask] [followCluster-1] [follower-01][0] Encountered a failure while executing changes. Retrying in 20 seconds.OpenSearchTimeoutException can be ignored!!
[2023-09-04T13:59:15,166][WARN ][o.o.r.t.s.ShardReplicationTask] [followCluster-1] [follower-01][0] Encountered a failure while executing changes. Retrying in 40 seconds.OpenSearchTimeoutException can be ignored!!
[2023-09-04T14:01:55,209][WARN ][o.o.r.t.s.ShardReplicationTask] [followCluster-1] [follower-01][0] Encountered a failure while executing changes. Retrying in 80 seconds.OpenSearchTimeoutException can be ignored!!
[2023-09-04T14:05:15,259][INFO ][o.o.r.t.s.ShardReplicationTask] [followCluster-1] [follower-01][0] opensearch[followCluster-1][replication_follower][T#3] @coroutine#7: Timed out waiting for new changes. Current seqNo: 1. OpenSearchTimeoutException[1m]

Contributor Author (@mohitamg) commented:

By the way, the last log line already prints OpenSearchTimeoutException, so it's fine.

Member commented:

Let's keep it as Encountered a failure(can be ignored) while getting changes: OpenSearchTimeoutException. Retrying in ...
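A minimal sketch of a log statement matching this suggested format, assuming log, retryException, and currentBackoff are in scope as in the diff and discussion above (exact committed code may differ): the exception class is named in the message and the throwable is not passed, so no stack trace is emitted.

```kotlin
// Single-line WARN per the suggested wording; no throwable argument, so no stack trace.
log.warn(
    "Encountered a failure(can be ignored) while getting changes: " +
        "${retryException.javaClass.simpleName}. Retrying in ${currentBackoff / 1000} seconds."
)
```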

Contributor Author (@mohitamg) commented Sep 4, 2023:

[2023-09-04T14:57:33,124][WARN ][o.o.r.t.s.ShardReplicationTask] [followCluster-1] [follower-01][0] Encountered a failure(can be ignored) while getting changes:  OpenSearchTimeoutException. Retrying in 10 seconds.
[2023-09-04T14:59:43,157][WARN ][o.o.r.t.s.ShardReplicationTask] [followCluster-1] [follower-01][0] Encountered a failure(can be ignored) while getting changes:  OpenSearchTimeoutException. Retrying in 20 seconds.
[2023-09-04T15:02:03,216][WARN ][o.o.r.t.s.ShardReplicationTask] [followCluster-1] [follower-01][0] Encountered a failure(can be ignored) while getting changes:  OpenSearchTimeoutException. Retrying in 40 seconds.
[2023-09-04T15:04:43,262][WARN ][o.o.r.t.s.ShardReplicationTask] [followCluster-1] [follower-01][0] Encountered a failure(can be ignored) while getting changes:  OpenSearchTimeoutException. Retrying in 80 seconds.
[2023-09-04T15:08:03,284][INFO ][o.o.r.t.s.ShardReplicationTask] [followCluster-1] [follower-01][0] opensearch[followCluster-1][replication_follower][T#2] @coroutine#7: Timed out waiting for new changes. Current seqNo: 1. OpenSearchTimeoutException[1m]

…e in replication

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
…e in replication

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
@monusingh-1 monusingh-1 enabled auto-merge (squash) September 4, 2023 10:08
@monusingh-1 monusingh-1 merged commit 426a2de into opensearch-project:main Sep 4, 2023
10 of 11 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 4, 2023
…tion (#1114)

* Removed timeout excessive logging in case of index is idle in replication

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Removed timeout excessive logging in case of index is idle in replication

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Removed timeout excessive stack trace logging in case of index is idle in replication

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Removed timeout excessive stack trace logging in case of index is idle in replication

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Changed the log statement

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

---------

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
(cherry picked from commit 426a2de)
monusingh-1 pushed a commit that referenced this pull request Sep 4, 2023
…tion (#1114) (#1115)

* Removed timeout excessive logging in case of index is idle in replication

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Removed timeout excessive logging in case of index is idle in replication

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Removed timeout excessive stack trace logging in case of index is idle in replication

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Removed timeout excessive stack trace logging in case of index is idle in replication

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Changed the log statement

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

---------

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
(cherry picked from commit 426a2de)

Co-authored-by: Mohit Kumar <113413713+mohitamg@users.noreply.github.com>
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Avoid excessive logging during certain exception types
3 participants