Fix duplicate retry log messages #1108

stIncMale · 2023-04-17T16:51:33Z

The code that logs retries is now added by the decorate*WithRetries methods. Sometimes this may result in us not logging even the short description of the command being retried, because currently we know the command (we create it) only after checking out a connection and checking whether the server supports retries. So if the first attempt fails at a point before we create the command (e.g., because of a MongoConnectionPoolClearedException), we don't do retryState.attach(AttachmentKeys.commandDescriptionSupplier(), command::getFirstKey, false), and the retry attempt has no information about the command it retries. This situation will improve once "Log operation ID when retrying" is done.

Log messages examples

Before the PR

For each retry attempt we emitted two corresponding log messages. It looked either like

> Retrying the operation aggregate ... attempt #2
> Retrying the operation aggregate ... attempt #2

or like

> Retrying an operation ... attempt #2
> Retrying the operation aggregate ... attempt #2

Note how in the second example the first message does not provide a description (aggregate) of the operation that was being retried. This is because when a retry attempt starts, it may have no information on what command is being retried, if the first attempt failed too soon to create the command and leave its description for the retry attempt to use.

The first of the two messages in each example is from the start of a retry attempt; the second message is from a point at which the retry attempt is guaranteed to know what is being retried (this point is somewhere in the middle of a retry attempt).

After the PR

For each retry attempt we emit one corresponding log message. It looks either like

> Retrying the operation 'aggregate' ... attempt #2

or, if the information about the command being retried is not yet available, like

> Retrying the operation ... attempt #2

The message comes from the start of a retry attempt.
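As a rough illustration of the two message shapes above, here is a minimal sketch that picks the wording depending on whether a command description is available. The format strings follow the ones in this PR's diff for CommandOperationHelper.java; the class name RetryLogMessages, the method retryMessage, and its parameter list are hypothetical and are not the driver's actual code.

```java
import java.util.function.Supplier;

/** Hypothetical helper, for illustration only; not part of the driver. */
final class RetryLogMessages {
    private RetryLogMessages() {
    }

    /**
     * Builds the single message logged at the start of a retry attempt.
     * The supplier is null when the previous attempt failed before the
     * command was created, so no description was left for the retry.
     */
    static String retryMessage(Supplier<String> commandDescriptionSupplier,
                               Throwable exception, int oneBasedAttempt) {
        String description = commandDescriptionSupplier == null
                ? null
                : commandDescriptionSupplier.get();
        return description == null
                ? String.format("Retrying the operation due to the error \"%s\"; attempt #%d",
                        exception, oneBasedAttempt)
                : String.format("Retrying the operation '%s' due to the error \"%s\"; attempt #%d",
                        description, exception, oneBasedAttempt);
    }

    public static void main(String[] args) {
        Throwable error = new IllegalStateException("pool cleared");
        // Description known: the previous attempt got far enough to create the command.
        System.out.println(retryMessage(() -> "aggregate", error, 2));
        // Description unknown: the previous attempt failed before creating the command.
        System.out.println(retryMessage(null, error, 2));
    }
}
```

Keeping a shared prefix for both variants means a single search for "Retrying the operation" finds every retry, with or without a description.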

So now we don't duplicate retry messages, and don't log them from the middle of a retry attempt, but as a result, we may sometimes not log the description of the retried command. Logging the operation ID will allow users to correlate these messages with previous log messages to still see what is being retried, which is why I filed "Log operation ID when retrying". Alternatively, we may refactor our code to propagate the description of the retried command from the higher API level (the one that is closer to the user), but I don't think we should do that.

JAVA-4910

stIncMale · 2023-04-17T17:00:59Z

driver-core/src/main/com/mongodb/internal/operation/CommandOperationHelper.java

+                    ? format("Retrying the operation due to the error \"%s\"; attempt #%d", exception, oneBasedAttempt)
+                    : format("Retrying the operation '%s' due to the error \"%s\"; attempt #%d",


I doubt "Retrying the operation ..." is grammatically correct in a situation where we don't specify what operation is being retried, but having the same longer prefix for the two messages seems better.

stIncMale · 2023-04-17T17:03:24Z

driver-core/src/main/com/mongodb/internal/operation/CommandOperationHelper.java

+                            .attach(AttachmentKeys.commandDescriptionSupplier(), command::getFirstKey, false)
                            .attach(AttachmentKeys.command(), command, false);


I don't know why the command description for writes was auto-cleared, but given that the command was not auto-cleared, auto-clearing the description did not make sense.
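To illustrate why auto-clearing only the description was inconsistent, here is a minimal sketch of an attachment store with an auto-remove flag. SketchRetryState, its string keys, and the advance() method are hypothetical stand-ins, not the driver's RetryState API. The sketch assumes that an auto-removed attachment is dropped when the next attempt begins.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical stand-in for per-operation retry state; not the driver's RetryState. */
final class SketchRetryState {
    private final Map<String, Object> attachments = new HashMap<>();
    private final Set<String> autoRemoveKeys = new HashSet<>();

    /** Attaches a value; if autoRemove is true, the value is dropped when the next attempt starts. */
    SketchRetryState attach(String key, Object value, boolean autoRemove) {
        attachments.put(key, value);
        if (autoRemove) {
            autoRemoveKeys.add(key);
        } else {
            autoRemoveKeys.remove(key);
        }
        return this;
    }

    Optional<Object> attachment(String key) {
        return Optional.ofNullable(attachments.get(key));
    }

    /** Begins the next attempt (assumed semantics): removes every auto-remove attachment. */
    void advance() {
        attachments.keySet().removeAll(autoRemoveKeys);
        autoRemoveKeys.clear();
    }

    public static void main(String[] args) {
        SketchRetryState state = new SketchRetryState()
                .attach("commandDescription", "insert", true)   // auto-cleared
                .attach("command", "{insert: ...}", false);     // kept across attempts
        state.advance();
        // The command survives, but its description is gone when the retry starts.
        System.out.println(state.attachment("command").isPresent());
        System.out.println(state.attachment("commandDescription").isPresent());
    }
}
```

Under that assumption, attaching the description with the flag set to false lets a message logged at the start of the retry attempt still name the command.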

stIncMale · 2023-04-17T17:04:01Z

driver-core/src/main/com/mongodb/internal/operation/CommandOperationHelper.java

+                            .attach(AttachmentKeys.commandDescriptionSupplier(), command::getFirstKey, false)
                            .attach(AttachmentKeys.command(), command, false);


This is similar to #1108 (comment).

jyemin · 2023-04-19T23:56:15Z

> The code that logs retries is now added by the decorate*WithRetries methods. Sometimes, this may result in us not logging even the short description of the command that is being retried, because currently we know the command (we create it) only after checking out a connection and checking if the server supports retries.

Are you saying that this is the result of the refactoring, or that it's the result of the required retry algorithm, regardless of the implementation strategy? I think you mean the former, but I want to confirm before proceeding.

stIncMale · 2023-04-20T16:43:54Z

> Are you saying that this is the result of the refactoring, or that it's the result of the required retry algorithm, regardless of the implementation strategy? I think you mean the former, but I want to confirm before proceeding.

@jyemin, I added examples and more explanations to the description of the PR.

jyemin · 2023-04-24T17:10:14Z

> So now we don't duplicate retry messages, and don't log them from the middle of a retry attempt, but as a result, we may sometimes not log the description of the retried command.

It would be more useful if the driver emitted the retry log message with information about the command. Given that currently the second log message does have the command available, that at least seems possible, even if it results in a more complicated implementation.

What do you think?

stIncMale · 2023-04-24T17:48:02Z

> It would be more useful if the driver emits the retry log message with information about the command.

There are two issues with such an approach, in addition to bringing back logging calls sprinkled all over the place, which is why I did not choose this path:

1. We log a message stating that we are retrying well after we have started the attempt, which results in emitting some messages about activities that are part of retrying before the message stating that we are retrying. This makes our logs confusing.
2. If we log in the middle of an attempt, we will not log that we have attempted to retry in some cases.

The second issue is on its own worse than what we have with the current PR, and combined with the first issue it is much worse. If we want to have our cake and eat it too, we should go back to emitting two log messages per retry attempt, but change the wording of the messages to something like:

First message: "Begin retrying the operation < [sometimes missing] command description > ... attempt #2"
Second message: "Created the < command description > command to retry it ... attempt #2"
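The two issues listed in this comment can be sketched with a small simulation of a single retry attempt. Everything here is hypothetical and simplified: the class RetryLogPlacementSketch, the LogPoint enum, and the log lines are illustrative and do not reproduce the driver's code or its exact messages.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of where the retry message is logged; not the driver's code. */
final class RetryLogPlacementSketch {
    enum LogPoint { ATTEMPT_START, AFTER_COMMAND_CREATED }

    private RetryLogPlacementSketch() {
    }

    /** Returns the log lines produced by one simulated retry attempt (attempt #2). */
    static List<String> retryAttempt(LogPoint logPoint, boolean checkoutFails) {
        List<String> log = new ArrayList<>();
        if (logPoint == LogPoint.ATTEMPT_START) {
            // The command is not created yet, so no description is available here.
            log.add("Retrying the operation; attempt #2");
        }
        log.add("Checking out a connection");
        if (checkoutFails) {
            // The attempt ends before the command is ever created.
            log.add("Connection checkout failed");
            return log;
        }
        String commandDescription = "aggregate"; // the command is created only at this point
        if (logPoint == LogPoint.AFTER_COMMAND_CREATED) {
            log.add("Retrying the operation '" + commandDescription + "'; attempt #2");
        }
        log.add("Sending the command '" + commandDescription + "'");
        return log;
    }

    public static void main(String[] args) {
        // Issue 1: the retry message appears after activity that belongs to the retry.
        System.out.println(retryAttempt(LogPoint.AFTER_COMMAND_CREATED, false));
        // Issue 2: the retry fails at checkout and is never reported as a retry.
        System.out.println(retryAttempt(LogPoint.AFTER_COMMAND_CREATED, true));
        // Logging at the start always reports the retry, possibly without a description.
        System.out.println(retryAttempt(LogPoint.ATTEMPT_START, true));
    }
}
```

In this sketch, logging at the start of the attempt trades the command description for a message that is always present and always first.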

jyemin

LGTM

Fix duplicate retry log messages

8a6aee8

JAVA-4910

stIncMale requested a review from jyemin April 17, 2023 16:51

stIncMale commented Apr 17, 2023

stIncMale self-assigned this Apr 18, 2023

jyemin approved these changes Apr 24, 2023

stIncMale merged commit 8f6ace4 into mongodb:master Apr 24, 2023

stIncMale deleted the JAVA-4910 branch April 24, 2023 19:00
