Skip to content

Fix duplicate retry log messages #1108

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 24, 2023
Merged

Conversation

stIncMale
Copy link
Member

@stIncMale stIncMale commented Apr 17, 2023

The code that logs retries is now added by the decorate*WithRetries methods. Sometimes, this may result in us not logging even the short description of the command that is being retried, because currently we know the command (we create it) only after checking out a connection and checking if the server supports retries. So if the first attempt fails at a point before we create the command (e.g., because of the MongoConnectionPoolClearedException), we don't do retryState.attach(AttachmentKeys.commandDescriptionSupplier(), command::getFirstKey, false), and the retrying attempt does not have the information about the command that it retries. This situation will be improved when Log operation ID when retrying is done.

Log messages examples

Before the PR

For each retry attempt we emitted two corresponding log messages. It looked either like

> Retrying the operation aggregate ... attempt #2
> Retrying the operation aggregate ... attempt #2

or like

> Retrying an operation ... attempt #2
> Retrying the operation aggregate ... attempt #2

Note how in the second example the first message does not provide a description (aggregate) of operation that was being retried. This is because when a retry attempt starts, it may have no information on what command is bing retried, if the first attempt failed too soon to create the command and leave its description for the retry attempt to use.

The first of the two messages in each example is from the start of a retry attempt, the second message is from a point at which the retry attempt is guaranteed to know what is being retried (this point is somewhere in the middle of a retry attempt).

After the PR

For each retry attempt we emit one corresponding log message. It looks either like

> Retrying the operation 'aggregate' ... attempt #2

or, if the information on the retrying command is not yet available

> Retrying the operation ... attempt #2

The message comes from the start of a retry attempt.

So now we don't duplicate retry messages, and don't log them from the middle of a retry attempt, but as a result, we may sometimes not log the description of the retried command. Logging operation ID will allow users to correlate this messages with previous log messages to still see what is being retried, which is why I filed Log operation ID when retrying. Alternatively, we may refactor our code to propagate the description of the retried command from the higher API level (the one that is closer to the user), but I don't think we should do that.

JAVA-4910

@stIncMale stIncMale requested a review from jyemin April 17, 2023 16:51
Comment on lines +610 to +611
? format("Retrying the operation due to the error \"%s\"; attempt #%d", exception, oneBasedAttempt)
: format("Retrying the operation '%s' due to the error \"%s\"; attempt #%d",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt "Retrying the operation ..." is grammatically correct in a situation when we don't specify what operation is being retried, but having the same longer prefix for two messages seems better.

Comment on lines +422 to 423
.attach(AttachmentKeys.commandDescriptionSupplier(), command::getFirstKey, false)
.attach(AttachmentKeys.command(), command, false);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why the command description for writes was auto-cleared, but given that the command was not auto-cleared, auto-clearing the description did not make sense.

Comment on lines +479 to 480
.attach(AttachmentKeys.commandDescriptionSupplier(), command::getFirstKey, false)
.attach(AttachmentKeys.command(), command, false);
Copy link
Member Author

@stIncMale stIncMale Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to #1108 (comment).

@stIncMale stIncMale self-assigned this Apr 18, 2023
@jyemin
Copy link
Collaborator

jyemin commented Apr 19, 2023

The code that logs retries is now added by the decorate*WithRetries methods. Sometimes, this may result in us not logging even the short description of the command that is being retried, because currently we know the command (we create it) only after checking out a connection and checking if the server supports retries.

Are you saying that this is the result of the refactoring, or this it's the result of the required retry algorithm, regardless of the implementation strategy. I think you mean the former, but want to confirm before proceeding.

@stIncMale
Copy link
Member Author

Are you saying that this is the result of the refactoring, or this it's the result of the required retry algorithm, regardless of the implementation strategy. I think you mean the former, but want to confirm before proceeding.

@jyemin, I added examples and more explanations to the description of the PR.

@jyemin
Copy link
Collaborator

jyemin commented Apr 24, 2023

So now we don't duplicate retry messages, and don't log them from the middle of a retry attempt, but as a result, we may sometimes not log the description of the retried command.

It would be more useful if the driver emits the retry log message with information about the command. Given that currently the second log message does have the command available, that at least seems like it's possible, even if it results in a more complicated implementation.

What do you think?

@stIncMale
Copy link
Member Author

stIncMale commented Apr 24, 2023

It would be more useful if the driver emits the retry log message with information about the command.

There are two issues with such an approach in addition to returning the sprinkled logging calls all over the place, which is why I did not choose this path:

  1. We log a message stating that we are retrying well after we have started the attempt, which results in emitting some messages about activities that are part of retrying before the message stating that we are retrying. This makes our logs confusing.
  2. If we log in the middle of an attempt, we will not log that we have attempted to retry in some cases.

The second item is on its own worse than what we have with the current PR, but combined with the first issue, it's way worse. If we want our cake and eat it too, we should get back to the situation when we emit two log messages per a retry attempt, but we change the wording in the messages to something like:

  • First message: "Begin retrying the operation < [sometimes missing] command description > ... attempt #2"
  • Second message: "Created the < command description > command to retry it ... attempt #2"

Copy link
Collaborator

@jyemin jyemin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@stIncMale stIncMale merged commit 8f6ace4 into mongodb:master Apr 24, 2023
@stIncMale stIncMale deleted the JAVA-4910 branch April 24, 2023 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants