Skip to content

feat: add retryable property for Error #1383

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ZENOTME
Copy link
Contributor

@ZENOTME ZENOTME commented May 27, 2025

Which issue does this PR close?

This PR is the first part of #964.

What changes are included in this PR?

Before we support retryable process, this PR add retryable property for Error.

Are these changes tested?

@@ -134,6 +134,8 @@ pub struct Error {

source: Option<anyhow::Error>,
backtrace: Backtrace,

retryable: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite confusing. Whether it's retryable should be determined by consumer by ErrorKind.

Copy link
Member

@Xuanwo Xuanwo May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether it's retryable should be determined by consumer by ErrorKind.

My current idea is to introduce an ErrorStatus:

  • temporary: This error is temporary, and users can retry the operation if they want.
  • permanent: This error is permanent, so users can't retry it. Even if they do, it's highly likely that it won't work.

In this way, we convey the status, which is only known to us and is orthogonal of ErrorKind instead of the final decision, retryable.

For example, Unexpected error can be permanent and PreconditionFailed error can be temporary.

This is also very useful in cases like our REST catalog client. We can offer built-in retry support and convert a temporary error into a permanent one after several retry attempts.


I also think it would be a good idea to use a temporary: bool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite confusing. Whether it's retryable should be determined by consumer by ErrorKind.

temporary: bool +1. I think the basic idea here is to let retryable or temporary property orthogonal of ErrorKind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, Unexpected error can be permanent and PreconditionFailed error can be temporary.

I don't quite understand why PreconditionFailed can be temporrary. For example, PreconditionFailed could be invalid input from user, and it should be permanent.

I think the basic idea here is to let retryable or temporary property orthogonal of ErrorKind.

I'm still confusing about this part. Let's think about the use case, usually the caller of a function determines whether they need to retry, depending on the return error. For example, the user could decide to retry strategy when they face network timeout failure.
If we add the status or temporary field, it's the function telling its caller whether they should retry. My concern is that adding such a field a general purpose Error could be misleading or confusing. If we add this field into a concrete ErrorKind variant such as CommitConflict(bool), it makes more sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add the status or temporary field, it's the function telling its caller whether they should retry. My concern is that adding such a field a general purpose Error could be misleading or confusing.

This example makes sense to me.🤔 There are two kinds of retry process: 1. commit conflict, 2. other(maybe network timeout as @liurenjie1024 says). It seems that in the SDK, at least for now, we only try to retry the process when we meet CommitConflict. (In Java, exceptions only retry when they meet CommitFailedException). For other errors, maybe it should be determined by the user. And we can determine whether need to let retry orthogonal of ErrorKind until we have multiple kinds of errors that need to be retried?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand why PreconditionFailed can be temporrary.

In cases like concurrent commit. Users can retry until this operation succeed or give up while retry times reached.

Take S3 as an example:

https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html

ConditionalRequestConflict

A conflicting operation occurred. If using PutObject you can retry the request. If using multipart upload you should initiate another CreateMultipartUpload request and re-upload each part.


If we add the status or temporary field, it's the function telling its caller whether they should retry.

I think that's exactly why I want to use "temporary" here. We're telling users that this error is temporary, allowing them to decide how to proceed based on their own context.

A temporary error doesn't necessarily mean it should be retried.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In cases like concurrent commit. Users can retry until this operation succeed or give up while retry times reached.

Shouldn't this return by the catalog client as a CommitConflict?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants