-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Fix retry behavior for Jdbc Client #7561
Conversation
private static final Set<String> RETRYABLE_CONNECTION_SQL_STATES = | ||
ImmutableSet.of("08000", "08003", "08006", "08004", "08007"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should probably link what these SqlStates are in a code comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for including
- 08004 As per doc look like it's due to sql rejecting the connection. refering this doc of pg-jdbc driver mentioned in description above doc :
* The server rejected our connection attempt. Usually an authentication failure, but could be a
* configuration error like asking for a SSL connection with a server that wasn't built with SSL
* support.
*/
CONNECTION_REJECTED("08004"),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, would be great to add comments about the meaning of error codes, not sure if there are any library to directly import these values. Also can we add some tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the change and making client pool more configurable, linking previous discussions on the same as well #3099
private static final Set<String> RETRYABLE_CONNECTION_SQL_STATES = | ||
ImmutableSet.of("08000", "08003", "08006", "08004", "08007"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for including
- 08004 As per doc look like it's due to sql rejecting the connection. refering this doc of pg-jdbc driver mentioned in description above doc :
* The server rejected our connection attempt. Usually an authentication failure, but could be a
* configuration error like asking for a SSL connection with a server that wasn't built with SSL
* support.
*/
CONNECTION_REJECTED("08004"),
Me and my co-worker just raised in Iceberg General Slack the same issue that this PR is fixing. Any timeline on when this could be merged in? And also, will this fix work for when a Postgresql database has closed an idle connection (so a reconnect will be done)? |
I tested this commit inside our fork to see if it would fix our idle connection timeout problem with Postgresql. Turns out the SQLState returned by Postgres ( I think the current PR is a step in the right direction, but limiting the support to a handful of standard SQL states will not address every use-case (read database vendor). Since the JDBCCatalog is written to work with any database supported by JDBC, we need to provide a configuration option to extend the list of SQLState related to retryable connection exceptions. The configurable list would be vendor specific since for example @amogh-jahagirdar , do you think it would make sense to add a configurable option for a list of retryable connection exception codes? |
@amogh-jahagirdar, Is there any progress on making the error codes configurable. This would avoid forks just to add code specific to for example posgresql. |
Since the JdbcCatalog provides the means to specify your own JdbcClientPool by providing a client pool builder, one could achieve the desired behavior by extending the JdbcCatalog and overriding the constructor without requiring any direct changes to the current implementation of the ClientPoolImpl. For example, something along those line: package org.abc.catalogs;
import java.sql.SQLException;
import java.util.List;
import java.util.Map;
import org.apache.iceberg.jdbc.JdbcCatalog;
import org.apache.iceberg.jdbc.JdbcClientPool;
public class CustomJdbcCatalog extends JdbcCatalog {
private static final List<String> RETRY_EXCEPTION_STATE = List.of("08006", "57000");
public CustomJdbcCatalog() {
super(null, (Map<String, String> props) -> {
String uri = props.get("uri");
return new JdbcClientPool(uri, props) {
@Override
protected boolean isConnectionException(Exception exc) {
boolean isRetry = super.isConnectionException(exc);
if (!isRetry && exc instanceof SQLException sqle) {
isRetry = RETRY_EXCEPTION_STATE.contains(sqle.getSQLState());
}
return isRetry;
}
};
}, false);
}
} This gives you full control over the logic of the retry. Then, if in the context of spark, you could simply specify Of course this is more complex than having the means to "configure" the retry logic, but it gets you where you are trying to go. |
@nastra @szehon-ho Is there anything we can do to push this, please ? |
Hey all @jean-humann @cccs-br @cccs-eric @cccs-jc sorry I forgot to follow up on this PR! I'm taking a look now to see how to carry this forward. It does seem like since there are codes which will not be part of the standard spec, but that still need to be retried , that we need a configurable list (since we wouldn't want to maintain such a list in Iceberg) |
@jean-humann @cccs-br @cccs-eric @cccs-jc #10140 this is a draft PR for having a configurable set of retry codes. I'll need to write some tests for it but just a heads up since I don't think I'll be updating this branch, and would be closing this PR if there's consensus from others that having a user provided error code list is a good idea to solve the problems you all are facing! I think it is the most practical option at the moment until we separate catalogs from the reference implementation in the future https://www.youtube.com/watch?v=uAQVGd5zV4I&t=1323s |
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
My team hit this issue specifically with PostgreSQL today. Would love it if we could at least get something in that works for Postgres in particular, since that seems to be the most common thing that brings people here. Please let me know how we can help move this forward! Edit: I see #10140 is an updated version of this PR and it has been merged. Thanks @amogh-jahagirdar for seeing this through! Much appreciated. |
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
Fixes #7173
There are certain
SqlExceptions
which are currently not being retried in JdbcClient but they should be retried based on their SqlState.Although this implementation is really only an incremental improvement, because currently the retry behavior in
ClientPoolImpl
is hardcoded to retry only once after the initial request and without any backoff. Also currently, only a single exception can be passed in for retries which is a bit limiting.I think there's a good amount of room for refactoring ClientPoolImpl to enable custom amount of retries and multiple exceptions similar to our
Tasks
implementation.Right now, the way retry can be tuned is by overriding
isConnectionException
(same as what is done in HiveClientPoolImpl). I think we can take refactoring separately since it'll be a bit more intrusive and in the interim at least we can incrementally improve the behavior.Furthermore, the exception that is currently being retried
SqlNonTransientConnectionException
should not be retried. The client should be retryingSqlTransientConnectionException
, so this behavior has been fixed.cc: @jackye1995 @singhpk234 @nastra @szehon-ho