Retry behavior for synchronous calls during initialization #2118
Changes from all commits: 3ae8b98, 1286b17, bb16ede, 156014b
@@ -10,6 +10,7 @@
 import java.util.List;
 import java.util.Map;
 import java.util.Optional;
+import java.util.concurrent.Callable;
 import java.util.function.Consumer;
 import javax.annotation.Nonnull;
@@ -66,6 +67,9 @@
 import oracle.kubernetes.operator.calls.RetryStrategy;
 import oracle.kubernetes.operator.calls.SynchronousCallDispatcher;
 import oracle.kubernetes.operator.calls.SynchronousCallFactory;
+import oracle.kubernetes.operator.logging.LoggingFacade;
+import oracle.kubernetes.operator.logging.LoggingFactory;
+import oracle.kubernetes.operator.logging.MessageKeys;
 import oracle.kubernetes.operator.work.Step;
 import oracle.kubernetes.weblogic.domain.api.WeblogicApi;
 import oracle.kubernetes.weblogic.domain.model.Domain;
@@ -77,6 +81,8 @@
 /** Simplifies synchronous and asynchronous call patterns to the Kubernetes API Server. */
 @SuppressWarnings({"WeakerAccess", "UnusedReturnValue"})
 public class CallBuilder {
+  private static final LoggingFacade LOGGER = LoggingFactory.getLogger("Operator", "Operator");
+
   /** HTTP status code for "Not Found". */
   public static final int NOT_FOUND = 404;
|
@@ -594,6 +600,54 @@ private <T> T executeSynchronousCall( | |
return DISPATCHER.execute(factory, requestParams, helper); | ||
} | ||
|
||
/** | ||
* Execute a synchronous call with a retry on failure. | ||
* @param call The call | ||
* @param retryDelaySeconds Retry delay in seconds | ||
* @param <T> Call return type | ||
* @return Results of operation, if successful | ||
* @throws Exception Exception types other than ApiException, which will cause failure | ||
*/ | ||
public <T> T executeSynchronousCallWithRetry(Callable<T> call, int retryDelaySeconds) throws Exception { | ||
/* | ||
* Implementation Note: synchronous calls are only allowed during operator initialization. | ||
* All make-right work must be done with the asynchronous calling pattern. Therefore, since | ||
* we know that this method will only be invoked during operator initialization, we've chosen | ||
* not to put a limit on the number of retries. This is acceptable because the liveness probe will | ||
* eventually kill the operator if the initialization sequence does not complete. | ||
* | ||
* This call was specifically added to address the Istio-related use case where the operator attempts | ||
* to initialize prior to the Istio Envoy sidecar completing its initialization as described in this | ||
* Istio bug: https://github.com/istio/istio/issues/11130. However, the pattern will also work for | ||
* use cases where the Kubernetes master happens to temporarily unavailable just as the operator is | ||
* starting. | ||
*/ | ||
T result = null; | ||
boolean complete = false; | ||
do { | ||
try { | ||
result = call.call(); | ||
complete = true; | ||
} catch (RuntimeException re) { | ||
Throwable cause = re.getCause(); | ||
Reviewer comment (on `Throwable cause = re.getCause();`): Does the ApiException ever get nested further down? Is there ever a case where you need to loop through the causes looking for ApiException?
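For reference, a minimal sketch of the cause-chain walk the reviewer is describing; the helper name `findApiException` is hypothetical and not part of this PR, and the `ApiException` package shown is the one used by newer Kubernetes Java clients (older releases use `io.kubernetes.client.ApiException`):

```java
import io.kubernetes.client.openapi.ApiException;
import java.util.Optional;

final class CauseChainSketch {

  // Hypothetical helper: walk the entire cause chain looking for an ApiException,
  // instead of inspecting only the immediate cause as the PR currently does.
  static Optional<ApiException> findApiException(Throwable t) {
    for (Throwable cause = t.getCause(); cause != null; cause = cause.getCause()) {
      if (cause instanceof ApiException) {
        return Optional.of((ApiException) cause);
      }
    }
    return Optional.empty();
  }
}
```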
+        if (cause instanceof ApiException) {
+          LOGGER.warning(MessageKeys.EXCEPTION, cause);
+        }
+      } catch (Throwable t) {
+        LOGGER.warning(MessageKeys.EXCEPTION, t);
Reviewer comment (on the `catch (Throwable t)` handler): Please add a comment here that we expect the liveness probe to cancel this process if it "retries forever". The next coder may not understand the assumption/expectation.
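The note the reviewer is asking for might read something like this (the wording is illustrative, not taken from the PR):

```java
      } catch (Throwable t) {
        // Log and retry. Retrying forever is safe here only because this method runs
        // during operator initialization: if initialization never completes, the
        // operator's liveness probe will eventually kill and restart this process.
        LOGGER.warning(MessageKeys.EXCEPTION, t);
      }
```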
+      }
+
+      if (complete) {
+        break;
+      }
+
+      Thread.sleep(retryDelaySeconds * 1000L);
+
+      // We are intentionally not limiting the number of retries, as described in the
+      // implementation note above.
+    } while (true);
+    return result;
+  }
+
   /**
    * Read namespace.
    *
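To make the intended calling pattern concrete, here is a hedged usage sketch: it assumes a `CallBuilder` instance and a synchronous `readNamespace` read like the one whose javadoc follows in this file; the method signature, namespace argument, and 5-second delay are illustrative assumptions.

```java
import io.kubernetes.client.openapi.models.V1Namespace;

class InitializationUsageSketch {

  // Illustrative only: wrap an existing synchronous CallBuilder read in the new
  // retry helper during operator initialization.
  static V1Namespace readNamespaceWithRetry(CallBuilder callBuilder, String name) throws Exception {
    return callBuilder.executeSynchronousCallWithRetry(
        () -> callBuilder.readNamespace(name),
        5 /* retry every 5 seconds until the Kubernetes API server responds */);
  }
}
```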
Reviewer comment: What happens if it is a RuntimeException but not an ApiException? Do we want to limit the number of retries, or make it retry forever (hopefully the condition can be resolved)?
Author reply: I'll think about other exceptions... I didn't want to limit the number of retries here because the operator cannot go on until it can connect to the master, and the operator will eventually be killed by the liveness probe.
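For contrast, a bounded-retry variant of the same loop might look like the sketch below. This is not what the PR implements (the PR deliberately retries forever and relies on the liveness probe), and the `maxRetries` parameter is hypothetical:

```java
import java.util.concurrent.Callable;

final class BoundedRetrySketch {

  // Hypothetical alternative to the PR's unbounded loop: give up after maxRetries
  // attempts and rethrow, letting the caller (or the liveness probe) decide.
  static <T> T callWithBoundedRetry(Callable<T> call, int retryDelaySeconds, int maxRetries)
      throws Exception {
    for (int attempt = 1; ; attempt++) {
      try {
        return call.call();
      } catch (Exception e) {
        if (attempt >= maxRetries) {
          throw e;  // retry budget exhausted
        }
      }
      Thread.sleep(retryDelaySeconds * 1000L);
    }
  }
}
```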