Commit 6e3fd81

ankedia and rjeberhard authored
Changes for OWLS-83136 - Limit concurrent pod shutdowns during a cluster shrink (#1892)
* changes for OWLS-83136 - Limit concurrent pod shutdowns during a cluster shrink
* Minor code cleanup for OWLS-83136
* minor change to avoid duplicate step
* fixed javadoc for deletePodAsyncWithRetryStrategy method
* fix for integration test and added maxConcurrentShutdown in index.html
* fix unit test failure and shutdown servers concurrently when domain serverStartPolicy is NEVER
* Address PR review comments.
* Changes to address PR review comments.
* Changes to address PR review comments.
* Changes to address PR review comments
* Resolve merge conflict
* Enable unit tests
* Use Dongbo's change in unit test

Co-authored-by: Ryan Eberhard <ryan.eberhard@oracle.com>
1 parent 8e58dc1 commit 6e3fd81

29 files changed, +973 -83 lines changed

docs/domains/Domain.json

Lines changed: 10 additions & 0 deletions
@@ -117,6 +117,11 @@
   "description": "Customization affecting Kubernetes Service generated for this WebLogic cluster.",
   "$ref": "#/definitions/KubernetesResource"
 },
+"maxConcurrentShutdown": {
+  "description": "The maximum number of WebLogic Server instances that will shut down in parallel for this cluster when it is being partially shut down by lowering its replica count. A value of 0 means there is no limit. Defaults to `spec.maxClusterConcurrentShutdown`, which defaults to 1.",
+  "type": "number",
+  "minimum": 0
+},
 "serverStartPolicy": {
   "description": "The strategy for deciding whether to start a WebLogic Server instance. Legal values are NEVER, or IF_NEEDED. Defaults to IF_NEEDED. More info: https://oracle.github.io/weblogic-kubernetes-operator/userguide/managing-domains/domain-lifecycle/startup/#starting-and-stopping-servers.",
   "type": "string",
@@ -354,6 +359,11 @@
   "type": "number",
   "minimum": 0
 },
+"maxClusterConcurrentShutdown": {
+  "description": "The default maximum number of WebLogic Server instances that a cluster will shut down in parallel when it is being partially shut down by lowering its replica count. You can override this default on a per cluster basis by setting the cluster\u0027s `maxConcurrentShutdown` field. A value of 0 means there is no limit. Defaults to 1.",
+  "type": "number",
+  "minimum": 0
+},
 "domainHomeInImage": {
   "deprecated": "true",
   "description": "Deprecated. Use `domainHomeSourceType` instead. Ignored if `domainHomeSourceType` is specified. True indicates that the domain home file system is present in the container image specified by the image field. False indicates that the domain home file system is located on a persistent volume. Defaults to unset.",

docs/domains/Domain.md

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,7 @@ The specification of the operation of the WebLogic domain. Required.
 | `logHome` | string | The directory in a server's container in which to store the domain, Node Manager, server logs, server *.out, introspector .out, and optionally HTTP access log files if `httpAccessLogInLogHome` is true. Ignored if `logHomeEnabled` is false. |
 | `logHomeEnabled` | Boolean | Specifies whether the log home folder is enabled. Defaults to true if `domainHomeSourceType` is PersistentVolume; false, otherwise. |
 | `managedServers` | array of [Managed Server](#managed-server) | Lifecycle options for individual Managed Servers, including Java options, environment variables, additional Pod content, and the ability to explicitly start, stop, or restart a named server instance. The `serverName` field of each entry must match a Managed Server that already exists in the WebLogic domain configuration or that matches a dynamic cluster member based on the server template. |
+| `maxClusterConcurrentShutdown` | number | The default maximum number of WebLogic Server instances that a cluster will shut down in parallel when it is being partially shut down by lowering its replica count. You can override this default on a per cluster basis by setting the cluster's `maxConcurrentShutdown` field. A value of 0 means there is no limit. Defaults to 1. |
 | `maxClusterConcurrentStartup` | number | The maximum number of cluster member Managed Server instances that the operator will start in parallel for a given cluster, if `maxConcurrentStartup` is not specified for a specific cluster under the `clusters` field. A value of 0 means there is no configured limit. Defaults to 0. |
 | `replicas` | number | The default number of cluster member Managed Server instances to start for each WebLogic cluster in the domain configuration, unless `replicas` is specified for that cluster under the `clusters` field. For each cluster, the operator will sort cluster member Managed Server names from the WebLogic domain configuration by normalizing any numbers in the Managed Server name and then sorting alphabetically. This is done so that server names such as "managed-server10" come after "managed-server9". The operator will then start Managed Servers from the sorted list, up to the `replicas` count, unless specific Managed Servers are specified as starting in their entry under the `managedServers` field. In that case, the specified Managed Servers will be started and then additional cluster members will be started, up to the `replicas` count, by finding further cluster members in the sorted list that are not already started. If cluster members are started because of their entries under `managedServers`, then a cluster may have more cluster members running than its `replicas` count. Defaults to 0. |
 | `restartVersion` | string | Changes to this field cause the operator to restart WebLogic Server instances. More info: https://oracle.github.io/weblogic-kubernetes-operator/userguide/managing-domains/domain-lifecycle/startup/#restarting-servers. |
@@ -75,6 +76,7 @@ The current status of the operation of the WebLogic domain. Updated automaticall
 | `allowReplicasBelowMinDynClusterSize` | Boolean | Specifies whether the number of running cluster members is allowed to drop below the minimum dynamic cluster size configured in the WebLogic domain configuration. Otherwise, the operator will ensure that the number of running cluster members is not less than the minimum dynamic cluster setting. This setting applies to dynamic clusters only. Defaults to true. |
 | `clusterName` | string | The name of the cluster. This value must match the name of a WebLogic cluster already defined in the WebLogic domain configuration. Required. |
 | `clusterService` | [Kubernetes Resource](#kubernetes-resource) | Customization affecting Kubernetes Service generated for this WebLogic cluster. |
+| `maxConcurrentShutdown` | number | The maximum number of WebLogic Server instances that will shut down in parallel for this cluster when it is being partially shut down by lowering its replica count. A value of 0 means there is no limit. Defaults to `spec.maxClusterConcurrentShutdown`, which defaults to 1. |
 | `maxConcurrentStartup` | number | The maximum number of Managed Servers instances that the operator will start in parallel for this cluster in response to a change in the `replicas` count. If more Managed Server instances must be started, the operator will wait until a Managed Server Pod is in the `Ready` state before starting the next Managed Server instance. A value of 0 means all Managed Server instances will start in parallel. Defaults to 0. |
 | `maxUnavailable` | number | The maximum number of cluster members that can be temporarily unavailable. Defaults to 1. |
 | `replicas` | number | The number of cluster member Managed Server instances to start for this WebLogic cluster. The operator will sort cluster member Managed Server names from the WebLogic domain configuration by normalizing any numbers in the Managed Server name and then sorting alphabetically. This is done so that server names such as "managed-server10" come after "managed-server9". The operator will then start Managed Server instances from the sorted list, up to the `replicas` count, unless specific Managed Servers are specified as starting in their entry under the `managedServers` field. In that case, the specified Managed Server instances will be started and then additional cluster members will be started, up to the `replicas` count, by finding further cluster members in the sorted list that are not already started. If cluster members are started because of their related entries under `managedServers`, then this cluster may have more cluster members running than its `replicas` count. Defaults to 0. |
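
The two documented fields form a fallback chain: a cluster's `maxConcurrentShutdown`, then `spec.maxClusterConcurrentShutdown`, then the default of 1, with 0 meaning no limit. The following is a minimal Java sketch of that resolution. The class and method names (`ShutdownLimitSketch`, `effectiveLimit`, `allowedNow`) are hypothetical and not taken from the operator's source, and the `allowedNow` bookkeeping is only an assumption about how such a limit might be applied.

```java
/** Hypothetical illustration only; not part of the operator code base. */
public class ShutdownLimitSketch {
    // Mirrors the documented default of 1.
    static final int DEFAULT_MAX_CLUSTER_CONCURRENT_SHUTDOWN = 1;

    /** Cluster-level value wins, then the domain-level default, then 1. */
    public static int effectiveLimit(Integer clusterMaxConcurrentShutdown,
                                     Integer domainMaxClusterConcurrentShutdown) {
        if (clusterMaxConcurrentShutdown != null) {
            return clusterMaxConcurrentShutdown;
        }
        if (domainMaxClusterConcurrentShutdown != null) {
            return domainMaxClusterConcurrentShutdown;
        }
        return DEFAULT_MAX_CLUSTER_CONCURRENT_SHUTDOWN;
    }

    /**
     * Assumed bookkeeping: how many pending servers may begin shutting down
     * now, given how many are already shutting down. A limit of 0 is unlimited.
     */
    public static int allowedNow(int limit, int pending, int inProgress) {
        if (limit == 0) {
            return pending;
        }
        return Math.max(0, Math.min(pending, limit - inProgress));
    }

    public static void main(String[] args) {
        int limit = effectiveLimit(null, 2); // no cluster override, domain says 2
        System.out.println("limit=" + limit + " allowedNow=" + allowedNow(limit, 5, 1));
    }
}
```

With no cluster override and a domain-level value of 2, one further shutdown may begin while one is already in progress.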

docs/domains/index.html

Lines changed: 10 additions & 0 deletions
@@ -1038,6 +1038,11 @@
   "description": "Customization affecting Kubernetes Service generated for this WebLogic cluster.",
   "$ref": "#/definitions/KubernetesResource"
 },
+"maxConcurrentShutdown": {
+  "description": "The maximum number of WebLogic Server instances that will shut down in parallel for this cluster when it is being partially shut down by lowering its replica count. A value of 0 means there is no limit. Defaults to `spec.maxClusterConcurrentShutdown`, which defaults to 1.",
+  "type": "number",
+  "minimum": 0.0
+},
 "serverStartPolicy": {
   "description": "The strategy for deciding whether to start a WebLogic Server instance. Legal values are NEVER, or IF_NEEDED. Defaults to IF_NEEDED. More info: https://oracle.github.io/weblogic-kubernetes-operator/userguide/managing-domains/domain-lifecycle/startup/#starting-and-stopping-servers.",
   "type": "string",
@@ -1275,6 +1280,11 @@
   "type": "number",
   "minimum": 0.0
 },
+"maxClusterConcurrentShutdown": {
+  "description": "The default maximum number of WebLogic Server instances that a cluster will shut down in parallel when it is being partially shut down by lowering its replica count. You can override this default on a per cluster basis by setting the cluster\u0027s `maxConcurrentShutdown` field. A value of 0 means there is no limit. Defaults to 1.",
+  "type": "number",
+  "minimum": 0.0
+},
 "domainHomeInImage": {
   "deprecated": "true",
   "description": "Deprecated. Use `domainHomeSourceType` instead. Ignored if `domainHomeSourceType` is specified. True indicates that the domain home file system is present in the container image specified by the image field. False indicates that the domain home file system is located on a persistent volume. Defaults to unset.",

kubernetes/crd/domain-crd.yaml

Lines changed: 16 additions & 0 deletions
@@ -5127,6 +5127,14 @@ spec:
 additionalProperties:
 type: string
 type: object
+maxConcurrentShutdown:
+description: The maximum number of WebLogic Server instances
+that will shut down in parallel for this cluster when it is
+being partially shut down by lowering its replica count. A
+value of 0 means there is no limit. Defaults to `spec.maxClusterConcurrentShutdown`,
+which defaults to 1.
+type: number
+minimum: 0.0
 serverStartPolicy:
 description: 'The strategy for deciding whether to start a WebLogic
 Server instance. Legal values are NEVER, or IF_NEEDED. Defaults
@@ -5200,6 +5208,14 @@ spec:
 a cluster may have more cluster members running than its `replicas`
 count. Defaults to 0.
 minimum: 0.0
+maxClusterConcurrentShutdown:
+type: number
+description: The default maximum number of WebLogic Server instances
+that a cluster will shut down in parallel when it is being partially
+shut down by lowering its replica count. You can override this default
+on a per cluster basis by setting the cluster's `maxConcurrentShutdown`
+field. A value of 0 means there is no limit. Defaults to 1.
+minimum: 0.0
 domainHomeInImage:
 type: boolean
 description: Deprecated. Use `domainHomeSourceType` instead. Ignored

kubernetes/crd/domain-v1beta1-crd.yaml

Lines changed: 16 additions & 0 deletions
@@ -5114,6 +5114,14 @@ spec:
 additionalProperties:
 type: string
 type: object
+maxConcurrentShutdown:
+description: The maximum number of WebLogic Server instances that
+will shut down in parallel for this cluster when it is being
+partially shut down by lowering its replica count. A value of
+0 means there is no limit. Defaults to `spec.maxClusterConcurrentShutdown`,
+which defaults to 1.
+type: number
+minimum: 0.0
 serverStartPolicy:
 description: 'The strategy for deciding whether to start a WebLogic
 Server instance. Legal values are NEVER, or IF_NEEDED. Defaults
@@ -5185,6 +5193,14 @@ spec:
 then a cluster may have more cluster members running than its `replicas`
 count. Defaults to 0.
 minimum: 0.0
+maxClusterConcurrentShutdown:
+type: number
+description: The default maximum number of WebLogic Server instances
+that a cluster will shut down in parallel when it is being partially
+shut down by lowering its replica count. You can override this default
+on a per cluster basis by setting the cluster's `maxConcurrentShutdown`
+field. A value of 0 means there is no limit. Defaults to 1.
+minimum: 0.0
 domainHomeInImage:
 type: boolean
 description: Deprecated. Use `domainHomeSourceType` instead. Ignored

operator/src/main/java/oracle/kubernetes/operator/KubernetesConstants.java

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ public interface KubernetesConstants {
   boolean DEFAULT_INCLUDE_SERVER_OUT_IN_POD_LOG = true;
   boolean DEFAULT_ALLOW_REPLICAS_BELOW_MIN_DYN_CLUSTER_SIZE = true;
   int DEFAULT_MAX_CLUSTER_CONCURRENT_START_UP = 0;
+  int DEFAULT_MAX_CLUSTER_CONCURRENT_SHUTDOWN = 1;
 
   String CONTAINER_NAME = "weblogic-server";

operator/src/main/java/oracle/kubernetes/operator/calls/AsyncRequestStep.java

Lines changed: 34 additions & 0 deletions
@@ -50,6 +50,7 @@ public class AsyncRequestStep<T> extends Step implements RetryStrategyListener {
   private final RequestParams requestParams;
   private final CallFactory<T> factory;
   private final int maxRetryCount;
+  private final RetryStrategy customRetryStrategy;
   private final String fieldSelector;
   private final String labelSelector;
   private final String resourceVersion;
@@ -78,10 +79,40 @@ public AsyncRequestStep(
       String fieldSelector,
       String labelSelector,
       String resourceVersion) {
+    this(next, requestParams, factory, null, helper, timeoutSeconds, maxRetryCount,
+        fieldSelector, labelSelector, resourceVersion);
+  }
+
+  /**
+   * Construct async step.
+   *
+   * @param next Next
+   * @param requestParams Request parameters
+   * @param factory Factory
+   * @param customRetryStrategy Custom retry strategy
+   * @param helper Client pool
+   * @param timeoutSeconds Timeout
+   * @param maxRetryCount Max retry count
+   * @param fieldSelector Field selector
+   * @param labelSelector Label selector
+   * @param resourceVersion Resource version
+   */
+  public AsyncRequestStep(
+      ResponseStep<T> next,
+      RequestParams requestParams,
+      CallFactory<T> factory,
+      RetryStrategy customRetryStrategy,
+      ClientPool helper,
+      int timeoutSeconds,
+      int maxRetryCount,
+      String fieldSelector,
+      String labelSelector,
+      String resourceVersion) {
     super(next);
     this.helper = helper;
     this.requestParams = requestParams;
     this.factory = factory;
+    this.customRetryStrategy = customRetryStrategy;
     this.timeoutSeconds = timeoutSeconds;
     this.maxRetryCount = maxRetryCount;
     this.fieldSelector = fieldSelector;
@@ -227,6 +258,9 @@ public NextAction apply(Packet packet) {
 
     retry = oldResponse.getSpi(RetryStrategy.class);
   }
+  if ((retry == null) && (customRetryStrategy != null)) {
+    retry = customRetryStrategy;
+  }
 
   if (LOGGER.isFinerEnabled()) {
     logAsyncRequest();
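
The last hunk gives the custom strategy a specific precedence: a strategy carried over from a previous response is kept, and the custom one is used only when none was carried over. A minimal Java sketch of that selection order follows. `RetrySelectionSketch`, `select`, and the one-method `RetryStrategy` stand-in are hypothetical simplifications, not the operator's real types, and whatever the real step does when both are null is not visible in this diff.

```java
/** Hypothetical illustration only; the types here are simplified stand-ins. */
public class RetrySelectionSketch {
    /** Stand-in for the operator's RetryStrategy interface (assumed shape). */
    public interface RetryStrategy {
        String name();
    }

    /**
     * Prefer the strategy recovered from a previous response; fall back to the
     * custom strategy supplied at construction time. May return null when
     * neither is present; later handling is outside the scope of this sketch.
     */
    public static RetryStrategy select(RetryStrategy fromPreviousResponse,
                                       RetryStrategy customRetryStrategy) {
        RetryStrategy retry = fromPreviousResponse;
        if ((retry == null) && (customRetryStrategy != null)) {
            retry = customRetryStrategy;
        }
        return retry;
    }

    public static void main(String[] args) {
        RetryStrategy custom = () -> "custom";
        // First attempt: nothing carried over, so the custom strategy is chosen.
        System.out.println(select(null, custom).name());
    }
}
```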

operator/src/main/java/oracle/kubernetes/operator/helpers/AsyncRequestStepFactory.java

Lines changed: 2 additions & 0 deletions
@@ -5,13 +5,15 @@
 
 import oracle.kubernetes.operator.calls.CallFactory;
 import oracle.kubernetes.operator.calls.RequestParams;
+import oracle.kubernetes.operator.calls.RetryStrategy;
 import oracle.kubernetes.operator.work.Step;
 
 public interface AsyncRequestStepFactory {
   <T> Step createRequestAsync(
       ResponseStep<T> next,
       RequestParams requestParams,
       CallFactory<T> factory,
+      RetryStrategy retryStrategy,
       ClientPool helper,
       int timeoutSeconds,
       int maxRetryCount,

operator/src/main/java/oracle/kubernetes/operator/helpers/CallBuilder.java

Lines changed: 25 additions & 1 deletion
@@ -48,6 +48,7 @@
 import oracle.kubernetes.operator.calls.CallWrapper;
 import oracle.kubernetes.operator.calls.CancellableCall;
 import oracle.kubernetes.operator.calls.RequestParams;
+import oracle.kubernetes.operator.calls.RetryStrategy;
 import oracle.kubernetes.operator.calls.SynchronousCallDispatcher;
 import oracle.kubernetes.operator.calls.SynchronousCallFactory;
 import oracle.kubernetes.operator.work.Step;
@@ -230,6 +231,7 @@ public <T> T execute(
               null,
               callback));
   private String fieldSelector;
+  private RetryStrategy retryStrategy;
 
   /* Version */
   private String labelSelector;
@@ -516,6 +518,11 @@ public CallBuilder withFieldSelector(String fieldSelector) {
     return this;
   }
 
+  public CallBuilder withRetryStrategy(RetryStrategy retryStrategy) {
+    this.retryStrategy = retryStrategy;
+    return this;
+  }
+
   private void tuning(int limit, int timeoutSeconds, int maxRetryCount) {
     this.limit = limit;
     this.timeoutSeconds = timeoutSeconds;
@@ -1181,7 +1188,8 @@ public Step deletePodAsync(
       V1DeleteOptions deleteOptions,
       ResponseStep<V1Status> responseStep) {
     return createRequestAsync(
-        responseStep, new RequestParams("deletePod", namespace, name, deleteOptions, domainUid), deletePod);
+        responseStep, new RequestParams("deletePod", namespace, name, deleteOptions, domainUid),
+        deletePod, retryStrategy);
   }
 
   private Call patchPodAsync(
@@ -1836,6 +1844,7 @@ private <T> Step createRequestAsync(
         next,
         requestParams,
         factory,
+        null,
         helper,
         timeoutSeconds,
         maxRetryCount,
@@ -1844,6 +1853,21 @@ private <T> Step createRequestAsync(
         resourceVersion);
   }
 
+  private <T> Step createRequestAsync(
+      ResponseStep<T> next, RequestParams requestParams, CallFactory<T> factory, RetryStrategy retryStrategy) {
+    return STEP_FACTORY.createRequestAsync(
+        next,
+        requestParams,
+        factory,
+        retryStrategy,
+        helper,
+        timeoutSeconds,
+        maxRetryCount,
+        fieldSelector,
+        labelSelector,
+        resourceVersion);
+  }
+
   private CancellableCall wrap(Call call) {
     return new CallWrapper(call);
   }
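
In the hunks shown, the strategy set through `withRetryStrategy` reaches only `deletePodAsync`; the pre-existing three-argument `createRequestAsync` passes `null` in the new slot. A minimal Java sketch of that routing follows. `CallBuilderSketch` and its two accessor methods are hypothetical and exist only to make the routing observable; they are not methods of the real `CallBuilder`.

```java
/** Hypothetical illustration only; models the routing, not the real CallBuilder. */
public class CallBuilderSketch {
    /** Empty stand-in for the operator's RetryStrategy interface. */
    public interface RetryStrategy {
    }

    private RetryStrategy retryStrategy;

    /** Fluent setter, in the same style as the withRetryStrategy method in the diff. */
    public CallBuilderSketch withRetryStrategy(RetryStrategy retryStrategy) {
        this.retryStrategy = retryStrategy;
        return this;
    }

    /** Models deletePodAsync, which forwards the configured strategy. */
    public RetryStrategy strategyForDeletePod() {
        return retryStrategy;
    }

    /** Models the three-argument createRequestAsync path, which forwards null. */
    public RetryStrategy strategyForOtherCalls() {
        return null;
    }

    public static void main(String[] args) {
        RetryStrategy strategy = new RetryStrategy() { };
        CallBuilderSketch builder = new CallBuilderSketch().withRetryStrategy(strategy);
        System.out.println(builder.strategyForDeletePod() == strategy); // forwarded
        System.out.println(builder.strategyForOtherCalls() == null);    // not forwarded
    }
}
```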
