Commit ede68e1
Changes for OWLS-80384 - Verify that operator deployment and WebLogic pods have good default cpu/memory resources
1 parent 29f22ff

2 files changed: +177 -0 lines changed

Lines changed: 54 additions & 0 deletions
# Considerations for Pod Resource (Memory and CPU) Requests and Limits

The operator creates a pod for each running WebLogic Server instance, and each pod has a container. It's important that containers have enough resources for applications to run efficiently.
If a pod is scheduled on a node with limited resources, it's possible for the node to run out of memory or CPU, and for applications to stop working properly or suffer degraded performance. It's also possible for a rogue application to use all available memory and/or CPU, making other containers running on the same system unresponsive. The same problem can happen if an application has a memory leak or a bad configuration.
A pod's resource request and limit parameters can be used to address these problems. Setting resource limits prevents an application from using more than its share of a resource, which improves the reliability and stability of applications and lets you plan hardware capacity. Additionally, a pod's priority, and the Quality of Service (QoS) it receives, depend on whether resource requests and limits are specified.
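For illustration, here is a minimal sketch of how requests and limits are set on a plain Kubernetes container (all names and values below are hypothetical, not taken from the operator samples):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend                       # hypothetical pod name
spec:
  containers:
  - name: app                          # hypothetical container name
    image: images.example.com/app:v1   # placeholder image
    resources:
      requests:                        # what the scheduler reserves for the container
        memory: "512Mi"
        cpu: "250m"                    # 250 millicores = 1/4 of a CPU core
      limits:                          # hard caps enforced at runtime
        memory: "1Gi"
        cpu: "500m"
```

Because the requests here are lower than the limits, this pod would be classified as Burstable (see the next section).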
## Pod Quality of Service (QoS) and Prioritization

A pod's Quality of Service (QoS) class and priority are determined by whether its resource requests and limits are configured, and by how they are configured.
Best Effort QoS: If you don't configure requests and limits, the pod receives the "best-effort" QoS class and has the lowest priority. When a node runs out of non-shareable resources, the kubelet's out-of-resource eviction policy evicts and kills best-effort pods first.
Burstable QoS: If you configure both resource requests and limits, with the requests lower than the limits, the pod's QoS class is "Burstable". Likewise, if you configure only resource requests (without limits), the pod's QoS class is "Burstable". When a node runs out of non-shareable resources, the kubelet kills "Burstable" pods only when no "best-effort" pods remain. Burstable pods receive medium priority.
Guaranteed QoS: If you set the requests and the limits to equal values, the pod receives the "Guaranteed" QoS class and the highest priority. These settings indicate that your pod will consume a fixed amount of memory and CPU. With this configuration, if a node runs out of non-shareable resources, Kubernetes kills the best-effort and burstable pods before terminating Guaranteed QoS pods.
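To make the three classes concrete, here is a sketch of how the same resources stanza maps to each QoS class (values are illustrative; the stanzas are separated as YAML documents for comparison):

```yaml
resources: {}            # no requests or limits -> BestEffort QoS
---
resources:               # requests lower than limits -> Burstable QoS
  requests:
    memory: "512Mi"
    cpu: "250m"
  limits:
    memory: "1Gi"
    cpu: "500m"
---
resources:               # requests equal to limits -> Guaranteed QoS
  requests:
    memory: "1Gi"
    cpu: "1"
  limits:
    memory: "1Gi"
    cpu: "1"
```

You can check the class that Kubernetes assigned to a pod with `kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'`.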
## Java heap size and pod memory request/limit considerations

It's extremely important to set a correct heap size for JVM-based applications. If the available memory on the node, or the memory allocated to the container, is not sufficient for the specified JVM heap arguments (plus additional off-heap memory), the WebLogic Server process can run out of memory. To avoid this, make sure that the configured heap sizes are not too big and that the pod is scheduled on a node with sufficient memory.
With recent Java versions, it's possible to rely on the default JVM heap settings, which are safe but quite conservative. If you configure a memory limit for a container but don't configure heap sizes ("-Xms" and "-Xmx"), the JVM by default sets the maximum heap size to 25% (1/4th) of the container memory limit and the minimum heap size to 1.56% (1/64th) of it. For example, with a 2GB memory limit, the default maximum heap is 512MB and the default minimum heap is 32MB.
### Default heap size and resource request values for sample WebLogic Server Pods

The samples configure the default minimum and maximum heap sizes for the WebLogic Server java process to 256MB and 512MB, respectively; these can be changed with the USER_MEM_ARGS environment variable. The default minimum and maximum heap sizes for the node manager process are 64MB and 100MB; these can be changed with the NODEMGR_MEM_ARGS environment variable.
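For example, a sketch of overriding both variables in the Domain resource (the values shown simply restate the sample defaults):

```yaml
serverPod:
  env:
  - name: USER_MEM_ARGS             # heap for the WebLogic Server JVM
    value: "-Xms256m -Xmx512m"
  - name: NODEMGR_MEM_ARGS          # heap for the Node Manager JVM
    value: "-Xms64m -Xmx100m"
```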
In the samples, the default memory request for a WebLogic Server pod is 768MB and the default CPU request is 250m. These can be changed during domain creation in the resources section.
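In the Domain resource, those defaults correspond to a serverPod resources section like the following sketch:

```yaml
serverPod:
  resources:
    requests:
      memory: "768Mi"
      cpu: "250m"
```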
No memory or CPU limit is configured by default in the samples, so the default QoS class for a WebLogic Server pod is Burstable. If your use case and workload require higher QoS and priority, you can achieve this by setting memory and CPU limits. You'll need to run tests and experiment with different memory/CPU limits to determine optimal values.
### Configure min/max heap size in percentages using "-XX:MinRAMPercentage" and "-XX:MaxRAMPercentage"

If you specify a pod memory limit, it's recommended to configure the heap size as a percentage of the total RAM (memory) specified in that limit. These parameters let you fine-tune the heap size; their meaning is explained in an excellent answer on Stack Overflow. Note that they set percentages, not fixed values, so changing the container memory settings will not break anything.
When configuring memory limits, make sure the limit is big enough to accommodate the configured heap (and off-heap) requirements, but not so big that it wastes memory. A pod's memory usage never goes above the limit; if the JVM's memory usage (the sum of heap and native memory) exceeds it, the JVM process is killed with an out-of-memory error and the WebLogic container is restarted due to a liveness probe failure. Additionally, a node manager process runs in the same container and has its own heap and off-heap requirements. You can also fine-tune the node manager heap size in percentages by setting "-XX:MinRAMPercentage" and "-XX:MaxRAMPercentage" in the NODEMGR_JAVA_OPTIONS environment variable.
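Here is a sketch of a percentage-based configuration (the limit and percentage values are illustrative): with a 2Gi memory limit, "-XX:MaxRAMPercentage=50.0" yields a maximum heap of 1Gi, leaving the remainder for off-heap memory and the node manager process.

```yaml
serverPod:
  env:
  - name: USER_MEM_ARGS
    value: "-XX:MinRAMPercentage=25.0 -XX:MaxRAMPercentage=50.0"
  - name: NODEMGR_JAVA_OPTIONS       # node manager heap tuned the same way; percentages illustrative
    value: "-XX:MinRAMPercentage=5.0 -XX:MaxRAMPercentage=10.0"
  resources:
    requests:
      memory: "2Gi"
    limits:
      memory: "2Gi"                  # max heap = 50% of 2Gi = 1Gi
```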
### Using "-Xms" and "-Xmx" parameters when not configuring limits

In some cases, it's difficult to come up with a hard limit for the container, and you might want to configure only memory requests, not memory limits. In such scenarios, you can use the traditional approach of setting minimum and maximum heap sizes with "-Xms" and "-Xmx".
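A sketch of that requests-only approach (illustrative values):

```yaml
serverPod:
  env:
  - name: USER_MEM_ARGS
    value: "-Xms512m -Xmx1024m"     # fixed heap bounds instead of percentages
  resources:
    requests:
      memory: "1280Mi"              # reserve room for heap plus off-heap; no limit set
      cpu: "250m"
```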
### CPU requests and limits

It's important that containers running WebLogic applications have enough CPU resources; otherwise, application performance can suffer. You also don't want to set CPU requests and limits higher than your application needs or uses. Because CPU is a shared resource, any CPU you reserve beyond what your application requires goes unused and is wasted. If no CPU request and limit are configured, a pod can end up using all the CPU available on the node, starving other containers of shareable CPU cycles.
Another thing to keep in mind is that when a pod CPU limit is not configured, the JVM might select an incorrect garbage collection (GC) strategy. The WebLogic self-tuning work manager also uses the pod CPU limit to configure the number of threads in the default thread pool. If you don't specify a container CPU limit, performance might be affected by an incorrect number of GC threads or a wrongly sized WebLogic Server thread pool.
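For example, a sketch of an explicit CPU limit (illustrative values), which gives the JVM and the self-tuning work manager a stable processor count to size their thread pools:

```yaml
serverPod:
  resources:
    requests:
      cpu: "1"
    limits:
      cpu: "2"        # GC thread count and work-manager pool size are derived from this
```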
## Beware of setting resource limits too high

Keep in mind that if you request a CPU core count larger than the core count of your biggest node, the pod will never be scheduled. Say you have a pod that needs 4 cores but your Kubernetes cluster is made up of 2-core VMs; that pod will never be scheduled. WebLogic applications are normally designed to take advantage of multiple cores and should be given CPU requests accordingly. CPU is considered a compressible resource: if your application hits its CPU limit, Kubernetes starts throttling the container, meaning the CPU is artificially restricted and the application may perform worse; however, it won't be terminated or evicted.
Just like with CPU, if you set a memory request larger than the amount of memory on your nodes, the pod will never be scheduled.
## CPU Affinity and lock contention in k8s

We observed much higher lock contention when running some workloads in Kubernetes as compared to a traditional environment. The lock contention seems to be caused by a lack of CPU cache affinity and/or by scheduling latency when the workload moves between CPU cores.
In a traditional (non-Kubernetes) environment, tests are often run with CPU affinity by binding the WebLogic Server java process to particular CPU core(s) (using the taskset command). This results in reduced lock contention and better performance.
In a Kubernetes environment, when the CPU manager policy is "static" and the QoS class of the WebLogic Server pods is "Guaranteed", we see reduced lock contention and better performance. The default CPU manager policy is "none". Refer to the Kubernetes documentation on controlling CPU management policies for more details.
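Note that the CPU manager policy is a kubelet setting, not a pod setting. A minimal sketch of enabling it through a KubeletConfiguration file (assuming your cluster lets you manage kubelet configuration; with the static policy, exclusive cores are granted only to Guaranteed pods that request whole-number CPUs):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: "static"      # default is "none"
reservedSystemCPUs: "0"         # example: keep core 0 for system daemons
```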
## References

1. https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-resource-requests-and-limits
2. https://blog.softwaremill.com/docker-support-in-new-java-8-finally-fd595df0ca54
3. https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
4. https://www.magalix.com/blog/kubernetes-patterns-capacity-planning
Lines changed: 123 additions & 0 deletions
# Copyright (c) 2017, 2020, Oracle Corporation and/or its affiliates.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl.
#
# This is an example of how to define a Domain resource.
#
apiVersion: "weblogic.oracle/v7"
kind: Domain
metadata:
  name: %DOMAIN_UID%
  namespace: %NAMESPACE%
  labels:
    weblogic.resourceVersion: domain-v2
    weblogic.domainUID: %DOMAIN_UID%
spec:
  # The WebLogic Domain Home
  domainHome: %DOMAIN_HOME%

  # The domain home source type
  # Set to PersistentVolume for domain-in-pv, Image for domain-in-image, or FromModel for model-in-image
  domainHomeSourceType: %DOMAIN_HOME_SOURCE_TYPE%

  # The WebLogic Server Docker image that the Operator uses to start the domain
  image: "%WEBLOGIC_IMAGE%"

  # imagePullPolicy defaults to "Always" if image version is :latest
  imagePullPolicy: "%WEBLOGIC_IMAGE_PULL_POLICY%"

  # Identify which Secret contains the credentials for pulling an image
  %WEBLOGIC_IMAGE_PULL_SECRET_PREFIX%imagePullSecrets:
  %WEBLOGIC_IMAGE_PULL_SECRET_PREFIX%- name: %WEBLOGIC_IMAGE_PULL_SECRET_NAME%

  # Identify which Secret contains the WebLogic Admin credentials (note that there is an example of
  # how to create that Secret at the end of this file)
  webLogicCredentialsSecret:
    name: %WEBLOGIC_CREDENTIALS_SECRET_NAME%

  # Whether to include the server out file into the pod's stdout, default is true
  includeServerOutInPodLog: %INCLUDE_SERVER_OUT_IN_POD_LOG%

  # Whether to enable log home
  %LOG_HOME_ON_PV_PREFIX%logHomeEnabled: %LOG_HOME_ENABLED%

  # Whether to write HTTP access log file to log home
  %LOG_HOME_ON_PV_PREFIX%httpAccessLogInLogHome: %HTTP_ACCESS_LOG_IN_LOG_HOME%

  # The in-pod location for domain log, server logs, server out, and Node Manager log files
  %LOG_HOME_ON_PV_PREFIX%logHome: %LOG_HOME%
  # An (optional) in-pod location for data storage of default and custom file stores.
  # If not specified or the value is either not set or empty (e.g. dataHome: "") then the
  # data storage directories are determined from the WebLogic domain home configuration.
  dataHome: "%DATA_HOME%"

  # Istio service mesh support is experimental.
  %ISTIO_PREFIX%experimental:
  %ISTIO_PREFIX%  istio:
  %ISTIO_PREFIX%    enabled: %ISTIO_ENABLED%
  %ISTIO_PREFIX%    readinessPort: %ISTIO_READINESS_PORT%

  # serverStartPolicy legal values are "NEVER", "IF_NEEDED", or "ADMIN_ONLY"
  # This determines which WebLogic Servers the Operator will start up when it discovers this Domain
  # - "NEVER" will not start any server in the domain
  # - "ADMIN_ONLY" will start up only the administration server (no managed servers will be started)
  # - "IF_NEEDED" will start all non-clustered servers, including the administration server and clustered servers up to the replica count
  serverStartPolicy: "%SERVER_START_POLICY%"

  serverPod:
    # an (optional) list of environment variables to be set on the servers
    env:
    - name: JAVA_OPTIONS
      value: "%JAVA_OPTIONS%"
    - name: USER_MEM_ARGS
      value: "-Djava.security.egd=file:/dev/./urandom -Xms256m -Xmx1024m "
    %OPTIONAL_SERVERPOD_RESOURCES%
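    # Note: the %OPTIONAL_SERVERPOD_RESOURCES% placeholder above is replaced at
    # domain-creation time with a resources section when one is configured in the
    # create-domain inputs; for example (illustrative values):
    #   resources:
    #     requests:
    #       memory: "768Mi"
    #       cpu: "250m"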
    %LOG_HOME_ON_PV_PREFIX%volumes:
    %LOG_HOME_ON_PV_PREFIX%- name: weblogic-domain-storage-volume
    %LOG_HOME_ON_PV_PREFIX%  persistentVolumeClaim:
    %LOG_HOME_ON_PV_PREFIX%    claimName: %DOMAIN_PVC_NAME%
    %LOG_HOME_ON_PV_PREFIX%volumeMounts:
    %LOG_HOME_ON_PV_PREFIX%- mountPath: %DOMAIN_ROOT_DIR%
    %LOG_HOME_ON_PV_PREFIX%  name: weblogic-domain-storage-volume

  # adminServer is used to configure the desired behavior for starting the administration server.
  adminServer:
    # serverStartState legal values are "RUNNING" or "ADMIN"
    # "RUNNING" means the listed server will be started up to "RUNNING" mode
    # "ADMIN" means the listed server will be started up to "ADMIN" mode
    serverStartState: "RUNNING"
    %EXPOSE_ANY_CHANNEL_PREFIX%adminService:
    %EXPOSE_ANY_CHANNEL_PREFIX%  channels:
    # The Admin Server's NodePort
    %EXPOSE_ADMIN_PORT_PREFIX%   - channelName: default
    %EXPOSE_ADMIN_PORT_PREFIX%     nodePort: %ADMIN_NODE_PORT%
    # Uncomment to export the T3Channel as a service
    %EXPOSE_T3_CHANNEL_PREFIX%   - channelName: T3Channel
    serverPod:
      # an (optional) list of environment variables to be set on the admin servers
      env:
      - name: USER_MEM_ARGS
        value: "-Djava.security.egd=file:/dev/./urandom -Xms512m -Xmx1024m "

  # clusters is used to configure the desired behavior for starting member servers of a cluster.
  # If you use this entry, then the rules will be applied to ALL servers that are members of the named clusters.
  clusters:
  - clusterName: %CLUSTER_NAME%
    serverStartState: "RUNNING"
    serverPod:
      # Instructs Kubernetes scheduler to prefer nodes for new cluster members where there are not
      # already members of the same cluster.
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: "weblogic.clusterName"
                  operator: In
                  values:
                  - $(CLUSTER_NAME)
              topologyKey: "kubernetes.io/hostname"
    replicas: %INITIAL_MANAGED_SERVER_REPLICAS%
  # The number of managed servers to start for unlisted clusters
  # replicas: 1
