- System requirements
- Setting up the cluster
- Running the `cmk isolate` Hello World Pod
- Validating the environment
- Troubleshooting and recovery
Kubernetes >= v1.5.0 (excluding v1.8.0, details below)
All template manifests provided with CMK use the service account defined in the cmk-serviceaccount manifest. Before the first CMK run, the operator should use that manifest to define `cmk-serviceaccount`. This step isn't obligatory on Kubernetes 1.5, but it's strongly recommended. Kubernetes 1.6 requires it because the RBAC authorization method uses the service account to grant API access from inside the CMK pod(s).
From Kubernetes 1.6, RBAC has become the default authorization method. The operator needs to prepare an additional ClusterRole and ClusterRoleBindings in order to deploy CMK. Those are provided in the cmk-rbac-rules manifest. In this case, the operator must also use the provided serviceaccount manifest.
From Kubernetes 1.7, Custom Resource Definitions (CRD) have replaced Third Party Resources (TPR). Kubernetes 1.7 is the only version in which both are supported, so the operator must migrate from TPR to CRD. ClusterRole and ClusterRoleBindings for CRD have been added to the cmk-rbac-rules manifest. CMK detects the Kubernetes version itself: it uses Custom Resource Definitions if the Kubernetes version is 1.7 or newer and Third Party Resources otherwise to create Nodereport and Reconcilereport.
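The version switch described above can be sketched as follows. This is a minimal illustration rather than CMK's actual code: the helper names `parse_version` and `report_resource_kind` and the returned strings are hypothetical, and the sketch assumes that the CRD path applies from 1.7 onwards and that version strings look like `v1.7.3`.

```python
import re


def parse_version(version):
    """Return (major, minor) from a string such as 'v1.7.3' or '1.8.1'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized Kubernetes version: %r" % version)
    return int(match.group(1)), int(match.group(2))


def report_resource_kind(version):
    """Pick the mechanism used to create Nodereport and Reconcilereport.

    Hypothetical helper: CRD from Kubernetes 1.7 onwards, TPR before that.
    """
    if parse_version(version) >= (1, 7):
        return "CustomResourceDefinition"
    return "ThirdPartyResource"


print(report_resource_kind("v1.6.4"))
print(report_resource_kind("v1.7.0"))
```

Comparing `(major, minor)` tuples instead of strings keeps a version such as 1.10 ordered after 1.7.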
Additionally, taints have been moved from alpha to beta and are no longer present in the node metadata but directly in `spec`. Please note that if the pod manifest has a `nodeName: <nodename>` selector, taint tolerations are not needed.
Kubernetes 1.8.0 is not supported due to an extended resources issue (it's impossible to create an extended resource). Use Kubernetes 1.8.1+ instead.
From Kubernetes 1.9.0, a mutating admission controller is used to update any pod whose definition contains a container requesting CMK Extended Resources. The CMK webhook modifies the pod by injecting the environment variable `CMK_NUM_CORES`, with its value set to the number of cores specified in the Extended Resource request. This allows `cmk isolate` to assign multiple CPU cores to a given process.
On top of that, the webhook applies additional changes to the pod, which are defined in the configuration file. By default, the configuration deployed during `cmk cluster-init` adds the CMK installation and configuration directories and the host /proc filesystem as volumes, sets the CMK service account, adds the tolerations required for a pod to be scheduled on a CMK-enabled node, and annotates the pod appropriately. Container specifications are updated with volume mounts (referencing the volumes added to the pod) and the environment variable `CMK_PROC_FS`.
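To make the `CMK_NUM_CORES` injection concrete, here is a minimal sketch that operates on a pod expressed as a plain dictionary. It is an illustration, not the webhook's implementation: the function names are hypothetical, and only the environment variable is handled (volumes, mounts, tolerations, service account and annotation are left out).

```python
import copy

# Extended Resource name as used in the pod specs in this document.
CMK_ER_NAME = "cmk.intel.com/exclusive-cores"


def requested_cores(container):
    """Return the number of CMK exclusive cores a container asks for (0 if none)."""
    resources = container.get("resources") or {}
    for section in ("requests", "limits"):
        amount = (resources.get(section) or {}).get(CMK_ER_NAME)
        if amount is not None:
            return int(amount)
    return 0


def mutate_pod(pod):
    """Return a copy of `pod` with CMK_NUM_CORES injected where needed."""
    mutated = copy.deepcopy(pod)
    for container in mutated["spec"]["containers"]:
        cores = requested_cores(container)
        if cores > 0:
            env = container.get("env") or []
            env.append({"name": "CMK_NUM_CORES", "value": str(cores)})
            container["env"] = env
    return mutated


pod = {
    "spec": {
        "containers": [
            {"name": "app", "resources": {"requests": {CMK_ER_NAME: 2}}},
            {"name": "sidecar"},
        ]
    }
}
mutated = mutate_pod(pod)
print(mutated["spec"]["containers"][0]["env"])
```

Containers that do not request the Extended Resource are left untouched, and the input pod is not modified.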
https://kubernetes.io/docs/admin/authorization/rbac/#rolebinding-and-clusterrolebinding
This section describes the setup required to use the CMK software.
Notes:
- The recommended way to prepare Kubernetes nodes for the CMK software is to run `cmk cluster-init` as a Pod, as described in the cluster setup instructions using `cmk cluster-init`.
- The cluster setup instructions using manually created Pods should be used only if running `cmk cluster-init` fails for some reason.
Prepare the nodes by running `cmk cluster-init` using these instructions.
- Concepts
- Preparing nodes by running `cmk cluster-init` (recommended)
- Preparing nodes by running each CMK subcommand as a Pod (use only if required)
Term | Meaning |
---|---|
CMK nodes | The operator can choose any number of nodes in the Kubernetes cluster to work with CMK. These participating nodes are referred to as CMK nodes. |
Pod | A Pod is an abstraction in Kubernetes to represent one or more containers and their configuration. It is the smallest schedulable unit in Kubernetes. |
OIR | Acronym for Opaque Integer Resource. In Kubernetes, OIRs allow cluster operators to advertise new node-level resources that would otherwise be unknown to the system. |
Volume | A volume is a directory (on the host file system). In Kubernetes, a volume has the same lifetime as the Pod that uses it. Many types of volumes are supported in Kubernetes. |
hostPath | `hostPath` is a volume type in Kubernetes. It mounts a file or directory from the host file system into the Pod. |
CMK nodes can be prepared by using the `cmk cluster-init` subcommand. The subcommand is expected to be run as a pod. The cmk-cluster-init-pod template can be used to run `cmk cluster-init` on a Kubernetes cluster. When run on a Kubernetes cluster, the Pod spawns at most two Pods per node in order to prepare each node.
The only value that requires change in the cmk-cluster-init-pod template is the `args` field, which can be modified to pass different options.
Following are some example modifications to the `args` field:

    - args:
        # Change this value to pass different options to cluster-init.
        - "/cmk/cmk.py cluster-init --host-list=node1,node2,node3"

The above command prepares nodes "node1", "node2" and "node3" for the CMK software using default options.

    - args:
        # Change this value to pass different options to cluster-init.
        - "/cmk/cmk.py cluster-init --all-hosts"

The above command prepares all the nodes in the Kubernetes cluster for the CMK software using default options.

    - args:
        # Change this value to pass different options to cluster-init.
        - "/cmk/cmk.py cluster-init --host-list=node1,node2,node3 --cmk-cmd-list=init,discover"

The above command prepares nodes "node1", "node2" and "node3" but only runs the `cmk init` and `cmk discover` subcommands on each of those nodes.
For more details on the options provided by `cmk cluster-init`, see this description.
Notes:
- The instructions provided in this section should be used only if running `cmk cluster-init` fails for some reason.
- The subcommands described below should be run in the order in which they are described.
- The documentation in this section assumes that the CMK configuration directory is `/etc/cmk` and the `cmk` binary is installed on the host under `/opt/bin`.
- In all the pod templates used in this section, the name of the container image used is `cmk:v1.3.0`. It is expected that the `cmk` container image is built and cached locally on the host. The `image` field will require modification if the container image is hosted remotely (e.g., in https://hub.docker.com/).
The CMK nodes in the Kubernetes cluster should be initialized using `cmk init` before they can be used with the CMK software. To initialize the CMK nodes, the cmk-init-pod template can be used.
`cmk init` takes the `--conf-dir`, `--num-exclusive-cores` and `--num-shared-cores` flags. In the cmk-init-pod template, the values of these flags can be modified. The value for `--conf-dir` can be set by changing the `path` value of the `hostPath` for the `cmk-conf-dir`. The values for `--num-exclusive-cores` and `--num-shared-cores` can be set by changing the values of the `NUM_EXCLUSIVE_CORES` and `NUM_SHARED_CORES` environment variables, respectively.
Values that might require modification in the cmk-init-pod template are shown as snippets below:

    volumes:
    - hostPath:
        # Change this to modify the CMK config dir in the host file system.
        path: "/etc/cmk"
      name: cmk-conf-dir

    env:
    - name: NUM_EXCLUSIVE_CORES
      # Change this to modify the value passed to `--num-exclusive-cores` flag.
      value: '4'
    - name: NUM_SHARED_CORES
      # Change this to modify the value passed to `--num-shared-cores` flag.
      value: '1'

All the CMK nodes in the Kubernetes cluster should be patched with CMK OIR slots using `cmk discover`. The OIR slots are advertised because the CPU lists in the exclusive pool need to be allocated exclusively. The number of slots advertised should be equal to the number of CPU lists under the exclusive pool, as determined by examining the CMK configuration directory. The cmk-discover-pod template can be used to advertise the CMK OIR slots.
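As a rough illustration of how the slot count follows from the configuration directory, the sketch below counts CPU-list entries under the exclusive pool. The on-disk layout is an assumption that this document does not spell out: each CPU list is taken to be a subdirectory of `<conf-dir>/pools/exclusive/`. The function name `count_exclusive_slots` is hypothetical.

```python
import os
import tempfile


def count_exclusive_slots(conf_dir):
    """Count CPU-list directories under <conf_dir>/pools/exclusive.

    Assumed layout (not confirmed by this document): each CPU list is a
    subdirectory of the pool directory, e.g. pools/exclusive/4,12/.
    """
    pool_dir = os.path.join(conf_dir, "pools", "exclusive")
    return sum(
        1 for entry in os.listdir(pool_dir)
        if os.path.isdir(os.path.join(pool_dir, entry))
    )


# Build a throwaway configuration directory to demonstrate the count.
with tempfile.TemporaryDirectory() as conf_dir:
    for cpu_list in ("4,12", "5,13", "6,14"):
        os.makedirs(os.path.join(conf_dir, "pools", "exclusive", cpu_list))
    demo_slots = count_exclusive_slots(conf_dir)

print(demo_slots)
```

With three CPU lists in the exclusive pool, three slots would be advertised for that node.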
`cmk discover` takes the `--conf-dir` flag. In the cmk-discover-pod template, the value for `--conf-dir` can be configured by changing the `path` value of the `hostPath` for `cmk-conf-dir`. After running this Pod on a node, the node will be patched with the `pod.alpha.kubernetes.io/opaque-int-resource-cmk` OIR.
Values that might require modification in the cmk-discover-pod template are shown as snippets below:

    volumes:
    - hostPath:
        # Change this to modify the CMK config dir in the host file system.
        path: "/etc/cmk"
      name: cmk-conf-dir

In order to reconcile from an outdated CMK configuration state, each CMK node should run `cmk reconcile` periodically. `cmk reconcile` can be run periodically using the cmk-reconcile-daemonset template.
In the cmk-reconcile-daemonset template, the time between invocations of `cmk reconcile` can be adjusted by changing the value of the `CMK_RECONCILE_SLEEP_TIME` environment variable. The value specifies the time in seconds. `cmk reconcile` takes the `--conf-dir` flag. This value can be configured by changing the `path` value of the `hostPath` for the `cmk-conf-dir` in the cmk-reconcile-daemonset template.
Values that might require modification in the cmk-reconcile-daemonset template are shown as snippets below:

    env:
    - name: CMK_RECONCILE_SLEEP_TIME
      # Change this to modify the sleep interval between consecutive
      # cmk reconcile runs. The value is specified in seconds.
      value: '60'

    volumes:
    - hostPath:
        # Change this to modify the CMK config dir in the host file system.
        path: "/etc/cmk"
      name: cmk-conf-dir

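The periodic behaviour controlled by `CMK_RECONCILE_SLEEP_TIME` amounts to a run-then-sleep loop. The sketch below shows that pattern under stated assumptions: the helper names are hypothetical, the reconcile step is replaced by a stand-in callable, and the loop is bounded so that the example terminates (a real daemon would loop indefinitely).

```python
import os
import time


def sleep_interval(environ=os.environ, default=60):
    """Read the interval in seconds from CMK_RECONCILE_SLEEP_TIME."""
    return int(environ.get("CMK_RECONCILE_SLEEP_TIME", default))


def run_periodically(action, interval, iterations, sleep=time.sleep):
    """Call `action`, then sleep `interval` seconds, `iterations` times."""
    for _ in range(iterations):
        action()
        sleep(interval)


calls = []
naps = []
run_periodically(
    action=lambda: calls.append("reconcile"),  # stand-in for `cmk reconcile`
    interval=sleep_interval({"CMK_RECONCILE_SLEEP_TIME": "60"}),
    iterations=3,
    sleep=naps.append,  # record the naps instead of really sleeping
)
print(calls, naps)
```

Passing the sleep function as a parameter keeps the loop testable without waiting for real time to elapse.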
`cmk install` is used to create a zero-dependency binary of the CMK software and place it on the host filesystem. Subsequent containers can isolate themselves by mounting the install directory from the host and then calling `cmk isolate`. To run it on all the CMK nodes, the cmk-install-pod template can be used.
`cmk install` takes the `--install-dir` flag. In the cmk-install-pod template, the value for `--install-dir` can be configured by changing the `path` value of the `hostPath` for the `cmk-install-dir`.
Values that might require modification in the cmk-install-pod template are shown as snippets below:

    volumes:
    - hostPath:
        # Change this to modify the CMK installation dir in the host file system.
        path: "/opt/bin"
      name: cmk-install-dir

`cmk webhook` is used to run the mutating admission webhook server. Whenever there is a request to create a new pod, the webhook can capture that request, check whether any of the containers requests or limits a number of CMK Extended Resources, and update the pod and its container specifications appropriately. This simplifies the deployment of workloads that take advantage of CMK by reducing the number of requirements to the minimum.

    ...
    spec:
      containers:
        resources:
          requests:
            cmk.intel.com/exclusive-cores: 2
    ...

In order to deploy the CMK mutating webhook, a number of resources need to be created on the cluster. Before that, the operator needs to have an X509 private key and a TLS certificate generated in PEM format. Certificates can be self-signed, although using certificates signed by a proper CA or by the Kubernetes Certificates API is highly recommended. After meeting that requirement, the steps to deploy the webhook are as follows:
- Encode the certificates in PEM format to Base64 and place them in the Mutating Admission Configuration and Secret templates.
- Update the config map template. The config map contains 2 configuration files: `server.yaml` and `mutations.yaml`. Configuration options are described in the cmk command-line tool documentation.
- Create the secret, service and config map using the `kubectl create -f ...` command.
- Run the `cmk webhook` pod defined in the webhook pod template using the `kubectl create -f ...` command.
- If the `cmk webhook` pod is running correctly, create the Mutating Admission Configuration object.
CMK is able to use multiple sockets. During cluster initialization, the `init` module distributes cores from all sockets across pools. To prevent a situation where the exclusive pool or the shared pool is placed on a single socket only, the operator can use one of two `mode` policies: `packed` and `spread`. These policies define how cores are assigned to a specific pool:
- packed mode will put cores in the following order:
Note: This policy is not topology aware, so there is a possibility that one pool won't spread across multiple sockets.
- spread mode will put cores in the following order:
Note: This policy is topology aware, so CMK will try to spread pools across each socket.
To select the appropriate `mode`, the operator can set it during initialization with the `--shared-mode` or `--exclusive-mode` parameters. These parameters can be used with `cluster-init` and `init`. If the operator uses two different modes, the policies will be mixed. In that case, the exclusive pool is resolved before the shared pool.
After following the instructions in the previous section, the cluster is ready to run the Hello World Pod. The Hello World cmk-isolate-pod template describes a simple Pod with three containers requesting CPUs from the exclusive, shared and infra pools, respectively, using `cmk isolate`. The pool is requested by passing the desired value to the `--pool` flag when using `cmk isolate`, as described in the documentation.
`cmk isolate` can use the `--socket-id` flag to select the socket on which the application should be spawned. This flag is optional and applies only to the exclusive pool. If it is not used, `cmk isolate` will use the first core that is not reserved.
`cmk isolate` also takes the `--conf-dir` and `--install-dir` flags. In the cmk-isolate-pod template, the values for `--conf-dir` and `--install-dir` can be modified by changing the `path` values of the `hostPath`.
Values that might require modification in the cmk-isolate-pod template are shown as snippets below:

    volumes:
    - hostPath:
        # Change this to modify the CMK installation dir in the host file system.
        path: "/opt/bin"
      name: cmk-install-dir
    - hostPath:
        # Change this to modify the CMK config dir in the host file system.
        path: "/etc/cmk"
      name: cmk-conf-dir

Notes:
- The Hello World cmk-isolate-pod consumes the `pod.alpha.kubernetes.io/opaque-int-resource-cmk` Opaque Integer Resource (OIR) only in the container isolated using the exclusive pool. The CMK software assumes that only containers isolated using the exclusive pool request the OIR, and each of these containers should consume exactly one OIR. This restricts the number of pods that can land on a Kubernetes node to the expected value.
- The `cmk isolate` Hello World Pod should only be run after following the instructions provided in the Setting up the cluster section.
Following is an example to validate the environment in one node.
- Pick a node to test. For illustration, we will use `<node-name>` as the name of the node.
- Check if node has appropriate label.
kubectl get node <node-name> -o json | jq .metadata.labels
Example output:
kubectl get node cmk-02-zzwt7w -o json | jq .metadata.labels
{
"beta.kubernetes.io/arch": "amd64",
"beta.kubernetes.io/os": "linux",
"cmk.intel.com/cmk-node": "true",
"kubernetes.io/hostname": "cmk-02-zzwt7w"
}
- Check if node has appropriate taint. (kubernetes < v1.7)
kubectl get node <node-name> -o json | jq .metadata.annotations
Example output:
kubectl get node cmk-02-zzwt7w -o json | jq .metadata.annotations
{
"scheduler.alpha.kubernetes.io/taints": "[{\"value\": \"true\", \"key\": \"cmk\", \"effect\": \"NoSchedule\"}]",
"volumes.kubernetes.io/controller-managed-attach-detach": "true"
}
- Check if node has appropriate taint. (kubernetes >= v1.7)
kubectl get node <node-name> -o json | jq .spec.taints
Example output:
kubectl get node cmk-02-zzwt7w -o json | jq .spec.taints
[
{
"effect": "NoSchedule",
"key": "cmk",
"timeAdded": null,
"value": "true"
}
]
- Check if node has the appropriate OIR. (kubernetes < v1.8)
kubectl get node <node-name> -o json | jq .status.capacity
Example output:
kubectl get node cmk-02-zzwt7w -o json | jq .status.capacity
{
"alpha.kubernetes.io/nvidia-gpu": "0",
"cpu": "16",
"memory": "14778328Ki",
"pod.alpha.kubernetes.io/opaque-int-resource-cmk": "4",
"pods": "110"
}
- Check if node has the appropriate ER. (kubernetes >= v1.8.1)
kubectl get node <node-name> -o json | jq .status.capacity
Example output:
kubectl get node cmk-02-zzwt7w -o json | jq .status.capacity
{
"alpha.kubernetes.io/nvidia-gpu": "0",
"cpu": "16",
"memory": "14778328Ki",
"cmk.intel.com/exclusive-cores": "4",
"pods": "110"
}
- Log in to the node and check that the CMK configuration directory and binary exist. Assuming default options were used for `cmk cluster-init`, you would do the following:
ls /etc/cmk/
ls /opt/bin/
- Replace the `nodeName` in the Pod manifest below with the chosen node name and save it to a file.

      apiVersion: v1
      kind: Pod
      metadata:
        labels:
          app: cmk-isolate-pod
        name: cmk-isolate-pod
      spec:
        # Change this to the <node-name> you want to test.
        nodeName: NODENAME
        containers:
        - args:
          - "/opt/bin/cmk isolate --conf-dir=/etc/cmk --pool=infra sleep -- 10000"
          command:
          - "/bin/bash"
          - "-c"
          env:
          - name: CMK_PROC_FS
            value: "/host/proc"
          image: cmk:v1.3.0
          imagePullPolicy: "Never"
          name: cmk-isolate-infra
          volumeMounts:
          - mountPath: "/host/proc"
            name: host-proc
            readOnly: true
          - mountPath: "/opt/bin"
            name: cmk-install-dir
          - mountPath: "/etc/cmk"
            name: cmk-conf-dir
        restartPolicy: Never
        volumes:
        - hostPath:
            # Change this to modify the CMK installation dir in the host file system.
            path: "/opt/bin"
          name: cmk-install-dir
        - hostPath:
            path: "/proc"
          name: host-proc
        - hostPath:
            # Change this to modify the CMK config dir in the host file system.
            path: "/etc/cmk"
          name: cmk-conf-dir

- Run `kubectl create -f <file-name>`, where `<file-name>` is the name of the Pod manifest file with the `nodeName` field substituted as mentioned in the previous step.
- Check if any process is isolated in the `infra` pool using `NodeReport` for that node.
  If you are using third party resources (Kubernetes 1.6.x and older versions): `kubectl get NodeReport <node-name> -o json | jq .report.description.pools.infra`
  If you are using custom resource definitions (Kubernetes 1.7.x and newer versions): `kubectl get cmk-nodereport <node-name> -o json | jq .spec.report.description.pools.infra`
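The node-level checks from the steps above (label, taint and advertised resource) can also be scripted. The sketch below is one possible way to do it, assuming the node object has already been obtained as JSON, for example from `kubectl get node <node-name> -o json`. The function names are hypothetical; the keys are the ones shown in the example outputs of this section.

```python
import json

TAINT_ANNOTATION = "scheduler.alpha.kubernetes.io/taints"
OIR_NAME = "pod.alpha.kubernetes.io/opaque-int-resource-cmk"
ER_NAME = "cmk.intel.com/exclusive-cores"


def node_taints(node):
    """Return taints from spec (>= v1.7) or the alpha annotation (< v1.7)."""
    taints = node.get("spec", {}).get("taints")
    if taints:
        return taints
    raw = node.get("metadata", {}).get("annotations", {}).get(TAINT_ANNOTATION)
    return json.loads(raw) if raw else []


def check_node(node):
    """Summarize whether the node carries the CMK label, taint and resource."""
    labels = node.get("metadata", {}).get("labels", {})
    capacity = node.get("status", {}).get("capacity", {})
    return {
        "label": labels.get("cmk.intel.com/cmk-node") == "true",
        "taint": any(
            t.get("key") == "cmk" and t.get("effect") == "NoSchedule"
            for t in node_taints(node)
        ),
        "slots": int(capacity.get(ER_NAME, capacity.get(OIR_NAME, 0))),
    }


# Node shaped like the pre-1.7 examples: taint in an annotation, OIR capacity.
old_node = {
    "metadata": {
        "labels": {"cmk.intel.com/cmk-node": "true"},
        "annotations": {
            TAINT_ANNOTATION:
                '[{"value": "true", "key": "cmk", "effect": "NoSchedule"}]'
        },
    },
    "status": {"capacity": {OIR_NAME: "4"}},
}
# Node shaped like the newer examples: taint in spec, Extended Resource capacity.
new_node = {
    "metadata": {"labels": {"cmk.intel.com/cmk-node": "true"}},
    "spec": {"taints": [{"effect": "NoSchedule", "key": "cmk", "value": "true"}]},
    "status": {"capacity": {ER_NAME: "4"}},
}
print(check_node(old_node))
print(check_node(new_node))
```

To run it against a live node, the dictionary could be loaded with `json.load(sys.stdin)` from the piped `kubectl` output.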
- Follow all the above steps, but use the simplified Pod manifest:

      apiVersion: v1
      kind: Pod
      metadata:
        labels:
          app: cmk-isolate-pod
        name: cmk-isolate-pod
      spec:
        # Change this to the <node-name> you want to test.
        nodeName: NODENAME
        containers:
        - args:
          - "/opt/bin/cmk isolate --conf-dir=/etc/cmk --pool=exclusive sleep -- 10000"
          command:
          - "/bin/bash"
          - "-c"
          env:
          image: cmk:v1.3.0
          imagePullPolicy: "Never"
          name: cmk-isolate-infra
          resources:
            requests:
              cmk.intel.com/exclusive-cores: 1
        restartPolicy: Never

- Run `kubectl create -f <file-name>`, where `<file-name>` is the name of the Pod manifest file with the `nodeName` field substituted as mentioned in the previous section.
- Run `kubectl get pod cmk-isolate-pod -o json | jq .metadata.annotations` and verify that the annotation has been added:
{
"cmk.intel.com/resources-injected": "true"
}
- Run `kubectl get pod cmk-isolate-pod -o json | jq .spec.volumes` and verify that extra volumes have been injected:
[
{
"name": "default-token-xfd8q",
"secret": {
"defaultMode": 420,
"secretName": "default-token-xfd8q"
}
},
{
"hostPath": {
"path": "/proc",
"type": ""
},
"name": "cmk-host-proc"
},
{
"hostPath": {
"path": "/etc/cmk",
"type": ""
},
"name": "cmk-config-dir"
},
{
"hostPath": {
"path": "/opt/bin",
"type": ""
},
"name": "cmk-install-dir"
}
]
- Run `kubectl get pod cmk-isolate-pod -o json | jq '.spec.containers[0].env'` (the filter is quoted so that the shell does not interpret the brackets) and verify that the environment variables have been added to the container spec:
[
{
"name": "CMK_PROC_FS",
"value": "/host/proc"
},
{
"name": "CMK_NUM_CORES",
"value": "1"
}
]
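The three webhook checks above (annotation, volumes and environment variables) can be combined into a single scripted check. The following is a minimal sketch that assumes the pod has been fetched as JSON with `kubectl get pod cmk-isolate-pod -o json`. The function name `webhook_findings` is hypothetical; the expected names are taken from the example outputs above.

```python
def webhook_findings(pod):
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    annotations = pod.get("metadata", {}).get("annotations") or {}
    if annotations.get("cmk.intel.com/resources-injected") != "true":
        problems.append("missing resources-injected annotation")

    volumes = {v.get("name") for v in pod.get("spec", {}).get("volumes") or []}
    for name in ("cmk-host-proc", "cmk-config-dir", "cmk-install-dir"):
        if name not in volumes:
            problems.append("missing volume " + name)

    # Only the first container is inspected, as in the jq command above.
    containers = pod.get("spec", {}).get("containers") or []
    env = {e.get("name") for c in containers[:1] for e in c.get("env") or []}
    for name in ("CMK_PROC_FS", "CMK_NUM_CORES"):
        if name not in env:
            problems.append("missing env variable " + name)
    return problems


mutated_pod = {
    "metadata": {"annotations": {"cmk.intel.com/resources-injected": "true"}},
    "spec": {
        "volumes": [
            {"name": "cmk-host-proc"},
            {"name": "cmk-config-dir"},
            {"name": "cmk-install-dir"},
        ],
        "containers": [
            {"env": [
                {"name": "CMK_PROC_FS", "value": "/host/proc"},
                {"name": "CMK_NUM_CORES", "value": "1"},
            ]}
        ],
    },
}
print(webhook_findings(mutated_pod))
print(webhook_findings({"spec": {"containers": [{"env": None}]}}))
```

An empty list indicates that the pod carries everything the webhook is expected to inject.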
If running `cmk cluster-init` using the cmk-cluster-init-pod template ends in an error, the recommended way to start troubleshooting is to look at the logs using `kubectl logs POD_NAME [CONTAINER_NAME] -f`.
For example, assuming you ran the cmk-cluster-init-pod template with default options, it should create two pods on each node named `cmk-init-install-discover-pod-<node-name>` and `cmk-reconcile-nodereport-<node-name>`, where `<node-name>` should be replaced with the name of the node.
If you want to look at the logs from the container which ran the `discover` subcommand in the pod, you can use

    kubectl logs -f cmk-init-install-discover-pod-<node-name> discover

If you want to look at the logs from the container which ran the `reconcile` subcommand in the pod, you can use

    kubectl logs -f cmk-reconcile-nodereport-pod-<node-name> reconcile

If you want to remove `cmk`, use `cmk-uninstall-pod.yaml`. A `nodeSelector` can be used to restrict the removal to a specific node.