Commit 06b8b47

OSDOCS-14894: Adding docs about cohorts

1 parent cf7ad1c

10 files changed: +257 -0 lines changed


_topic_maps/_topic_map.yml

Lines changed: 9 additions & 0 deletions

@@ -53,10 +53,19 @@ Distros: openshift-kueue
 Topics:
 - Name: Configuring quotas
   File: configuring-quotas
+- Name: Using cohorts
+  File: using-cohorts
 ---
 Name: Develop
 Dir: develop
 Distros: openshift-kueue
 Topics:
 - Name: Running jobs with quota limits
   File: running-kueue-jobs
+---
+Name: Tutorials
+Dir: tutorials
+Distros: openshift-kueue
+Topics:
+- Name: Using cohorts to enable team resource sharing in a cluster
+  File: tutorials-team-resources

configure/using-cohorts.adoc

Lines changed: 24 additions & 0 deletions

:_mod-docs-content-type: ASSEMBLY
include::_attributes/common-attributes.adoc[]
[id="using-cohorts"]
= Using cohorts
:context: using-cohorts

toc::[]

You can use cohorts to group cluster queues and determine which cluster queues can share borrowable resources with each other.
Borrowable resources are the unused nominal quota of all the cluster queues in a cohort.
For example, if a cluster queue has a nominal quota of 10 CPUs and its admitted workloads use 4, the remaining 6 CPUs are available for other cluster queues in the cohort to borrow.

Cohorts help optimize resource utilization by preventing underutilization and by enabling fair sharing configurations.
Cohorts can also simplify resource management and allocation between teams, because you can group the cluster queues for related workloads or for each team.
You can also use cohorts to set resource quotas at the group level, which define the limits for the resources that a group of cluster queues can consume.

include::modules/configuring-cohorts.adoc[leveloffset=+1]

[role="_additional-resources"]
[id="additional-resources_{context}"]
== Additional resources

* xref:../tutorials/tutorials-team-resources.adoc#tutorials-team-resources[Using cohorts to enable team resource sharing in a cluster]

// future advanced use cases - hierarchical cohorts?

modules/configuring-cohorts.adoc

Lines changed: 60 additions & 0 deletions

// Module included in the following assemblies:
//
// * configure/using-cohorts.adoc

:_mod-docs-content-type: REFERENCE
[id="configuring-cohorts_{context}"]
= Configuring cohorts

A cohort is a group of cluster queues, defined by a `Cohort` object, that can share borrowable resources with each other.

In the following example configuration, cluster queues A and B are included in the `example` cohort:

.Example `ClusterQueue` object A
[source,yaml]
----
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: queue-a
spec:
  cohort: example
  resourceQuota:
    static:
      cpu: "10"
      memory: "20Gi"
----

.Example `ClusterQueue` object B
[source,yaml]
----
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: queue-b
spec:
  cohort: example
  resourceQuota: # <1>
    static:
      cpu: "15"
      memory: "30Gi"
  resourceGroups: # <2>
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 0
----
<1> Defines static resource quotas for the queue, allowing 15 CPU cores and 30 GiB of memory.
<2> Specifies the covered resources, such as CPU, and the flavors of those resources. A flavor is a type or variant of a resource with its own nominal quota. Resource groups give you fine-grained control over resource allocation and borrowing within a cohort. In this example, the resource group covers CPU and defines the `default-flavor` flavor with a nominal CPU quota of `0`. The queue therefore has no guaranteed CPU in this flavor, but it can borrow CPU when other cluster queues in the cohort have unused CPU quota for this flavor.

.Example `Cohort` object
[source,yaml]
----
apiVersion: kueue.x-k8s.io/v1beta1
kind: Cohort
metadata:
  name: example
spec: {}
----
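
Cluster queue B refers to the `default-flavor` value by name, and that name must match a `ResourceFlavor` object. Kueue reports a cluster queue as inactive when a flavor that it references does not exist, so you typically create the flavor before or together with the cluster queues. The following minimal sketch assumes that the flavor does not need to target specific nodes, so it sets no node labels or tolerations:

.Example `ResourceFlavor` object (sketch)
[source,yaml]
----
# Minimal sketch: the name must match the flavor name used in the ClusterQueue resourceGroups.
# No nodeLabels or tolerations are set, so this flavor does not steer workloads to specific nodes.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
----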

modules/tutorial-creating-a-cohort.adoc

Lines changed: 39 additions & 0 deletions

// Module included in the following assemblies:
//
// * tutorials/tutorials-team-resources.adoc

:_mod-docs-content-type: PROCEDURE
[id="tutorial-creating-a-cohort_{context}"]
= Defining a team cohort

You must define a cohort that encompasses the resources for both teams.

.Procedure

. Define the `Cohort` object as a YAML file named `cohort-engineering-dept.yaml`:
+
.Example cohort YAML file
[source,yaml]
----
apiVersion: kueue.x-k8s.io/v1beta1
kind: Cohort
metadata:
  name: engineering-dept
spec:
  resourceQuota: # <1>
    static:
      cpu: 100 # <2>
      memory: 200Gi # <3>
      nvidia.com/gpu: 10 # <4>
----
<1> Defines the overall static resource quota for the cohort.
<2> Allocates 100 CPU cores to the cohort.
<3> Allocates 200 GiB of memory to the cohort.
<4> Allocates 10 NVIDIA GPUs to the cohort.

. Apply the YAML file by running the following command:
+
[source,terminal]
----
$ oc apply -f cohort-engineering-dept.yaml
----

modules/tutorial-define-clusterqueues.adoc

Lines changed: 100 additions & 0 deletions

// Module included in the following assemblies:
//
// * tutorials/tutorials-team-resources.adoc

:_mod-docs-content-type: PROCEDURE
[id="tutorial-define-clusterqueues_{context}"]
= Defining team cluster queues

You must define a cluster queue for each team that you want to include in the cohort.
For this tutorial, you define two cluster queues: one for team A and one for team B. You associate each cluster queue with the `engineering-dept` cohort by configuring the `cohort` field.
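
The cluster queues in this procedure reference the `standard-cpu`, `standard-memory`, `high-perf-cpu`, and `high-perf-memory` flavors by name. Each of these names must match an existing `ResourceFlavor` object. The following sketch assumes that none of the flavors needs to target specific nodes. In a real cluster, you might set `nodeLabels` on the `high-perf-*` flavors to select nodes with that hardware, but any such labels are an assumption and are not part of this tutorial:

.Example `ResourceFlavor` objects (sketch)
[source,yaml]
----
# Minimal sketch: one ResourceFlavor for each flavor name that the team cluster queues reference.
# No nodeLabels or tolerations are set, so these flavors do not steer workloads to specific nodes.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: standard-cpu
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: standard-memory
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: high-perf-cpu
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: high-perf-memory
----

You could save these objects to a file, such as a hypothetical `resource-flavors.yaml`, and apply it by using `oc apply -f` before you apply the cluster queues.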

.Procedure

. Define a `ClusterQueue` object for team A as a YAML file named `clusterqueue-team-a.yaml`:
+
.Example cluster queue YAML file for team A
[source,yaml]
----
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-queue
spec:
  cohort: engineering-dept # <1>
  resourceQuota: # <2>
    static:
      "cpu": "40" # <3>
      "memory": "80Gi" # <4>
  resourceGroups: # <5>
  - coveredResources: ["cpu"]
    flavors:
    - name: "standard-cpu"
      resources:
      - name: "cpu"
        nominalQuota: 10 # <6>
  - coveredResources: ["memory"]
    flavors:
    - name: "standard-memory"
      resources:
      - name: "memory"
        nominalQuota: "20Gi"
----
<1> Associates the cluster queue with the `engineering-dept` cohort.
<2> Defines the static resource quota for team A.
<3> Allocates 40 CPU cores to the queue.
<4> Allocates 80 GiB of memory to the queue.
<5> Defines the resource flavors and their nominal quotas.
<6> Sets the nominal CPU quota for the `standard-cpu` flavor. Any part of this quota that team A does not use is available for other cluster queues in the `engineering-dept` cohort to borrow.

. Apply the cluster queue by running the following command:
+
[source,terminal]
----
$ oc apply -f clusterqueue-team-a.yaml
----

. Define a `ClusterQueue` object for team B as a YAML file named `clusterqueue-team-b.yaml`:
+
.Example cluster queue YAML file for team B
[source,yaml]
----
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-b-queue
spec:
  cohort: engineering-dept # <1>
  resourceQuota: # <2>
    static:
      "cpu": "60" # <3>
      "memory": "120Gi" # <4>
  resourceGroups: # <5>
  - coveredResources: ["cpu"]
    flavors:
    - name: "high-perf-cpu"
      resources:
      - name: "cpu"
        nominalQuota: 15
  - coveredResources: ["memory"]
    flavors:
    - name: "high-perf-memory"
      resources:
      - name: "memory"
        nominalQuota: "30Gi" # <6>
----
<1> Associates the cluster queue with the `engineering-dept` cohort.
<2> Defines the static resource quota for team B.
<3> Allocates 60 CPU cores to the queue.
<4> Allocates 120 GiB of memory to the queue.
<5> Defines the resource flavors and their nominal quotas.
<6> Sets the nominal memory quota for the `high-perf-memory` flavor. Any part of this quota that team B does not use is available for other cluster queues in the `engineering-dept` cohort to borrow.

. Apply the cluster queue by running the following command:
+
[source,terminal]
----
$ oc apply -f clusterqueue-team-b.yaml
----

.Verification

tutorials/_attributes

Lines changed: 1 addition & 0 deletions

../_attributes/

tutorials/images

Lines changed: 1 addition & 0 deletions

../images/

tutorials/modules

Lines changed: 1 addition & 0 deletions

../modules/

tutorials/snippets

Lines changed: 1 addition & 0 deletions

../snippets/

tutorials/tutorials-team-resources.adoc

Lines changed: 21 additions & 0 deletions

:_mod-docs-content-type: ASSEMBLY
include::_attributes/common-attributes.adoc[]
[id="tutorials-team-resources"]
= Using cohorts to enable team resource sharing in a cluster
:context: tutorials-team-resources

toc::[]

{product-title} cohorts enable teams in enterprise environments to share resources while each team keeps control over its own workloads.

The following procedures demonstrate how a company with two teams in its engineering department can use cohorts to share resources.

[id="prerequisites_{context}"]
== Prerequisites

* You have cluster administrator permissions on an {platform} cluster where the {product-title} Operator is installed and configured.
* You have installed {oc-first}.

include::modules/tutorial-creating-a-cohort.adoc[leveloffset=+1]

include::modules/tutorial-define-clusterqueues.adoc[leveloffset=+1]
