[HWORKS-2175] Kueue - queues, cohorts and topologies #479

39 changes: 35 additions & 4 deletions docs/user_guides/projects/scheduling/kube_scheduler.md
@@ -1,15 +1,16 @@
---
description: Documentation on how to configure Kubernetes scheduling options for Hopsworks workloads.
---

# Scheduler

## Introduction

Hopsworks allows users to configure [Affinity](https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/) and [Priority Classes](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass) when running workloads on Hopsworks, this includes jobs, jupyter notebooks and model deployments.
Hopsworks allows users to configure some Kubernetes scheduler abstractions, such as [Affinity](https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/) and [Priority Classes](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass). Hopsworks also supports additional scheduling abstractions backed by Kueue: [Queues](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/), [Cohorts](https://kueue.sigs.k8s.io/docs/concepts/cohort/) and [Topologies](https://kueue.sigs.k8s.io/docs/concepts/topology_aware_scheduling/). All of these scheduling abstractions are supported for jobs, Jupyter notebooks and model deployments. Kueue abstractions, however, are currently not supported for Spark jobs.

Hopsworks admins can control which labels and priority classes can be used in the cluster (see the [Cluster configuration](#cluster-configuration) section) and by which project (see the [Default Project configuration](#default-project-configuration) section).

Within a project, data owners can set defaults for jobs and Jupyter notebooks running within that project (see the [Project defaults](#project-defaults) section).

### Node Labels, Node Affinity and Node Anti-Affinity

@@ -44,7 +45,31 @@ Common uses:

For more information on Priority Classes, you can check the Kubernetes [Priority Classes documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass) page.

## Cluster Configuration
## Kueue

Hopsworks integrates with Kueue to offer more advanced scheduling abstractions such as queues, cohorts and topologies.

For more detail on how Hopsworks uses the Kueue abstractions, see the [Kueue details](./kueue_details.md) page.

### Queues and Cohorts

Jobs, notebooks and model deployments are submitted to queues. Hopsworks administrators can define quotas that limit how many resources a queue can use. Queues can be grouped into cohorts, which allows a queue to borrow resources that the other queues in the cohort are not using.

When creating a new job, the user can select a queue for the job in the `Advanced configuration -> Scheduler` section.

![Queue selection for a job](../../../assets/images/guides/project/scheduler/job_queue.png)

### Topologies

The integration of Hopsworks with Kueue also provides access to the topology abstraction. Topologies can be defined so that the user can require the pods of a job or model deployment to run grouped together in the same topology unit. For example, the user could decide that all pods of a job should run on the same host because the pods exchange a lot of data, and keeping them together avoids network traffic and lowers latency.

The user can select the topology unit for jobs, notebooks and model deployments in the `Advanced configuration -> Scheduler` section.

![Topology unit selection for a job](../../../assets/images/guides/project/scheduler/job_topology_unit.png)

## Admin configuration

### Affinity and priority classes

Hopsworks admins can control the affinity labels and priority classes available on the Hopsworks cluster from the `Cluster Settings -> Scheduler` page:

@@ -72,6 +97,12 @@ Hopsworks Cluster can run within a shared Kubernetes Cluster. The first configura

If the roles above are configured properly (default behaviour), admins can only select values from the drop-down menu. If the roles are missing, admins are required to enter the values as free text and should be careful about typos. Any typo here is propagated to the other configuration and usage levels, leading to errors or misbehaviour when running computations.

### Queues

Every new project automatically gets access to the default Hopsworks queue. An administrator can define the default queue for project user jobs and for system jobs.

![Default queue for user and system jobs](../../../assets/images/guides/project/scheduler/default_queue.png)

## Project Configuration

Hopsworks admins can configure the labels and priority classes that can be used by default within a project. This will be a subset of the ones configured for Hopsworks.
135 changes: 135 additions & 0 deletions docs/user_guides/projects/scheduling/kueue_details.md
@@ -0,0 +1,135 @@
---
description: Kueue abstractions
---

# Kueue

## Introduction

Hopsworks integrates with Kueue to provide additional scheduling abstractions. Hopsworks currently acts only as a "reader" of the Kueue abstractions and does not manage their lifecycle, with the exception of the default LocalQueue for each namespace. All other abstractions are expected to be managed by the Hopsworks administrators directly on the Kubernetes cluster.
Contributor review comment: additional


The Hopsworks and Kueue integration currently supports only the Python and Ray frameworks for jobs, notebooks and model deployments. The same queues are also used for Hopsworks internal jobs (zipping, git operations, Python library installation). Spark is currently not supported: Spark workloads bypass the queue setup and are scheduled directly by the Kubernetes scheduler, which is important to keep in mind when sizing queue quotas.

### Resource flavors

When defining queues in Kueue, the first abstraction that needs to be defined is a [Resource Flavor](https://kueue.sigs.k8s.io/docs/concepts/resource_flavor/). The resource flavor defines the resources that a queue will later manage. The Hopsworks Helm chart installs and uses a default ResourceFlavor:

```
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
spec:
  nodeLabels:
    cloud.provider.com/region: europe
  topologyName: default
```

The node labels restrict this resource flavor to the nodes that match them, and they are required for [topologies](#topologies).

### Cluster Queues

[Cluster Queues](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/) are the actual queues that jobs and model deployments are submitted to. The default Hopsworks queue looks like this:

```
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: other
spec:
  cohort: cluster
  namespaceSelector: {}
  preemption:
    borrowWithinCohort:
      policy: Never
    reclaimWithinCohort: Never
    withinClusterQueue: Never
  queueingStrategy: BestEffortFIFO
  resourceGroups:
  - coveredResources:
    - cpu
    - memory
    - pods
    - nvidia.com/gpu
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: "0"
      - name: memory
        nominalQuota: "0"
      - name: pods
        nominalQuota: "0"
      - name: nvidia.com/gpu
        nominalQuota: "0"
```

The [preemption](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#preemption) policies and [nominal quotas](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#flavors-and-resources) are set to the minimum because this queue is designed to have the lowest priority when resources are allocated. If the cluster is underutilized and resources are available, the queue can still borrow up to the maximum resources present in the parent cohort, but by design it has no dedicated resources. The assumption is that other, more important queues defined by the cluster administrator take precedence in getting resources.
Contributor review comment: If a cluster is underutilized

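To illustrate the "more important queues" mentioned above, an administrator might define a queue with dedicated quota in the same cohort. This is a minimal sketch and not something the Hopsworks Helm chart installs: the queue name `team-a`, every quota value and the chosen preemption policies are hypothetical.

```yaml
# Hypothetical admin-defined queue with dedicated resources.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a                    # hypothetical name
spec:
  cohort: cluster                 # same cohort as the default "other" queue
  namespaceSelector: {}           # accept workloads from all namespaces
  preemption:
    reclaimWithinCohort: Any      # may preempt borrowing workloads in the cohort to get its nominal quota back
    withinClusterQueue: LowerPriority
  queueingStrategy: BestEffortFIFO
  resourceGroups:
  - coveredResources:
    - cpu
    - memory
    - pods
    - nvidia.com/gpu
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: "16"        # illustrative values only
      - name: memory
        nominalQuota: 64Gi
      - name: pods
        nominalQuota: "50"
      - name: nvidia.com/gpu
        nominalQuota: "2"
```

Since the default queue has a nominal quota of zero, everything it runs is borrowed. With a policy such as `reclaimWithinCohort: Any`, those borrowed resources can be taken back when a queue like this one needs its own quota.
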

### Local Queues

[Local Queues](https://kueue.sigs.k8s.io/docs/concepts/local_queue/) are the mechanism that gives a specific Hopsworks project (a Kubernetes namespace) access to a cluster queue.

Every new project automatically gets access to the default Hopsworks queue. An administrator can define the default queue for project user jobs and for system jobs.

![Default queue for user and system jobs](../../../assets/images/guides/project/scheduler/default_queue.png)
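
Beyond the default local queue that Hopsworks creates, an administrator could give a project access to another cluster queue by creating a LocalQueue in the project's namespace. The manifest below is a sketch under assumptions: the namespace `my-project`, the local queue name `team-a` and the cluster queue `team-a` are all hypothetical names.

```yaml
# Hypothetical LocalQueue that exposes a cluster queue to one project namespace.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: my-project   # hypothetical project namespace
  name: team-a            # hypothetical local queue name
spec:
  clusterQueue: team-a    # hypothetical ClusterQueue this local queue points to
```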

### Cohorts

[Cohorts](https://kueue.sigs.k8s.io/docs/concepts/cohort/) are groups of cluster queues that belong together and can share resources. Hopsworks defines a default `cluster` cohort:

```
apiVersion: kueue.x-k8s.io/v1alpha1
kind: Cohort
metadata:
  name: cluster
spec:
  resourceGroups:
  - coveredResources:
    - cpu
    - memory
    - pods
    - nvidia.com/gpu
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: 100
      - name: memory
        nominalQuota: 200Gi
      - name: pods
        nominalQuota: 100
      - name: nvidia.com/gpu
        nominalQuota: 50
```

Cohorts can contain other cohorts, so you can create a hierarchy of cohorts. Cohorts can also set a [fair sharing weight](https://kueue.sigs.k8s.io/docs/concepts/admission_fair_sharing/). By setting

```
fairSharing:
  weight: <value>
```

in the definition of a cohort, the user can control the cohort's priority when borrowing resources from other cohorts.
Contributor review comment: borrowing

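A nested cohort with a fair sharing weight could look like the sketch below. The cohort name `research` and the weight are hypothetical. The `parent` field name is an assumption for the `v1alpha1` API used in this guide, and newer Kueue API versions may name it differently, so verify it against the installed Kueue version.

```yaml
# Hypothetical child cohort nested under the default "cluster" cohort.
apiVersion: kueue.x-k8s.io/v1alpha1
kind: Cohort
metadata:
  name: research          # hypothetical name
spec:
  parent: cluster         # assumed field name for v1alpha1; check your Kueue version
  fairSharing:
    weight: 2             # hypothetical weight relative to sibling cohorts and queues
```
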

### Topologies

[Topologies](https://kueue.sigs.k8s.io/docs/concepts/topology_aware_scheduling/) define a way of grouping together pods that belong to the same job or deployment so that they are colocated within the same topology unit. Hopsworks defines a default topology:
Contributor review comment: wihtin->within


```
apiVersion: kueue.x-k8s.io/v1alpha1
kind: Topology
metadata:
  name: default
spec:
  levels:
  - nodeLabel: cloud.provider.com/region
  - nodeLabel: cloud.provider.com/zone
  - nodeLabel: kubernetes.io/hostname
```

The topology is referenced by name (`topologyName`) in the Resource Flavor used by a Cluster Queue.

When creating a new job, the user can select a topology unit for the job to run in, and thus decide whether all pods of the job should run on the same host, in the same zone or in the same region. The user can select the topology unit for jobs, notebooks and deployments in the `Advanced configuration -> Scheduler` section.

![Topology unit selection for a job](../../../assets/images/guides/project/scheduler/job_topology_unit.png)
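
This guide does not state how the selection in the UI is translated into Kubernetes objects. For reference, in plain Kueue a workload names its local queue with the `kueue.x-k8s.io/queue-name` label and requests a topology unit with the `kueue.x-k8s.io/podset-required-topology` annotation on the pod template. The sketch below assumes the default topology shown above; the job name, namespace, queue name, image and resource requests are hypothetical.

```yaml
# Hypothetical Job that requires all of its pods to land on the same host.
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job                       # hypothetical
  namespace: my-project                  # hypothetical project namespace
  labels:
    kueue.x-k8s.io/queue-name: team-a    # hypothetical LocalQueue in this namespace
spec:
  parallelism: 2
  completions: 2
  template:
    metadata:
      annotations:
        # The topology unit is the node label of one level of the Topology.
        kueue.x-k8s.io/podset-required-topology: kubernetes.io/hostname
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox
        command: ["sh", "-c", "sleep 30"]
        resources:
          requests:
            cpu: "1"
            memory: 256Mi
```

Using `cloud.provider.com/zone` or `cloud.provider.com/region` as the annotation value would instead group the pods within one zone or one region.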