Added Queue design doc.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
k82cn committed Apr 19, 2019
1 parent dba21aa commit 8fc3cca
Showing 1 changed file with 113 additions and 0 deletions.
113 changes: 113 additions & 0 deletions docs/design/queue.md
@@ -0,0 +1,113 @@
# Queue

[@k82cn](http://github.com/k82cn); April 17, 2019

## Motivation

`Queue` was introduced in [kube-batch](http://github.com/kubernetes-sigs/kube-batch) a long time ago as an internal feature: all jobs are submitted to the same queue, named `default`. As more and more users want to share resources with each other through queues, this proposal covers the primary queue features needed to achieve that.

## Function Specification

A queue is cluster-scoped, so users from different namespaces can share resources within a `Queue`. The following section defines the queue API.

### API

```go
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Queue is a cluster-scoped object that jobs are submitted to.
type Queue struct {
	metav1.TypeMeta `json:",inline"`

	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

	// Specification of the desired behavior of a queue
	// +optional
	Spec QueueSpec `json:"spec,omitempty" protobuf:"bytes,2,opt,name=spec"`

	// Current status of Queue
	// +optional
	Status QueueStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"`
}

type QueueSpec struct {
	// The weight of the queue, used to share resources between queues.
	Weight int32 `json:"weight,omitempty" protobuf:"varint,1,opt,name=weight"`
}

type QueueStatus struct {
	// The number of jobs in Unknown status
	Unknown int32 `json:"unknown,omitempty" protobuf:"varint,1,opt,name=unknown"`
	// The number of jobs in Running status
	Running int32 `json:"running,omitempty" protobuf:"varint,2,opt,name=running"`
	// The number of jobs in Pending status
	Pending int32 `json:"pending,omitempty" protobuf:"varint,3,opt,name=pending"`
	// The number of jobs in Completed status
	Completed int32 `json:"completed,omitempty" protobuf:"varint,4,opt,name=completed"`
	// The number of jobs in Failed status
	Failed int32 `json:"failed,omitempty" protobuf:"varint,5,opt,name=failed"`
	// The number of jobs in Aborted status
	Aborted int32 `json:"aborted,omitempty" protobuf:"varint,6,opt,name=aborted"`
}
```

### QueueController

The `QueueController` manages the lifecycle of a queue:

1. It watches `PodGroup`/`Job` objects to track their status.
2. If a `Queue` is deleted, it also deletes all related `PodGroup`/`Job` objects in that queue.

### Admission Controller

The admission controller checks the queue of a `PodGroup`/`Job` at creation time:

1. If the queue does not exist, the creation is rejected.
2. If the queue is releasing, the creation is also rejected.
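
The two rejection rules above can be sketched as a small validation function. This is only an illustration: the `QueueState` type, its values, the error variables, and the `ValidateQueue` helper are hypothetical names, since the proposal does not define how the "releasing" state is stored or how the admission controller looks up queues.

```go
package main

import "errors"

// QueueState is a hypothetical lifecycle marker; the proposal mentions a
// "releasing" queue but does not define a field or type for it.
type QueueState string

const (
	QueueOpen      QueueState = "Open"      // assumed name for a usable queue
	QueueReleasing QueueState = "Releasing" // assumed name for a queue being removed
)

// Hypothetical sentinel errors for the two rejection cases.
var (
	ErrQueueNotFound  = errors.New("queue does not exist")
	ErrQueueReleasing = errors.New("queue is releasing")
)

// ValidateQueue returns nil when a PodGroup/Job may be created in the named
// queue. The map stands in for whatever queue lookup the real admission
// controller would use.
func ValidateQueue(queues map[string]QueueState, name string) error {
	state, ok := queues[name]
	if !ok {
		// Rule 1: the queue does not exist.
		return ErrQueueNotFound
	}
	if state == QueueReleasing {
		// Rule 2: the queue is releasing.
		return ErrQueueReleasing
	}
	return nil
}
```

With a map such as `{"default": QueueOpen, "old": QueueReleasing}`, a creation request naming `default` passes, while requests naming `old` or an unknown queue are rejected with the matching error.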

### Feature Interaction

#### Customized Job/PodGroup

If a `PodGroup` is created by a customized controller, the `QueueController` counts it under the `Unknown` status, because a `PodGroup` only carries the scheduling specification and does not include the customized job's status.
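
As a rough sketch of how the status counters could be derived under this rule, the function below tallies job phases and sends anything it does not recognise to `Unknown`. The `JobPhase` type, the phase constants, and the `Summarize` helper are assumptions made for illustration; only the counter names come from `QueueStatus` in the API section.

```go
package main

// JobPhase is a hypothetical phase label for a PodGroup/Job.
type JobPhase string

// Assumed phase names, chosen to match the QueueStatus counters.
const (
	PhaseRunning   JobPhase = "Running"
	PhasePending   JobPhase = "Pending"
	PhaseCompleted JobPhase = "Completed"
	PhaseFailed    JobPhase = "Failed"
	PhaseAborted   JobPhase = "Aborted"
)

// QueueStatus mirrors the counters of the QueueStatus type in the API section.
type QueueStatus struct {
	Unknown, Running, Pending, Completed, Failed, Aborted int32
}

// Summarize counts one phase per PodGroup/Job. A PodGroup owned by a
// customized controller carries no job status, so an empty or unrecognised
// phase is counted as Unknown.
func Summarize(phases []JobPhase) QueueStatus {
	var s QueueStatus
	for _, p := range phases {
		switch p {
		case PhaseRunning:
			s.Running++
		case PhasePending:
			s.Pending++
		case PhaseCompleted:
			s.Completed++
		case PhaseFailed:
			s.Failed++
		case PhaseAborted:
			s.Aborted++
		default:
			s.Unknown++
		}
	}
	return s
}
```

Treating `Unknown` as the default branch means a new or custom phase never gets dropped from the totals; it simply shows up in the `Unknown` counter.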

#### cli

The command line is also enhanced for operations engineers. Three sub-commands are introduced as follows:

__create__:

The `create` command creates a queue with a weight; for example, the following command creates a queue named `myqueue` with weight 10.

```shell
$ vkctl queue create --name myqueue --weight 10
```

__view__:

The `view` command shows the details of a queue, e.g. its creation time; the following command shows the details of queue `myqueue`:

```shell
$ vkctl queue view myqueue
```

__list__:

The `list` command shows all queues available to the current user:

```shell
$ vkctl queue list
Name     Weight  Total  Pending  Running  ...
myqueue  10      10     5        5
```

#### Scheduler

* Proportion plugin:

The proportion plugin shares resources between `Queue`s by weight. The deserved resource of a queue is `(weight/total-weight) * total-resource`. When allocating resources, the plugin does not allocate a queue more than its deserved resources.
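
The formula above can be illustrated with a short sketch. It assumes a single scalar resource (for example CPU millicores), whereas a real scheduler handles several resource dimensions; the `Deserved` and `CanAllocate` helpers are hypothetical names and not part of kube-batch.

```go
package main

// Deserved computes (weight/total-weight) * total-resource for every queue.
// Assumption: the resource is one scalar value in an arbitrary unit.
func Deserved(weights map[string]int32, totalResource float64) map[string]float64 {
	var totalWeight int64
	for _, w := range weights {
		totalWeight += int64(w)
	}
	deserved := make(map[string]float64, len(weights))
	if totalWeight == 0 {
		// No weights to share by; return an empty result instead of dividing by zero.
		return deserved
	}
	for name, w := range weights {
		deserved[name] = float64(w) / float64(totalWeight) * totalResource
	}
	return deserved
}

// CanAllocate reports whether a request still fits within the queue's
// deserved share, given what the queue has already been allocated.
func CanAllocate(allocated, request, deserved float64) bool {
	return allocated+request <= deserved
}
```

For two queues with weights 1 and 3 sharing 8000 units, the deserved shares are 2000 and 6000. A queue with 1500 of its 2000 units in use can take a request of 500 but not one of 501.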

* Reclaim action:

The `reclaim` action goes through all queues and reclaims resources from other queues based on the return value of `ReclaimableFn`; the time complexity is `O(n^2)`. In `ReclaimableFn`, both `proportion` and `gang` take effect: 1. `proportion` makes sure a queue will not be under-used after the reclaim; 2. `gang` makes sure a job will not be reclaimed if its `minAvailable` > 1.
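
To make the two conditions concrete, here is a simplified filter in the spirit of `ReclaimableFn`. It is not the kube-batch signature: the `Task` type, its fields, and the scalar resource model are assumptions, and the sketch only covers the victim selection for one queue, not the outer loop over queue pairs.

```go
package main

// Task is a hypothetical, simplified view of a running task.
type Task struct {
	Name         string
	Resource     float64 // scalar resource held by the task (assumed unit)
	MinAvailable int32   // minAvailable of the job that owns the task
}

// Reclaimable returns the candidate tasks that may be evicted from a victim
// queue with the given allocated and deserved resources.
func Reclaimable(allocated, deserved float64, candidates []Task) []Task {
	var victims []Task
	for _, t := range candidates {
		// gang: as stated in the text, a job is not reclaimed if minAvailable > 1.
		if t.MinAvailable > 1 {
			continue
		}
		// proportion: the victim queue must not become under-used afterwards.
		if allocated-t.Resource < deserved {
			continue
		}
		allocated -= t.Resource
		victims = append(victims, t)
	}
	return victims
}
```

With 10 units allocated, 6 deserved, and four 2-unit tasks where the second belongs to a gang job (`minAvailable` of 3), the first and third tasks are selected. The fourth is kept because evicting it would drop the queue below its deserved share.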

* Backfill action:

When the `allocate` action assigns resources to each queue, there is a case ([kube-batch#492](<https://github.com/kubernetes-sigs/kube-batch/issues/492>)) in which resources may sit idle unnecessarily because of the `proportion` plugin: two queues each have one pending job, and the deserved resources of each queue cannot meet the requirements of its job. In such a case, the `backfill` action ignores the deserved guarantee of the queues and fills idle resources as much as possible. This introduces another potential problem, namely that an incoming smaller job may be blocked; that case will be handled by reserved resources for each queue in another project.
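
The idle-resource case can be reproduced with a toy model. The sketch below assumes a single scalar resource and first-come ordering; `PendingJob`, `Allocate`, and `Backfill` are hypothetical helpers that only mimic the behaviour described above, not the actual kube-batch actions.

```go
package main

// PendingJob is a hypothetical pending job with a scalar resource request.
type PendingJob struct {
	Queue   string
	Request float64
}

// Allocate places a job only if it fits in the idle resources and keeps its
// queue within the deserved share. It returns the jobs still pending and the
// resources left idle.
func Allocate(jobs []PendingJob, deserved map[string]float64, idle float64) ([]PendingJob, float64) {
	used := make(map[string]float64)
	var pending []PendingJob
	for _, j := range jobs {
		if j.Request <= idle && used[j.Queue]+j.Request <= deserved[j.Queue] {
			used[j.Queue] += j.Request
			idle -= j.Request
			continue
		}
		pending = append(pending, j)
	}
	return pending, idle
}

// Backfill ignores the deserved share and places any job that fits in the
// idle resources. It returns the jobs still pending and the resources left idle.
func Backfill(jobs []PendingJob, idle float64) ([]PendingJob, float64) {
	var pending []PendingJob
	for _, j := range jobs {
		if j.Request <= idle {
			idle -= j.Request
			continue
		}
		pending = append(pending, j)
	}
	return pending, idle
}
```

Take 10 idle units, two queues that each deserve 5, and one pending job per queue requesting 6. `Allocate` places nothing and leaves all 10 units idle. `Backfill` then places the first job and leaves 4 units idle with the second job still pending.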
