Commit 05bd444

Merge pull request #14 from pohly/dra-evolution
DRA API evolution
2 parents e99a701 + 72c1f7c commit 05bd444

48 files changed, +2577 -1261 lines changed

dra-evolution/README.md

Lines changed: 44 additions & 132 deletions
Original file line number | Diff line number | Diff line change
@@ -1,9 +1,13 @@
1-
# k8srm-prototype
1+
# dra-evolution
22

3-
For more background, please see this document, though it is not yet up to date
4-
with the latest in this repo:
5-
- [Revisiting Kubernetes Resource
6-
Model](https://docs.google.com/document/d/1Xy8HpGATxgA2S5tuFWNtaarw5KT8D2mj1F4AP1wg6dM/edit?usp=sharing).
3+
The [k8srm-prototype](../k8srm-prototype/README.md) is an attempt to derive a
4+
new API for device management from scratch. The API in this directory is taking
5+
the opposite approach: it incorporates ideas from the prototype into the 1.30
6+
DRA API. For some problems it picks a different approach.
7+
To compare YAML files, something like this can be used:
8+
```
9+
diff -C2 ../k8srm-prototype/testdata/classes.yaml <(sed -e 's;resource.k8s.io/v1alpha2;devmgmtproto.k8s.io/v1alpha1;' -e 's/ResourceClass/DeviceClass/' testdata/classes.yaml)
10+
```
711

812
## Overall Model
913

@@ -34,124 +38,28 @@ projects.
3438
## Open Questions
3539

3640
The next few sections of this document describe a proposed model. Note that this
37-
is really a brainstorming exercise and under active development. See the [open
38-
questions](open-questions.md) document for some of the still under discussion
39-
items.
40-
41-
We are also looking at how we might extend the existing 1.30 DRA model with some
42-
of these ideas, rather than changing it out for these specific types.
41+
is really a brainstorming exercise and under active development.
4342

4443
## Pod Spec
4544

4645
This prototype changes the `PodSpec` a little from how it is in DRA in 1.30.
4746

48-
In 1.30, the `PodSpec` has a list of named sources. The sources are structs that
47+
As in 1.30, the `PodSpec` has a list of named sources. The sources are structs that
4948
could contain either a claim name or a template name. The names are used to
50-
associate individual claims with containers. The example below allocates a
51-
single "foozer" device to the container in the pod.
52-
53-
```yaml
54-
apiVersion: resource.k8s.io/v1alpha1
55-
kind: ResourceClaimTemplate
56-
metadata:
57-
name: foozer
58-
namespace: default
59-
spec:
60-
spec:
61-
resourceClassName: example.com-foozer
62-
---
63-
apiVersion: v1
64-
kind: Pod
65-
metadata:
66-
name: foozer
67-
namespace: default
68-
spec:
69-
containers:
70-
- image: registry.k8s.io/pause:3.6
71-
name: my-container
72-
resources:
73-
requests:
74-
cpu: 10m
75-
memory: 10Mi
76-
claims:
77-
- name: gpu
78-
resourceClaims:
79-
- name: gpu
80-
source:
81-
resourceClaimTemplate: foozer
82-
```
49+
associate individual claims with containers.
8350

84-
In the prototype model, we are adding `matchAttributes` constraints to control
85-
consistency within a selection of devices. In particular, we want to be able to
86-
specify a `matchAttributes` constraint across two separate named sources, so
87-
that we can ensure for example, a GPU chosen for one container is the same model
88-
as one chosen for another container. This would imply we need `matchAttributes`
89-
that apply across the list present in `PodSpec`. However, we don't want to put
90-
things like `matchAttributes` into `PodSpec`, since it is already `v1`.
91-
92-
So, we tweak the `PodSpec` a bit from 1.30, such that, instead of a list of
93-
named sources, with each source being a oneOf, we instead have a single
94-
`DeviceClaims` oneOf in the `PodSpec`. This oneOf could be:
95-
- A list of named sources, where sources are limited to a simple "class" name
96-
(ie, not a list of oneOfs, just a list of simple structs).
97-
- A template struct, which consists of ObjectMeta + a claim name.
98-
- A claim name.
99-
100-
Additionally we move the container association from
101-
`spec.containers[*].resources.claims` to `spec.containers[*].devices`.
102-
103-
The first form of the of the `DeviceClaims` oneOf allows for our simplest of use
104-
cases to be very simple to express, without creating a secondary object to which
105-
we must then refer. So, the equivalent of the 1.30 YAML above would be:
106-
107-
```yaml
108-
apiVersion: v1
109-
kind: Pod
110-
metadata:
111-
name: foozer
112-
namespace: default
113-
spec:
114-
containers:
115-
- image: registry.k8s.io/pause:3.6
116-
name: my-container
117-
resources:
118-
requests:
119-
cpu: 10m
120-
memory: 10Mi
121-
devices:
122-
- name: gpu
123-
deviceClaims:
124-
devices:
125-
- name: gpu
126-
class: example.com-foozer
127-
```
51+
Each claim may contain multiple requests for different devices. Containers can
52+
also be associated with individual requests inside a claim.
53+
54+
Allocating multiple devices per claim allows specifying constraints for a set
55+
of devices, like "some attribute has to be the same". Long-term, it would be
56+
good to allow such constraints also across claims when a pod references more
57+
than one, but that would imply extending the `PodSpec` with complex fields
58+
where we are not sure yet what they need to look like. Therefore these
59+
constraints are currently limited to claims. This limitation may be
60+
removed once constraints are stable enough to be included in the `PodSpec`.
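
The "some attribute has to be the same" constraint described above can be illustrated with a minimal Go sketch. The `Device` type, the `matchAttribute` helper and the `model` attribute are hypothetical names chosen for illustration; they are not taken from `pkg/api` and the actual types in this repository may look quite different.

```go
package main

import "fmt"

// Device is a hypothetical stand-in for a device instance published by a
// driver. It is not one of the types defined in pkg/api.
type Device struct {
	Name       string
	Attributes map[string]string
}

// matchAttribute reports whether every device in the set carries the named
// attribute and all devices agree on its value. An empty set trivially
// satisfies the constraint.
func matchAttribute(devices []Device, attr string) bool {
	if len(devices) == 0 {
		return true
	}
	want, ok := devices[0].Attributes[attr]
	if !ok {
		return false
	}
	for _, d := range devices[1:] {
		got, ok := d.Attributes[attr]
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	sameModel := []Device{
		{Name: "gpu-0", Attributes: map[string]string{"model": "a100"}},
		{Name: "gpu-1", Attributes: map[string]string{"model": "a100"}},
	}
	mixed := []Device{
		{Name: "gpu-0", Attributes: map[string]string{"model": "a100"}},
		{Name: "gpu-2", Attributes: map[string]string{"model": "h100"}},
	}
	fmt.Println(matchAttribute(sameModel, "model")) // true
	fmt.Println(matchAttribute(mixed, "model"))     // false
}
```

Because the check runs over the devices picked for the requests of a single claim, it needs no additional fields in the `PodSpec`, which matches the limitation stated in the text.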
12861

129-
Each entry in `spec.deviceClaims.devices` is just a name/class pair, but in fact
130-
serves as a template to generate claims that exist with the lifecycle of the
131-
pod. We may want to add `ObjectMeta` here as well, since it is behaving as a
132-
template, to allow setting labels, etc.
133-
134-
The second form of `DeviceClaims` is a single struct with an ObjectMeta, and a
135-
claim name. The key with this form is that it is not *list* of named objects.
136-
Instead, it is a reference to a single claim object, and the named entries are
137-
*inside* the referenced object. This is to avoid a two-key mount in the
138-
`spec.containers[*].devices` entry. If that's not important, then we can tweak
139-
this a bit. In any case, this form allows claims which follow the lifecycle of
140-
the pod, similar to the first form. Since a top-level API claim spec can can
141-
contain multiple claim instances, this should be equally as expressive as if we
142-
included `matchAttributes` in the `PodSpec`, without having to do so.
143-
144-
The third form of `DeviceClaims` is just a string; it is a claim name and allows
145-
the user to share a pre-provisioned claim between pods.
146-
147-
Given that the first and second forms both have a template-like structure, we
148-
may want to combine them and use two-key indexing in the mounts. If we do so, we
149-
still want the direct specification of the class, so that the most common case
150-
does not need separate object just to reference a class.
151-
152-
These `PodSpec` Go types can be seen in [podspec.go](testdata/podspec.go). This
153-
is not the complete `PodSpec` but just the relevant parts of the 1.30 and
154-
proposed versions.
62+
These `PodSpec` Go types can be seen in [pod_types.go](pkg/api/pod_types.go).
15563

15664
## Types
15765

@@ -162,21 +70,25 @@ claim types.
16270

16371
Claim and allocation types are found in [claim_types.go](pkg/api/claim_types.go);
16472
individual types and fields are described in detail there in the comments.
73+
Capacity types are in [capacity_types.go](pkg/api/capacity_types.go). A quota
74+
mechanism is defined in [quota_types.go](pkg/api/quota_types.go).
16575

16676
Vendors and administrators create `DeviceClass` resources to pre-configure
167-
various options for claims. DeviceClass resources come in two varieties:
168-
- Ordinary or "leaf" classes that represent devices managed by a specific
169-
driver, along with some optional selection constraints and configuration.
170-
- "Meta" or "Group" or "Aggregate" or "Composition" classes that use a label
171-
selector to identify a *set* of leaf classes. This allows a claim to be
172-
satistfied by one of many classes.
77+
various options for requests in claims. Such a class contains:
78+
- configuration for a device, potentially including options that
79+
only an administrator may set
80+
- device requirements which select device instances that match the intended
81+
semantic of the class ("give me a GPU")
82+
83+
Classes are not necessarily associated with a single vendor. Whether they are
84+
depends on how the requirements in them are defined.
17385

17486
Example classes are in [classes.yaml](testdata/classes.yaml).
17587

17688
Example pod definitions can be found in the `pod-*.yaml` and `two-pods-*.yaml`
17789
files in [testdata](testdata).
17890

179-
Drivers publish capacity via `DevicePool` resources. Examples may be found in
91+
Drivers publish capacity via `ResourcePool` objects. Examples may be found in
18092
the `pools-*.yaml` files in [testdata](testdata).
18193

18294
## Building
@@ -188,14 +100,14 @@ capacity data.
188100
Just run `make`, it will build everything.
189101

190102
```console
191-
k8srm-prototype$ make
103+
dra-evolution$ make
192104
gofmt -s -w .
193105
go test ./...
194-
? github.com/kubernetes-sigs/wg-device-management/k8srm-prototype/cmd/mock-apiserver [no test files]
195-
? github.com/kubernetes-sigs/wg-device-management/k8srm-prototype/cmd/schedule [no test files]
196-
? github.com/kubernetes-sigs/wg-device-management/k8srm-prototype/pkg/api [no test files]
197-
? github.com/kubernetes-sigs/wg-device-management/k8srm-prototype/pkg/gen [no test files]
198-
ok github.com/kubernetes-sigs/wg-device-management/k8srm-prototype/pkg/schedule (cached)
106+
? github.com/kubernetes-sigs/wg-device-management/dra-evolution/cmd/mock-apiserver [no test files]
107+
? github.com/kubernetes-sigs/wg-device-management/dra-evolution/cmd/schedule [no test files]
108+
? github.com/kubernetes-sigs/wg-device-management/dra-evolution/pkg/api [no test files]
109+
? github.com/kubernetes-sigs/wg-device-management/dra-evolution/pkg/gen [no test files]
110+
ok github.com/kubernetes-sigs/wg-device-management/dra-evolution/pkg/schedule (cached)
199111
cd cmd/schedule && go build
200112
cd cmd/mock-apiserver && go build
201113
```
@@ -207,7 +119,7 @@ and used to try out scheduling (WIP). It will spit out some errors but you can
207119
ignore them.
208120

209121
```console
210-
k8srm-prototype$ ./cmd/mock-apiserver/mock-apiserver
122+
dra-evolution$ ./cmd/mock-apiserver/mock-apiserver
211123
W0422 13:20:21.238440 2062725 memorystorage.go:93] type info not known for apiextensions.k8s.io/v1, Kind=CustomResourceDefinition
212124
W0422 13:20:21.238598 2062725 memorystorage.go:93] type info not known for apiregistration.k8s.io/v1, Kind=APIService
213125
W0422 13:20:21.238639 2062725 memorystorage.go:267] type info not known for foozer.example.com/v1alpha1, Kind=FoozerConfig
@@ -222,18 +134,18 @@ W0422 13:20:21.238723 2062725 memorystorage.go:267] type info not known for devm
222134
The included `kubeconfig` will access that server. For example:
223135

224136
```console
225-
k8srm-prototype$ kubectl --kubeconfig kubeconfig apply -f testdata/drivers.yaml
137+
dra-evolution$ kubectl --kubeconfig kubeconfig apply -f testdata/drivers.yaml
226138
devicedriver.devmgmtproto.k8s.io/example.com-foozer created
227139
devicedriver.devmgmtproto.k8s.io/example.com-barzer created
228140
devicedriver.devmgmtproto.k8s.io/sriov-nic created
229141
devicedriver.devmgmtproto.k8s.io/vlan created
230-
k8srm-prototype$ kubectl --kubeconfig kubeconfig get devicedrivers
142+
dra-evolution$ kubectl --kubeconfig kubeconfig get devicedrivers
231143
NAME AGE
232144
example.com-foozer 2y112d
233145
example.com-barzer 2y112d
234146
sriov-nic 2y112d
235147
vlan 2y112d
236-
k8srm-prototype$
148+
dra-evolution$
237149
```
238150

239151
## `schedule` CLI

dra-evolution/cmd/gen/main.go

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ import (
55
"fmt"
66
"os"
77

8-
"github.com/kubernetes-sigs/wg-device-management/k8srm-prototype/pkg/gen"
8+
"github.com/kubernetes-sigs/wg-device-management/dra-evolution/pkg/gen"
99

1010
"sigs.k8s.io/yaml"
1111
)

dra-evolution/cmd/mock-apiserver/main.go

Lines changed: 8 additions & 8 deletions
@@ -1,11 +1,12 @@
11
package main
22

33
import (
4+
"log"
5+
"sync"
6+
47
"k8s.io/apimachinery/pkg/api/meta"
58
"k8s.io/apimachinery/pkg/runtime/schema"
6-
"log"
79
"sigs.k8s.io/kubebuilder-declarative-pattern/mockkubeapiserver"
8-
"sync"
910
)
1011

1112
func main() {
@@ -19,14 +20,13 @@ func main() {
1920
k8s.RegisterType(schema.GroupVersionKind{Group: "", Version: "v1", Kind: "Namespace"}, "namespaces", meta.RESTScopeRoot)
2021
k8s.RegisterType(schema.GroupVersionKind{Group: "", Version: "v1", Kind: "Secret"}, "secrets", meta.RESTScopeNamespace)
2122
k8s.RegisterType(schema.GroupVersionKind{Group: "", Version: "v1", Kind: "ConfigMap"}, "configmaps", meta.RESTScopeNamespace)
22-
k8s.RegisterType(schema.GroupVersionKind{Group: "", Version: "v1", Kind: "Pod"}, "pods", meta.RESTScopeNamespace)
23+
k8s.RegisterType(schema.GroupVersionKind{Group: "resource.k8s.io", Version: "v1alpha2", Kind: "Pod"}, "pods", meta.RESTScopeNamespace)
2324
k8s.RegisterType(schema.GroupVersionKind{Group: "", Version: "v1", Kind: "Node"}, "nodes", meta.RESTScopeNamespace)
2425
k8s.RegisterType(schema.GroupVersionKind{Group: "foozer.example.com", Version: "v1alpha1", Kind: "FoozerConfig"}, "foozerconfigs", meta.RESTScopeNamespace)
25-
k8s.RegisterType(schema.GroupVersionKind{Group: "devmgmtproto.k8s.io", Version: "v1alpha1", Kind: "DeviceDriver"}, "devicedrivers", meta.RESTScopeRoot)
26-
k8s.RegisterType(schema.GroupVersionKind{Group: "devmgmtproto.k8s.io", Version: "v1alpha1", Kind: "DeviceClass"}, "deviceclasses", meta.RESTScopeRoot)
27-
k8s.RegisterType(schema.GroupVersionKind{Group: "devmgmtproto.k8s.io", Version: "v1alpha1", Kind: "DeviceClaim"}, "deviceclaims", meta.RESTScopeNamespace)
28-
k8s.RegisterType(schema.GroupVersionKind{Group: "devmgmtproto.k8s.io", Version: "v1alpha1", Kind: "DevicePrivilegedClaim"}, "deviceprivilegedclaims", meta.RESTScopeNamespace)
29-
k8s.RegisterType(schema.GroupVersionKind{Group: "devmgmtproto.k8s.io", Version: "v1alpha1", Kind: "DevicePool"}, "devicepools", meta.RESTScopeRoot)
26+
k8s.RegisterType(schema.GroupVersionKind{Group: "resource.k8s.io", Version: "v1alpha2", Kind: "DeviceClass"}, "deviceclasses", meta.RESTScopeRoot)
27+
k8s.RegisterType(schema.GroupVersionKind{Group: "resource.k8s.io", Version: "v1alpha2", Kind: "ResourceClaim"}, "resourceclaims", meta.RESTScopeNamespace)
28+
k8s.RegisterType(schema.GroupVersionKind{Group: "resource.k8s.io", Version: "v1alpha1", Kind: "ResourcePool"}, "resourcepools", meta.RESTScopeRoot)
29+
k8s.RegisterType(schema.GroupVersionKind{Group: "resource.k8s.io", Version: "v1alpha1", Kind: "ResourcePolicy"}, "resourcepolicies", meta.RESTScopeRoot)
3030

3131
wg.Add(1)
3232
addr, err := k8s.StartServing()

dra-evolution/go.mod

Lines changed: 26 additions & 7 deletions
@@ -1,45 +1,64 @@
1-
module github.com/kubernetes-sigs/wg-device-management/k8srm-prototype
1+
module github.com/kubernetes-sigs/wg-device-management/dra-evolution
22

33
go 1.22.1
44

55
replace github.com/kubernetes-sigs/wg-device-management/nv-partitionable-resources => ../nv-partitionable-resources
66

77
require (
88
github.com/NVIDIA/go-nvml v0.12.0-5
9-
github.com/google/cel-go v0.20.1
9+
github.com/blang/semver/v4 v4.0.0
10+
github.com/google/cel-go v0.17.8
1011
github.com/kubernetes-sigs/wg-device-management/nv-partitionable-resources v0.0.0-00010101000000-000000000000
1112
github.com/stretchr/testify v1.9.0
1213
k8s.io/api v0.30.0
1314
k8s.io/apimachinery v0.30.0
15+
k8s.io/apiserver v0.30.0
16+
k8s.io/klog/v2 v2.120.1
17+
k8s.io/utils v0.0.0-20240423183400-0849a56e8f22
1418
sigs.k8s.io/kubebuilder-declarative-pattern/mockkubeapiserver v0.0.0-20240404191132-83bd9c05741b
1519
sigs.k8s.io/yaml v1.4.0
1620
)
1721

1822
require (
1923
github.com/Masterminds/semver v1.5.0 // indirect
2024
github.com/NVIDIA/go-nvlib v0.3.0 // indirect
21-
github.com/antlr4-go/antlr/v4 v4.13.0 // indirect
25+
github.com/antlr/antlr4/runtime/Go/antlr/v4 v4.0.0-20230305170008-8188dc5388df // indirect
2226
github.com/davecgh/go-spew v1.1.1 // indirect
27+
github.com/emicklei/go-restful/v3 v3.11.0 // indirect
2328
github.com/go-logr/logr v1.4.1 // indirect
29+
github.com/go-openapi/jsonpointer v0.19.6 // indirect
30+
github.com/go-openapi/jsonreference v0.20.2 // indirect
31+
github.com/go-openapi/swag v0.22.3 // indirect
2432
github.com/gogo/protobuf v1.3.2 // indirect
33+
github.com/golang/protobuf v1.5.4 // indirect
34+
github.com/google/gnostic-models v0.6.8 // indirect
2535
github.com/google/gofuzz v1.2.0 // indirect
2636
github.com/google/uuid v1.6.0 // indirect
37+
github.com/josharian/intern v1.0.0 // indirect
2738
github.com/json-iterator/go v1.1.12 // indirect
39+
github.com/mailru/easyjson v0.7.7 // indirect
2840
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
2941
github.com/modern-go/reflect2 v1.0.2 // indirect
42+
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
3043
github.com/pmezard/go-difflib v1.0.0 // indirect
3144
github.com/stoewer/go-strcase v1.2.0 // indirect
3245
golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa // indirect
3346
golang.org/x/net v0.23.0 // indirect
47+
golang.org/x/oauth2 v0.10.0 // indirect
48+
golang.org/x/sync v0.6.0 // indirect
49+
golang.org/x/sys v0.18.0 // indirect
50+
golang.org/x/term v0.18.0 // indirect
3451
golang.org/x/text v0.14.0 // indirect
35-
google.golang.org/genproto/googleapis/api v0.0.0-20230803162519-f966b187b2e5 // indirect
36-
google.golang.org/genproto/googleapis/rpc v0.0.0-20230803162519-f966b187b2e5 // indirect
52+
golang.org/x/time v0.3.0 // indirect
53+
google.golang.org/appengine v1.6.7 // indirect
54+
google.golang.org/genproto/googleapis/api v0.0.0-20230726155614-23370e0ffb3e // indirect
55+
google.golang.org/genproto/googleapis/rpc v0.0.0-20230822172742-b8732ec3820d // indirect
3756
google.golang.org/protobuf v1.33.0 // indirect
3857
gopkg.in/inf.v0 v0.9.1 // indirect
3958
gopkg.in/yaml.v2 v2.4.0 // indirect
4059
gopkg.in/yaml.v3 v3.0.1 // indirect
41-
k8s.io/klog/v2 v2.120.1 // indirect
42-
k8s.io/utils v0.0.0-20240423183400-0849a56e8f22 // indirect
60+
k8s.io/client-go v0.30.0 // indirect
61+
k8s.io/kube-openapi v0.0.0-20240228011516-70dd3763d340 // indirect
4362
sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd // indirect
4463
sigs.k8s.io/structured-merge-diff/v4 v4.4.1 // indirect
4564
)
