DRA: Integrates with DRA and CDI #3329

Merged
merged 13 commits into from
Apr 29, 2024
DRA: add docs of quick-start
cyclinder committed Apr 28, 2024
commit d04254e25271e1182469782bce8dd8f1e35e70d8
6 changes: 6 additions & 0 deletions docs/develop/roadmap.md
@@ -69,3 +69,9 @@
| | support ipoib CNI for infiniband device | v0.9.0 | | |
| | support ib-sriov CNI for infiniband device | v0.9.0 | | |
| EgressGateway | egressGateway | v0.8.0 | | |
| Dynamic-Resource-Allocation | implement the DRA framework | v1.0.0 | | |
| | support the rdmaAcc feature of SpiderClaimParameter | v1.0.0 | | |
| | support scheduling Pods by SpiderMultusConfig or SpiderIPPool | Todo | | |
| | unify the way device plugins declare resources | Todo | | |


2 changes: 2 additions & 0 deletions docs/mkdocs.yml
@@ -96,6 +96,7 @@ nav:
- Node-based Topology: usage/network-topology.md
- RDMA with RoCE: usage/rdma-roce.md
- RDMA with Infiniband: usage/rdma-ib.md
- Dynamic-Resource-Allocation: usage/dra.md
- Multi-Cluster Networking: usage/submariner.md
- Access Service for Underlay CNI: usage/underlay_cni_service.md
- Bandwidth Manage for IPVlan CNI: usage/ipvlan_bandwidth.md
@@ -113,6 +114,7 @@ nav:
- CRD Spidercoordinator: reference/crd-spidercoordinator.md
- CRD SpiderEndpoint: reference/crd-spiderendpoint.md
- CRD SpiderReservedIP: reference/crd-spiderreservedip.md
- CRD SpiderClaimParameter: reference/crd-spiderclaimparameter.md
- Ifacer plugin: reference/plugin-ifacer.md
- IPAM plugin: reference/plugin-ipam.md
- Development:
33 changes: 33 additions & 0 deletions docs/reference/crd-spiderclaimparameter.md
@@ -0,0 +1,33 @@
# SpiderClaimParameter

A SpiderClaimParameter resource describes a ResourceClaim and affects the generated CDI file. This CRD only works when the [DRA feature](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/) is enabled.

## Sample YAML

```yaml
apiVersion: spiderpool.spidernet.io/v2beta1
kind: SpiderClaimParameter
metadata:
  name: demo
  namespace: default
  annotations:
    dra.spidernet.io/cdi-version: 0.6.0
spec:
  rdmaAcc: false
```

## SpiderClaimParameter definition

### Metadata

| Field | Description | Schema | Validation |
|-----------|---------------------------------------------------|--------|------------|
| name | The name of this SpiderClaimParameter resource | string | required |

### Spec

This is the SpiderClaimParameter spec for users to configure.

| Field | Description | Schema | Validation | Values | Default |
|---------|-------------|--------|------------|--------|---------|
| rdmaAcc | Whether the so file configured via dra.librarypath is mounted into the Pod's containers | bool | optional | true,false | false |
243 changes: 243 additions & 0 deletions docs/usage/dra.md
@@ -0,0 +1,243 @@
# Dynamic-Resource-Allocation

## Introduction

Dynamic-Resource-Allocation (DRA) is a Kubernetes feature that puts resource scheduling in the hands of third-party developers. Instead of the countable model (e.g., "nvidia.com/gpu: 2") that device plugins use to request access to resources, it provides an API more akin to that of storage persistent volumes. The main benefit is more flexible and dynamic allocation of hardware resources, which improves resource utilization and lets Pods be scheduled onto the most suitable nodes. DRA was introduced as an alpha feature in Kubernetes 1.26 (December 2022 release), driven by Nvidia and Intel.

Spiderpool currently integrates with the DRA framework, which enables features including, but not limited to, the following:

* Enabling the use and scheduling of RDMA hardware resources, mounting key Linux so (shared object) files, and setting environment variables.
* Automatically scheduling Pods to appropriate nodes based on their subnets and NICs, so that Pods do not fail to start after being scheduled to a node.
* Unifying the resource declarations of multiple device plugins.
* More capabilities are being added continuously; see the [RoadMap](../develop/roadmap.md) for details.

## Terminology

* ResourceClaimTemplate: A template for generating ResourceClaim resources. One ResourceClaimTemplate can generate multiple ResourceClaims.
* ResourceClaim: A ResourceClaim binds a specific set of node resources for use by a Pod.
* ResourceClass: A ResourceClass represents a kind of resource (e.g., GPU). A DRA plugin is responsible for driving the resources represented by a ResourceClass.

## Environment Preparation

* Prepare a Kubernetes cluster running a version higher than v1.29.0, with the DynamicResourceAllocation feature gate enabled.
* Have kubectl and [Helm](https://helm.sh/docs/intro/install/) installed.

## Quick Start

1. DRA is currently an alpha feature of Kubernetes and is not enabled by default, so it has to be enabled manually as follows.

Add the following to the kube-apiserver startup parameters.

```
--feature-gates=DynamicResourceAllocation=true
--runtime-config=resource.k8s.io/v1alpha2=true
```

Add the following to the kube-controller-manager startup parameters.

```
--feature-gates=DynamicResourceAllocation=true
```

Add the following to kube-scheduler's startup parameters:

```
--feature-gates=DynamicResourceAllocation=true
```
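To double-check that the flags above took effect, something like the following could be used. It is only a sketch: it assumes a kubeadm-style cluster whose control-plane components run as static Pods with manifests under /etc/kubernetes/manifests. That path, the `MANIFEST_DIR` variable and the `has_dra_gate` helper are assumptions for illustration, not part of Spiderpool or Kubernetes.

```shell
# has_dra_gate FILE: succeed if FILE passes a --feature-gates flag that
# enables DynamicResourceAllocation (hypothetical helper for this sketch).
has_dra_gate() {
  grep -Eq -- '--feature-gates=.*DynamicResourceAllocation=true' "$1"
}

# Assumed kubeadm static-Pod layout; adjust for your distribution.
MANIFEST_DIR="${MANIFEST_DIR:-/etc/kubernetes/manifests}"

for component in kube-apiserver kube-controller-manager kube-scheduler; do
  manifest="${MANIFEST_DIR}/${component}.yaml"
  if [ -f "$manifest" ] && has_dra_gate "$manifest"; then
    echo "${component}: DynamicResourceAllocation enabled"
  else
    echo "${component}: DynamicResourceAllocation not found"
  fi
done
```

Run it on a control-plane node. If your components are started by systemd units or another installer, inspect their startup arguments there instead.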

2. DRA relies on [CDI](https://github.com/cncf-tags/container-device-interface), so it needs support from the container runtime. This article takes containerd as an example, where CDI has to be enabled manually.

Modify the containerd configuration file to configure CDI.

```
~# vim /etc/containerd/config.toml
...
[plugins."io.containerd.grpc.v1.cri"]
  enable_cdi = true
  cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
~# systemctl restart containerd
```

> containerd v1.7.0 or later is recommended, because CDI is only supported from that version onward. CDI support differs between runtimes, so check whether your runtime version supports it first.
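The version requirement can be checked with a small script such as the one below. This is a sketch under a few assumptions: GNU `sort -V` is available, `containerd --version` prints the version as its third field (for example `v1.7.13`), and `version_ge` is a hypothetical helper defined only for this example.

```shell
# version_ge A B: succeed when version A is greater than or equal to B.
# Relies on version sorting (sort -V) to find the smaller of the two.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Only inspect the runtime if containerd is installed on this node.
if command -v containerd >/dev/null 2>&1; then
  # Assumed output shape: "containerd <repository> v<version> <commit>"
  ver="$(containerd --version | awk '{print $3}' | sed 's/^v//')"
  if version_ge "$ver" "1.7.0"; then
    echo "containerd ${ver}: new enough for CDI"
  else
    echo "containerd ${ver}: upgrade to v1.7.0 or later to use CDI"
  fi
fi
```

Other runtimes report their versions differently, so treat the parsing line as something to adapt rather than copy.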

3. Install Spiderpool, making sure to enable DRA.

```
helm repo add spiderpool https://spidernet-io.github.io/spiderpool
helm repo update spiderpool
helm install spiderpool spiderpool/spiderpool --namespace kube-system --set dra.enabled=true \
  --set dra.librarypath="/usr/lib/libtest.so"
```

> Specify the path to the so file via dra.librarypath. The file will be mounted into the Pod's containers via CDI. Note that this so file must exist on the host.

4. Verify the installation

Check that the Spiderpool pod is running correctly, and check for the presence of the resourceclass resource:

```
~# kubectl get po -n kube-system | grep spiderpool
spiderpool-agent-hqt2b 1/1 Running 0 20d
spiderpool-agent-nm9vl 1/1 Running 0 20d
spiderpool-controller-7d7f4f55d4-w2rv5 1/1 Running 0 20d
spiderpool-init 0/1 Completed 0 21d
~# kubectl get resourceclass
NAME DRIVERNAME AGE
netresources.spidernet.io netresources.spidernet.io 20d
```

> netresources.spidernet.io is Spiderpool's resourceclass, and Spiderpool will take care of creating and allocating resourceclaims belonging to this resourceclass.

5. Create SpiderIPPool and SpiderMultusConfig instances.

> Note: This step can be skipped if your cluster already has other CNIs installed or does not require an underlay CNI with Macvlan.

```shell
MACVLAN_MASTER_INTERFACE="eth0"
cat <<EOF | kubectl apply -f -
apiVersion: spiderpool.spidernet.io/v2beta1
kind: SpiderMultusConfig
metadata:
  name: macvlan-conf
  namespace: kube-system
spec:
  cniType: macvlan
  macvlan:
    master:
    - ${MACVLAN_MASTER_INTERFACE}
EOF
```

> SpiderMultusConfig automatically creates the corresponding Multus NetworkAttachmentDefinition instance.

```shell
cat <<EOF | kubectl apply -f -
apiVersion: spiderpool.spidernet.io/v2beta1
kind: SpiderIPPool
metadata:
  name: ippool-test
spec:
  ips:
  - "172.18.30.131-172.18.30.140"
  subnet: 172.18.0.0/16
  gateway: 172.18.0.1
  multusName:
  - kube-system/macvlan-conf
EOF
```

6. Create the workload and its related resources, such as the SpiderClaimParameter and the ResourceClaimTemplate.

```
~# export NAME=demo
~# cat <<EOF | kubectl apply -f -
apiVersion: spiderpool.spidernet.io/v2beta1
kind: SpiderClaimParameter
metadata:
  name: ${NAME}
spec:
  rdmaAcc: true
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: ${NAME}
spec:
  spec:
    resourceClassName: netresources.spidernet.io
    parametersRef:
      apiGroup: spiderpool.spidernet.io
      kind: SpiderClaimParameter
      name: ${NAME}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${NAME}
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ${NAME}
  template:
    metadata:
      annotations:
        v1.multus-cni.io/default-network: kube-system/macvlan-conf
      labels:
        app: ${NAME}
    spec:
      containers:
      - name: ctr
        image: nginx
        resources:
          claims:
          - name: ${NAME}
      resourceClaims:
      - name: ${NAME}
        source:
          resourceClaimTemplateName: ${NAME}
EOF
```

> Create a ResourceClaimTemplate; Kubernetes creates a unique ResourceClaim for each Pod based on it. The lifecycle of that ResourceClaim is the same as the Pod's.
>
> The SpiderClaimParameter extends the configuration parameters of the ResourceClaim and affects both the scheduling of the ResourceClaim and the generation of its CDI file. In this example, rdmaAcc determines whether the configured so file is mounted.
>
> A container in the Pod declares the claims it uses under resources.claims, which affects the resources that containerd sets up for it. When the container is started, the CDI file corresponding to the claim is translated into OCI Spec configuration, which determines how the container is created.
>
> If Pod creation fails with "unresolvable CDI devices: xxxx", the CDI version supported by the container runtime may be too low for it to parse the CDI file. By default, Spiderpool uses the latest CDI version. You can specify a lower version with the annotation "dra.spidernet.io/cdi-version" on the SpiderClaimParameter instance, e.g.: dra.spidernet.io/cdi-version: 0.5.0
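If the CDI version has to be lowered on an existing SpiderClaimParameter, the annotation could be set with `kubectl annotate`, roughly as sketched below. The `pin_cdi_version` helper and the `KUBECTL` override are hypothetical names used only for this example, and the resource name `spiderclaimparameter` assumes the CRD registers the lowercase kind as its singular name; use whatever `kubectl api-resources` reports in your cluster.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the command
# instead of running it against a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# pin_cdi_version NAME NAMESPACE VERSION
# Sets the dra.spidernet.io/cdi-version annotation on a SpiderClaimParameter.
# The resource name below is an assumption; see the note above.
pin_cdi_version() {
  "$KUBECTL" annotate spiderclaimparameter "$1" \
    --namespace "$2" \
    "dra.spidernet.io/cdi-version=$3" \
    --overwrite
}
```

For the example in this guide the call would be `pin_cdi_version demo default 0.5.0`. Whether Pods that already exist pick up the new version is not covered here, so recreating them afterwards is the cautious choice.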

7. Validation

After creating the Pod, view the generated resource files such as ResourceClaim.

```
~# kubectl get resourceclaim
NAME                               RESOURCECLASSNAME           ALLOCATIONMODE         STATE                AGE
demo-745fb4c498-72g7g-demo-7d458   netresources.spidernet.io   WaitForFirstConsumer   allocated,reserved   20d
~# cat /var/run/cdi/k8s.netresources.spidernet.io-claim_1e15705a-62fe-4694-8535-93a5f0ccf996.yaml
---
cdiVersion: 0.6.0
containerEdits: {}
devices:
- containerEdits:
    env:
    - DRA_CLAIM_UID=1e15705a-62fe-4694-8535-93a5f0ccf996
    - LD_PRELOAD=libtest.so
    mounts:
    - containerPath: /usr/lib/libtest.so
      hostPath: /usr/lib/libtest.so
      options:
      - ro
      - nosuid
      - nodev
      - bind
    - containerPath: /usr/lib64/libtest.so
      hostPath: /usr/lib/libtest.so
      options:
      - ro
      - nosuid
      - nodev
      - bind
  name: 1e15705a-62fe-4694-8535-93a5f0ccf996
kind: k8s.netresources.spidernet.io/claim
```

This shows that the ResourceClaim has been created. Its STATE is allocated and reserved, which indicates that it is in use by the Pod. Spiderpool has also generated a CDI file for the ResourceClaim, which describes the files to be mounted and the environment variables to be set.
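When several claims exist on a node, it can be handy to list only the environment variables that each CDI file injects. The following is a rough sketch that uses plain `sed` text matching rather than a YAML parser, so it assumes the file layout shown above. `cdi_env_vars` and `CDI_DIR` are hypothetical names for this example.

```shell
# cdi_env_vars FILE: print every NAME=value list item found in a CDI spec.
# Mount entries such as "- containerPath: ..." contain no "=" and are skipped.
cdi_env_vars() {
  sed -n 's/^[[:space:]]*-[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*=.*\)$/\1/p' "$1"
}

# Directory taken from cdi_spec_dirs in the containerd configuration above.
CDI_DIR="${CDI_DIR:-/var/run/cdi}"

for spec in "$CDI_DIR"/*.yaml; do
  [ -f "$spec" ] || continue   # no CDI files on this node
  echo "== ${spec}"
  cdi_env_vars "$spec"
done
```

For the file shown above this would list the DRA_CLAIM_UID and LD_PRELOAD entries. For anything beyond a quick check, a real YAML parser is the safer tool.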

Check that the pod is Running and verify that the so file is mounted and the environment variable (LD_PRELOAD) is declared.

```
~# kubectl get po
NAME                    READY   STATUS    RESTARTS   AGE
demo-745fb4c498-72g7g   1/1     Running   0          20m
demo-745fb4c498-s92qr   1/1     Running   0          20m
~# kubectl exec -it demo-745fb4c498-72g7g -- sh
~# ls /usr/lib/libtest.so
/usr/lib/libtest.so
~# printenv LD_PRELOAD
libtest.so
```

You can see that the so file has been mounted into the Pod's container and the environment variable has been set, so your containers are ready to use the mounted so file.

## Welcome to try it out

DRA is currently available as an alpha feature of Spiderpool, and we'll be expanding it with more capabilities in the future, so feel free to try it out. Please let us know if you have any further questions or requests.