fix: generate a file per resourcePool #583
Conversation
Force-pushed d80698f to 3c06290
Signed-off-by: Soule BA <souleb@nvidia.com>
I have tested this with the following:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: nvidia.com/rdma_resource1
spec:
  config: '{
    "type": "sriov",
    "cniVersion": "0.3.1",
    "name": "sriov-network",
    "ipam": {
      "type": "host-local",
      "subnet": "10.56.217.0/24",
      "routes": [{
        "dst": "0.0.0.0/0"
      }],
      "gateway": "10.56.217.1"
    }
  }'
---
souleb@c-237-115-20-025:~/network-operator$ cat pod-tc1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-net1
spec:
  containers:
  - name: appcntr1
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        nvidia.com/rdma_resource1: '1'
      limits:
        nvidia.com/rdma_resource1: '1'

I get the expected result:

Name: testpod1
Annotations:
k8s.v1.cni.cncf.io/networks: sriov-net1
k8s.v1.cni.cncf.io/networks-status:
[{
"name": "cilium",
"interface": "eth0",
"ips": [
"10.0.2.251"
],
"mac": "9e:35:b3:cf:66:5e",
"default": true,
"dns": {}
},{
"name": "default/sriov-net1",
"interface": "net1",
"ips": [
"10.56.217.2"
],
"mac": "ee:55:4d:84:c6:fa",
"dns": {},
"device-info": {
"type": "pci",
"version": "1.1.0",
"pci": {
"pci-address": "0000:b1:00.4",
"rdma-device": "mlx5_15"
}
}
}]
Status: Running

One thing to note is the created files:

<>:/$ sudo ls -l /var/run/cdi
-rw------- 1 root root 3623 Aug  6 15:45 sriov-dp-nvidia.com-net-pci_104c6xxx.yaml
-rw------- 1 root root 1855 Aug  6 15:45 sriov-dp-nvidia.com-net-pci_71627xxx.yaml

The resource discovery order is not deterministic, so the filenames change over time. I don't think this is an issue though.
pkg/cdi/cdi.go (Outdated)

    return err
}

err = cdi.GetRegistry().SpecDB().WriteSpec(&cdiSpec, cdiSpecPrefix+name)
Can we use cdiSpecPrefix + the pool name here?
e.g.: "sriov-dp-mellanox.com-rdma_resource_1"
That way we always write the same file, so there is no need to worry about c.lastGenratedName.
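The reviewer's suggestion above can be sketched roughly as follows. This is a hypothetical illustration, not the plugin's actual code: `cdiSpecPrefix`, the separator, and the sanitization rule are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed prefix used by the device plugin for its CDI spec files.
const cdiSpecPrefix = "sriov-dp-"

// specFileNameForPool derives a stable spec name from the resource prefix
// and pool name, so repeated writes for the same pool hit the same file.
// Slashes are flattened since spec names become filenames.
func specFileNameForPool(resourcePrefix, poolName string) string {
	safe := strings.ReplaceAll(resourcePrefix+"-"+poolName, "/", "_")
	return cdiSpecPrefix + safe
}

func main() {
	fmt.Println(specFileNameForPool("mellanox.com", "rdma_resource_1"))
	// sriov-dp-mellanox.com-rdma_resource_1
}
```

With this scheme the filename is readable and deterministic per pool, at the cost of relying on pool names being unique, which is the question raised in the next comment.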
I went with this to ensure a unique name. There is no guarantee that the pool name is unique, right?
We can assume that, the same way we assume no one else will create a device plugin (unix) socket per resource under the kubelet dir (/var/lib/kubelet/plugins_registry/).
Moreover, for debuggability it is better to be able to tell which file belongs to which resource.
Pull Request Test Coverage Report for Build 10510563475 (Coveralls)
Signed-off-by: Soule BA <souleb@nvidia.com>
LGTM! thx @souleb, appreciate your patience :)
@Eoghan1232 mind taking a look at this one?
Just a small change and a general question; maybe I'm missing something here.
@@ -48,11 +48,6 @@ func New() CDI {

// CreateCDISpecForPool creates CDI spec file with specified devices
func (c *impl) CreateCDISpecForPool(resourcePrefix string, rPool types.ResourcePool) error {
	err := c.CleanupSpecs()
Question here: are we still going to cover the case where we remove a specific resource and create a new one using the same devices?
I don't see a cleanup removing resources that we no longer expose.
All CDI files prefixed with "sriov-dp" are deleted on device plugin start (existing logic).
Effectively this means that we rely on `writeSpec` to remove irrelevant cdiSpec files.

Signed-off-by: Soule BA <souleb@nvidia.com>
Merging this one! @souleb appreciate the patience
Merged ab6e489 into k8snetworkplumbingwg:master
fixes #576

This PR, if implemented, creates a `cdiSpec` file for each `resourcePool`. The file is removed first on each call to `CreateCDISpecForPool`. We calculate the digest of the `cdiSpec` to get a unique filename based on the `resourcePool` devices.

The manager itself makes sure to clean the directory in `initServers()` by calling `cleanupCDISpecs()`, which deletes everything that matches `sriov-dp-*`.
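The digest-based naming described in the PR body can be sketched as hashing the marshaled spec content. This is a hedged illustration: the `cdiSpec` struct, the JSON marshaling, and the truncated SHA-256 digest are assumptions standing in for the real k8snetworkplumbingwg types and hashing choice.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// Illustrative stand-in for the real CDI spec type.
type cdiSpec struct {
	Kind    string   `json:"kind"`
	Devices []string `json:"devices"`
}

// specFileName hashes the marshaled spec so the filename is unique per
// set of devices in the pool; a short digest keeps the name readable,
// e.g. "sriov-dp-71627xxx" in the listing shown earlier.
func specFileName(spec cdiSpec) (string, error) {
	raw, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return fmt.Sprintf("sriov-dp-%x", sum[:4]), nil
}

func main() {
	name, _ := specFileName(cdiSpec{Kind: "nvidia.com/net-pci", Devices: []string{"0000:b1:00.4"}})
	fmt.Println(name)
}
```

Note that hashing the device list also explains the observation earlier in the thread: if discovery returns the same devices in a different order, the digest (and thus the filename) changes between runs.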