[WIP] Add deployment artifacts for iscsi csi driver deployment #45

Closed
wants to merge 2 commits into from

Conversation

humblec
Contributor

@humblec humblec commented Apr 23, 2021

Fix: #4

Marking this PR as WIP because the container image still has to be pushed to a repo and more testing etc. has to be done here.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>


@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels Apr 23, 2021
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 23, 2021
@humblec humblec force-pushed the deploy branch 2 times, most recently from 6338a29 to a081b5b Compare April 23, 2021 11:01
pohly added a commit to pohly/csi-driver-iscsi that referenced this pull request May 5, 2021
a0f195c Merge pull request kubernetes-csi#106 from msau42/fix-canary
7100c12 Only set staging registry when running canary job
b3c65f9 Merge pull request kubernetes-csi#99 from msau42/add-release-process
e53f3e8 Merge pull request kubernetes-csi#103 from msau42/fix-canary
d129462 Document new method for adding CI jobs are new K8s versions
e73c2ce Use staging registry for canary tests
2c09846 Add cleanup instructions to release-notes generation
60e1cd3 Merge pull request kubernetes-csi#98 from pohly/kubernetes-1-19-fixes
0979c09 prow.sh: fix E2E suite for Kubernetes >= 1.18
3b4a2f1 prow.sh: fix installing Go for Kubernetes 1.19.0
1fbb636 Merge pull request kubernetes-csi#97 from pohly/go-1.15
82d108a switch to Go 1.15
d8a2530 Merge pull request kubernetes-csi#95 from msau42/add-release-process
843bddc Add steps on promoting release images
0345a83 Merge pull request kubernetes-csi#94 from linux-on-ibm-z/bump-timeout
1fdf2d5 cloud build: bump timeout in Prow job
41ec6d1 Merge pull request kubernetes-csi#93 from animeshk08/patch-1
5a54e67 filter-junit: Fix gofmt error
0676fcb Merge pull request kubernetes-csi#92 from animeshk08/patch-1
36ea4ff filter-junit: Fix golint error
f5a4203 Merge pull request kubernetes-csi#91 from cyb70289/arm64
43e50d6 prow.sh: enable building arm64 image
0d5bd84 Merge pull request kubernetes-csi#90 from pohly/k8s-staging-sig-storage
3df86b7 cloud build: k8s-staging-sig-storage
c5fd961 Merge pull request kubernetes-csi#89 from pohly/cloud-build-binfmt
db0c2a7 cloud build: initialize support for running commands in Dockerfile
be902f4 Merge pull request kubernetes-csi#88 from pohly/multiarch-windows-fix
340e082 build.make: optional inclusion of Windows in multiarch images
5231f05 build.make: properly declare push-multiarch
4569f27 build.make: fix push-multiarch ambiguity
17dde9e Merge pull request kubernetes-csi#87 from pohly/cloud-build
bd41690 cloud build: initial set of shared files
9084fec Merge pull request kubernetes-csi#81 from msau42/add-release-process
6f2322e Update patch release notes generation command
0fcc3b1 Merge pull request kubernetes-csi#78 from ggriffiths/fix_csi_snapshotter_rbac_version_set
d8c76fe Support local snapshot RBAC for pull jobs
c1bdf5b Merge pull request kubernetes-csi#80 from msau42/add-release-process
ea1f94a update release tools instructions
152396e Merge pull request kubernetes-csi#77 from ggriffiths/snapshotter201_update
7edc146 Update snapshotter to version 2.0.1
4cf843f Merge pull request kubernetes-csi#76 from pohly/build-targets
3863a0f build for multiple platforms only in CI, add s390x
8322a7d Merge pull request kubernetes-csi#72 from pohly/hostpath-update
7c5a89c prow.sh: use 1.3.0 hostpath driver for testing
b8587b2 Merge pull request kubernetes-csi#71 from wozniakjan/test-vet
fdb3218 Change 'make test-vet' to call 'go vet'
d717c8c Merge pull request kubernetes-csi#69 from pohly/test-driver-config
a1432bc Merge pull request kubernetes-csi#70 from pohly/kubelet-feature-gates
5f74333 prow.sh: also configure feature gates for kubelet
84f78b1 prow.sh: generic driver installation
3c34b4f Merge pull request kubernetes-csi#67 from windayski/fix-link
fa90abd fix incorrect link
ff3cc3f Merge pull request kubernetes-csi#54 from msau42/add-release-process
ac8a021 Document the process for releasing a new sidecar
23be652 Merge pull request kubernetes-csi#65 from msau42/update-hostpath
6582f2f Update hostpath driver version to get fix for connection-timeout
4cc9174 Merge pull request kubernetes-csi#64 from ggriffiths/snapshotter_2_version_update
8191eab Update snapshotter to version v2.0.0
3c463fb Merge pull request kubernetes-csi#61 from msau42/enable-snapshots
8b0316c Fix overriding of junit results by using unique names for each e2e run
5f444b8 Merge pull request kubernetes-csi#60 from saad-ali/updateHostpathVersion
af9549b Update prow hostpath driver version to 1.3.0-rc2
f6c74b3 Merge pull request kubernetes-csi#57 from ggriffiths/version_gt_kubernetes_fix
fc80975 Fix version_gt to work with kubernetes prefix
9f1f3dd Merge pull request kubernetes-csi#56 from msau42/enable-snapshots
b98b2ae Enable snapshot tests in 1.17 to be run in non-alpha jobs.
9ace020 Merge pull request kubernetes-csi#52 from msau42/update-readme
540599b Merge pull request kubernetes-csi#53 from msau42/fix-make
a4e6299 fix syntax for ppc64le build
771ca6f Merge pull request kubernetes-csi#49 from ggriffiths/prowsh_improve_version_gt
d7c69d2 Merge pull request kubernetes-csi#51 from msau42/enable-multinode
4ad6949 Improve snapshot pod running checks and improve version_gt
53888ae Improve README by adding an explicit Kubernetes dependency section
9a7a685 Create a kind cluster with two worker nodes so that the topology feature can be tested. Test cases that test accessing volumes from multiple nodes need to be skipped
4ff2f5f Merge pull request kubernetes-csi#50 from darkowlzz/kind-0.6.0
80bba1f Use kind v0.6.0
6d674a7 Merge pull request kubernetes-csi#47 from Pensu/multi-arch
8adde49 Merge pull request kubernetes-csi#45 from ggriffiths/snapshot_beta_crds
003c14b Add snapshotter CRDs after cluster setup
a41f386 Merge pull request kubernetes-csi#46 from mucahitkurt/kind-cluster-cleanup
1eaaaa1 Delete kind cluster after tests run.
83a4ef1 Adding build for ppc64le
4fcafec Merge pull request kubernetes-csi#43 from pohly/system-pod-logging
f41c135 prow.sh: also log output of system containers
ee22a9c Merge pull request kubernetes-csi#42 from pohly/use-vendor-dir
8067845 travis.yml: also use vendor directory
23df4ae prow.sh: use vendor directory if available
a53bd4c Merge pull request kubernetes-csi#41 from pohly/go-version
c8a1c4a better handling of Go version
5e773d2 update CI to use Go 1.13.3
f419d74 Merge pull request kubernetes-csi#40 from msau42/add-1.16
e0fde8c Add new variables for 1.16 and remove 1.13
adf00fe Merge pull request kubernetes-csi#36 from msau42/full-clone
f1697d2 Do full git clones in travis. Shallow clones are causing test-subtree errors when the depth is exactly 50.
2c81919 Merge pull request kubernetes-csi#34 from pohly/go-mod-tidy
518d6af Merge pull request kubernetes-csi#35 from ddebroy/winbld2
2d6b3ce Build Windows only for amd64
c1078a6 go-get-kubernetes.sh: automate Kubernetes dependency handling
194289a update Go mod support
0affdf9 Merge pull request kubernetes-csi#33 from gnufied/enable-hostpath-expansion
6208f6a Enable hostpath expansion
6ecaa76 Merge pull request kubernetes-csi#30 from msau42/fix-windows
ea2f1b5 build windows binaries with .exe suffix
2d33550 Merge pull request kubernetes-csi#29 from mucahitkurt/create-2-node-kind-cluster
a8ea8bc create 2-node kind cluster since topology support is added to hostpath driver
df8530d Merge pull request kubernetes-csi#27 from pohly/dep-vendor-check
35ceaed prow.sh: install dep if needed
f85ab5a Merge pull request kubernetes-csi#26 from ddebroy/windows1
9fba09b Add rule for building Windows binaries
0400867 Merge pull request kubernetes-csi#25 from msau42/fix-master-jobs
dc0a5d8 Update kind to v0.5.0
aa85b82 Merge pull request kubernetes-csi#23 from msau42/fix-master-jobs
f46191d Kubernetes master changed the way that releases are tagged, which needed changes to kind. There are 3 changes made to prow.sh:
1cac3af Merge pull request kubernetes-csi#22 from msau42/add-1.15-jobs
0c0dc30 prow.sh: tag master images with a large version number
f4f73ce Merge pull request kubernetes-csi#21 from msau42/add-1.15-jobs
4e31f07 Change default hostpath driver name to hostpath.csi.k8s.io
4b6fa4a Update hostpath version for sidecar testing to v1.2.0-rc2
ecc7918 Update kind to v0.4.0. This requires overriding Kubernetes versions with specific patch versions that kind 0.4.0 supports. Also, feature gate setting is only supported on 1.15+ due to kind.sigs.k8s.io/v1alpha3 and kubeadm.k8s.io/v1beta2 dependencies.
a6f21d4 Add variables for 1.15
db8abb6 Merge pull request kubernetes-csi#20 from pohly/test-driver-config
b2f4e05 prow.sh: flexible test driver config
0399988 Merge pull request kubernetes-csi#19 from pohly/go-mod-vendor
066143d build.make: allow repos to use 'go mod' for vendoring
0bee749 Merge pull request kubernetes-csi#18 from pohly/go-version
e157b6b update to Go 1.12.4
88dc9a4 Merge pull request kubernetes-csi#17 from pohly/prow
0fafc66 prow.sh: skip sanity testing if component doesn't support it
bcac1c1 Merge pull request kubernetes-csi#16 from pohly/prow
0b10f6a prow.sh: update csi-driver-host-path
0c2677e Merge pull request kubernetes-csi#15 from pengzhisun/master
ff9bce4 Replace 'return' to 'exit' to fix shellcheck error
c60f382 Merge pull request kubernetes-csi#14 from pohly/prow
7aaac22 prow.sh: remove AllAlpha=all, part II
6617773 Merge pull request kubernetes-csi#13 from pohly/prow
cda2fc5 prow.sh: avoid AllAlpha=true
546d550 prow.sh: debug failing KinD cluster creation
9b0d9cd build.make: skip shellcheck if Docker is not available
aa45a1c prow.sh: more efficient execution of individual tests
f3d1d2d prow.sh: fix hostpath driver version check
31dfaf3 prow.sh: fix running of just "alpha" tests
f501443 prow.sh: AllAlpha=true for unknown Kubernetes versions
95ae9de Merge pull request kubernetes-csi#9 from pohly/prow
d87eccb prow.sh: switch back to upstream csi-driver-host-path
6602d38 prow.sh: different E2E suite depending on Kubernetes version
741319b prow.sh: improve building Kubernetes from source
29545bb prow.sh: take Go version from Kubernetes source
429581c prow.sh: pull Go version from travis.yml
0a0fd49 prow.sh: comment clarification
2069a0a Merge pull request kubernetes-csi#11 from pohly/verify-shellcheck
55212ff initial Prow test job
6c7ba1b build.make: integrate shellcheck into "make test"
b2d25d4 verify-shellcheck.sh: make it usable in csi-release-tools
3b6af7b Merge pull request kubernetes-csi#12 from pohly/local-e2e-suite
104a1ac build.make: avoid unit-testing E2E test suite
34010e7 Merge pull request kubernetes-csi#10 from pohly/vendor-check
e6db50d check vendor directory
fb13c51 verify-shellcheck.sh: import from Kubernetes
94fc1e3 build.make: avoid unit-testing E2E test suite
849db0a Merge pull request kubernetes-csi#8 from pohly/subtree-check-relax
cc564f9 verify-subtree.sh: relax check and ignore old content

git-subtree-dir: release-tools
git-subtree-split: a0f195c
@obnoxxx

obnoxxx commented May 12, 2021

@humblec :

Fix: #4

I think you mean "Fix: #44". 😄

@humblec
Contributor Author

humblec commented May 17, 2021

@humblec :

Fix: #4

I think you mean "Fix: #44". 😄

Sorry, I missed this comment. :)

@obnoxxx I meant #4 itself. This PR adds the driver deployment for Linux, which is missing at the moment. This task has to be taken care of to complete #4.

deploy/csi-iscsi-driverinfo.yaml Outdated Show resolved Hide resolved
deploy/rbac-csi-iscsi-controller.yaml Outdated Show resolved Hide resolved
deploy/install-driver.sh Outdated Show resolved Hide resolved
@humblec
Contributor Author

humblec commented May 17, 2021

@andyzhangx I have addressed the comments. Thanks for the review, PTAL.
@msau42 I would like to push some images for the deployment to the k8s.gcr.io repo. Are any special permissions needed?

@msau42
Collaborator

msau42 commented May 17, 2021

I would like to push some images for the deployment to the k8s.gcr.io repo. Are any special permissions needed?

We only promote release tagged images to k8s.gcr.io. Once we've tested the staging canary image, then we can cut a release, and then promote the tagged image.

@obnoxxx

obnoxxx commented May 17, 2021

@humblec :

Fix: #4

I think you mean "Fix: #44". 😄

Sorry, I missed this comment. :)

@obnoxxx I meant #4 itself. This PR adds the driver deployment for Linux, which is missing at the moment. This task has to be taken care of to complete #4.

Ugh, thanks for clarifying! I hadn't realized that this driver is not actually ready/released yet... 😄

@humblec
Contributor Author

humblec commented May 18, 2021

@humblec :

Fix: #4

I think you mean "Fix: #44". 😄

Sorry, I missed this comment. :)
@obnoxxx I meant #4 itself. This PR adds the driver deployment for Linux, which is missing at the moment. This task has to be taken care of to complete #4.

Ugh, thanks for clarifying! I hadn't realized that this driver is not actually ready/released yet... 😄

Yep, more or less. There is more work to do, such as #49 and E2E, to get things in shape for the initial version. :)

@humblec
Contributor Author

humblec commented May 26, 2021

I would like to push some images for the deployment to the k8s.gcr.io repo. Are any special permissions needed?

We only promote release tagged images to k8s.gcr.io. Once we've tested the staging canary image, then we can cut a release, and then promote the tagged image.

Sure, thanks @msau42.

The deployment based on this PR, using a local image, works!

For example:

[hchiramm@localhost csi-driver-iscsi]$ kubectl get all -n kube-system
NAME                       READY     STATUS    RESTARTS   AGE
pod/csi-iscsi-node-7gwkp   3/3       Running   0          15m
pod/csi-iscsi-node-8dvcj   3/3       Running   0          15m
pod/csi-iscsi-node-b26rh   3/3       Running   0          15m
NAME              TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                        AGE
service/kubelet   ClusterIP   None         <none>        10250/TCP,10255/TCP,4194/TCP   3h43m
NAME                            DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/csi-iscsi-node   3         3         3         3            3           kubernetes.io/os=linux   15m
[hchiramm@localhost csi-driver-iscsi]$ 

[hchiramm@localhost csi-driver-iscsi]$ kubectl get csidriver
NAME               ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES        AGE
iscsi.csi.k8s.io   false            false            false             <unset>         false               Persistent   16m
[hchiramm@localhost csi-driver-iscsi]$ 

[hchiramm@localhost csi-driver-iscsi]$ kubectl get pv,pvc
NAME                              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                                                 STORAGECLASS   REASON    AGE
persistentvolume/pv-name   1Gi        RWO            Delete           Bound     test/iscsi-pvc                            3m22s

NAME                              STATUS    VOLUME           CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/iscsi-pvc   Bound     pv-name   1Gi        RWO                           4s
[hchiramm@localhost csi-driver-iscsi]$ 

@humblec
Contributor Author

humblec commented May 26, 2021

@msau42 Is this #45 (comment) good enough to get a staging image and follow the rest of the image promotion process? Or should we do some more experiments to complete the mount process before proceeding further? I have yet to try that, though.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: humblec

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 26, 2021
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
deploy/csi-iscsi-driverinfo.yaml Outdated Show resolved Hide resolved
deploy/csi-iscsi-node.yaml Outdated Show resolved Hide resolved
  capabilities:
    add: ["SYS_ADMIN"]
  allowPrivilegeEscalation: true
image: quay.io/humble/csi-iscsi:v0.1
Collaborator

I think you can put in the k8s staging canary image for now. And before you tag a release, update this to the official repo and tag.

@msau42
Collaborator

msau42 commented May 27, 2021

do some more experiments for completing the mount process and proceed further

what other work is there remaining before cutting a release?

@humblec humblec force-pushed the deploy branch 3 times, most recently from adfacc9 to 2039376 Compare May 31, 2021 11:32
@humblec humblec force-pushed the deploy branch 2 times, most recently from ab95a97 to 6cfc1c4 Compare May 31, 2021 11:34
@humblec
Contributor Author

humblec commented May 31, 2021

do some more experiments for completing the mount process and proceed further

what other work is there remaining before cutting a release?

The mount part has not been tested yet, @msau42.

@humblec humblec force-pushed the deploy branch 3 times, most recently from 1be21b0 to 61d4648 Compare May 31, 2021 15:03
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
@humblec
Contributor Author

humblec commented May 31, 2021

do some more experiments for completing the mount process and proceed further

what other work is there remaining before cutting a release?

The mount part has not been tested yet, @msau42.

I experimented with the mount part; here is the summary:

kube-system   csi-iscsi-node-7w4c9               3/3     Running             0          4m25s

Driver details:

I0531 15:29:44.156799       8 utils.go:48] GRPC request: {}
I0531 15:29:44.156895       8 identityserver.go:16] Using default GetPluginInfo
I0531 15:29:44.157351       8 utils.go:53] GRPC response: {"name":"iscsi.csi.k8s.io","vendor_version":"1.0.0"}
[hchiramm@localhost csi-driver-iscsi]$ kubectl describe pod task-pv-pod -n default

The mount fails with an error from the library:

I0531 15:34:45.604036       8 utils.go:47] GRPC call: /csi.v1.Node/NodePublishVolume
I0531 15:34:45.604185       8 utils.go:48] GRPC request: {"target_path":"/var/lib/kubelet/pods/dd31590b-22fc-45d1-b77b-5b76a5bb1ad9/volumes/kubernetes.io~csi/static-pv-name/mount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":1}},"volume_context":{"discoveryCHAPAuth":"false","iqn":"iqn.2015-06.com.example.test:target1","iscsiInterface":"default","lun":"1","portals":"[]","targetPortal":"127.0.0.1:3260"},"volume_id":"hum-iscsi-share"}
I0531 15:34:45.604504       8 mount_linux.go:163] Detected OS without systemd
E0531 15:34:46.613172       8 utils.go:51] GRPC error: rpc error: code = Internal desc = connect reported success, but no path returned
[hchiramm@localhost csi-driver-iscsi]$ kubectl describe pod task-pv-pod -n default
Name:         task-pv-pod
Namespace:    default
Priority:     0
Node:         minikube/192.168.39.82
Start Time:   Mon, 31 May 2021 21:04:33 +0530
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:           
IPs:          <none>
Containers:
  task-pv-container:
    Container ID:   
    Image:          nginx
    Image ID:       
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /usr/share/nginx/html from task-pv-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5qmrf (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  task-pv-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  iscsi-pvc
    ReadOnly:   false
  default-token-5qmrf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-5qmrf
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    49s                default-scheduler  Successfully assigned default/task-pv-pod to minikube
  Warning  FailedMount  10s (x7 over 48s)  kubelet, minikube  MountVolume.SetUp failed for volume "static-pv-name" : rpc error: code = Internal desc = connect reported success, but no path returned
[hchiramm@localhost csi-driver-iscsi]$ 

The error above comes from AttachDisk().

Since this driver has not gone through any real testing, I was expecting this. I will keep digging and get back. However, since plugin discovery works and the mount attempt is initiated, I think we can go ahead with the deployment part covered in this PR. @msau42, WDYT?

@humblec
Contributor Author

humblec commented Jun 1, 2021

@pohly @msau42 are there any restrictions on the base image (only Alpine, etc.) we can use for the driver Dockerfile?

@pohly
Contributor

pohly commented Jun 1, 2021

are there any restrictions on the base image (only Alpine, etc.) we can use for the driver Dockerfile?

Technically you can use whatever works. From a legal perspective it depends. I don't know if the CNCF has an official position on this, but at least Alpine might not be a good choice because it strips license texts (at least by default, if I remember correctly). This is a concern for some people.

Debian might be a better alternative. No trademarks to worry about, strong emphasis on license compliance, community driven, ...

@andyzhangx
Member

@pohly @msau42 are there any restrictions on the base image (only Alpine, etc.) we can use for the driver Dockerfile?

For the base image, you could refer to https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/Dockerfile#L17

@humblec
Contributor Author

humblec commented Jun 1, 2021

are there any restrictions on the base image (only Alpine, etc.) we can use for the driver Dockerfile?

Technically you can use whatever works. From a legal perspective it depends. I don't know if the CNCF has an official position on this, but at least Alpine might not be a good choice because it strips license texts (at least by default, if I remember correctly). This is a concern for some people.

Debian might be a better alternative. No trademarks to worry about, strong emphasis on license compliance, community driven, ...

Thanks @pohly for the insight!

I have been spending a good amount of time on this experiment this week, as iSCSI depends a lot on the container distribution, host versions, binary paths, etc.

So far I have experimented with CentOS and Alpine base images, among others, and will continue with other distros as well.

@andyzhangx Thanks, sure. Interestingly, while I was looking at the base images used by various containers in the Kubernetes repos, I could see almost all distros in use. :)

@msau42
Collaborator

msau42 commented Jun 1, 2021

For reference, discussion on why k8s chose debian as its base image

@humblec
Contributor Author

humblec commented Jun 2, 2021

Even though I am making good progress by fixing issues one by one, it looks like I have to continue this process for some more time to get this in shape :). Let's keep this PR open while these experiments are going on.

@humblec
Contributor Author

humblec commented Jun 3, 2021

Finally, I was able to get the mount working. 👍


I0603 07:11:36.907283       7 mount_linux.go:405] Attempting to determine if disk "/dev/disk/by-path/ip-10.70.53.171:3260-iscsi-iqn.2015-06.com.example.test:target1-lun-1" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /dev/disk/by-path/ip-10.70.53.171:3260-iscsi-iqn.2015-06.com.example.test:target1-lun-1])
I0603 07:11:36.925643       7 mount_linux.go:408] Output: "DEVNAME=/dev/disk/by-path/ip-10.70.53.171:3260-iscsi-iqn.2015-06.com.example.test:target1-lun-1\nTYPE=ext4\n", err: <nil>
I0603 07:11:36.925721       7 mount_linux.go:298] Checking for issues with fsck on disk: /dev/disk/by-path/ip-10.70.53.171:3260-iscsi-iqn.2015-06.com.example.test:target1-lun-1
I0603 07:11:37.028261       7 mount_linux.go:394] Attempting to mount disk /dev/disk/by-path/ip-10.70.53.171:3260-iscsi-iqn.2015-06.com.example.test:target1-lun-1 in ext4 format at /var/lib/kubelet/pods/e64ff6a0-fc0a-4f32-a947-04cd873974ce/volumes/kubernetes.io~csi/static-pv-name/mount
I0603 07:11:37.028292       7 mount_linux.go:146] Mounting cmd (mount) with arguments (-t ext4 -o rw,defaults /dev/disk/by-path/ip-10.70.53.171:3260-iscsi-iqn.2015-06.com.example.test:target1-lun-1 /var/lib/kubelet/pods/e64ff6a0-fc0a-4f32-a947-04cd873974ce/volumes/kubernetes.io~csi/static-pv-name/mount)
I0603 07:11:37.086370       7 utils.go:53] GRPC response: {}
I0603 07:11:41.058886       7 utils.go:47] GRPC call: /csi.v1.Identity/Probe
I0603 07:11:41.058968       7 utils.go:48] GRPC request: {}
I0603 07:11:41.059147       7 utils.go:53] GRPC response: {}
[root@dhcp53-171 csi-driver-iscsi]# kubectl get pods |grep task
task-pv-pod                                                       1/1     Running             0          39s
[root@dhcp53-171 csi-driver-iscsi]# 

Unmount is failing though; I am looking into it.
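
The device name in the mount log above appears to follow the udev by-path pattern `ip-<portal>-iscsi-<iqn>-lun-<lun>`. As a rough sketch of how that name relates to the `targetPortal`, `iqn` and `lun` values of the volume context, the helper below rebuilds it. `byPathDevice` is a hypothetical name used for illustration only; it is not taken from csi-driver-iscsi or its iSCSI library, and the pattern is inferred from the log rather than from any specification.

```go
package main

import "fmt"

// byPathDevice builds the /dev/disk/by-path name for an iSCSI LUN.
// Hypothetical helper: the pattern is inferred from the mount log in
// this thread, not copied from the driver or its library.
func byPathDevice(portal, iqn string, lun int) string {
	return fmt.Sprintf("/dev/disk/by-path/ip-%s-iscsi-%s-lun-%d", portal, iqn, lun)
}

func main() {
	// Values taken from the log lines above.
	fmt.Println(byPathDevice("10.70.53.171:3260", "iqn.2015-06.com.example.test:target1", 1))
}
```

Checking that such a path exists on the node is one way to tell whether the iSCSI login really produced a block device before a mount is attempted.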

@humblec
Contributor Author

humblec commented Jun 3, 2021

DetachDisk code flow:

unmount -> then reading the json from targetPath -> then building the connector again.

	if err = c.mounter.Unmount(targetPath); err != nil {
		klog.Errorf("iscsi detach disk: failed to unmount: %s\nError: %v", targetPath, err)
		return err
	}
	cnt--
	if cnt != 0 {
		return nil
	}

	// load iscsi disk config from json file
	file := path.Join(targetPath, c.iscsiDisk.VolName+".json")
	connector, err := iscsiLib.GetConnectorFromFile(file)
	if err != nil {
		klog.Errorf("iscsi detach disk: failed to get iscsi config from path %s Error: %v", targetPath, err)
		return err
	}

	iscsiLib.Disconnect(connector.TargetIqn, connector.TargetPortals)

The unmount fails when the mounter executes detach disk, because the pod volume directory is not empty.



Jun  3 12:45:27 localhost kubelet[902]: I0603 12:45:27.489050     902 reconciler.go:196] operationExecutor.UnmountVolume started for volume "task-pv-storage" (UniqueName: "kubernetes.io/csi/iscsi.csi.k8s.io^hum-iscsi-share") pod "e64ff6a0-fc0a-4f32-a947-04cd873974ce" (UID: "e64ff6a0-fc0a-4f32-a947-04cd873974ce")

Jun  3 12:45:27 localhost kubelet[902]: E0603 12:45:27.558572     902 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/iscsi.csi.k8s.io^hum-iscsi-share podName:e64ff6a0-fc0a-4f32-a947-04cd873974ce nodeName:}" failed. No retries permitted until 2021-06-03 12:45:28.058453563 +0530 IST m=+6203994.880810285 (durationBeforeRetry 500ms). Error: "UnmountVolume.TearDown failed for volume \"task-pv-storage\" (UniqueName: \"kubernetes.io/csi/iscsi.csi.k8s.io^hum-iscsi-share\") pod \"e64ff6a0-fc0a-4f32-a947-04cd873974ce\" (UID: \"e64ff6a0-fc0a-4f32-a947-04cd873974ce\") : kubernetes.io/csi: mounter.TearDownAt failed to clean mount dir [/var/lib/kubelet/pods/e64ff6a0-fc0a-4f32-a947-04cd873974ce/volumes/kubernetes.io~csi/static-pv-name/mount]: kubernetes.io/csi: failed to remove dir [/var/lib/kubelet/pods/e64ff6a0-fc0a-4f32-a947-04cd873974ce/volumes/kubernetes.io~csi/static-pv-name/mount]: remove /var/lib/kubelet/pods/e64ff6a0-fc0a-4f32-a947-04cd873974ce/volumes/kubernetes.io~csi/static-pv-name/mount: directory not empty"

So the metadata file residing inside the pod volume mount is what causes the unmount to fail.
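
The `remove ...: directory not empty` text in the kubelet log has the shape of an error returned by Go's `os.Remove`, which deletes a directory only when it is empty. The sketch below reproduces that behaviour in a scratch directory, under the assumption that a leftover file in the target path is the only obstacle. `removeAfterUnmount`, `demo` and the file name `static-pv-name.json` are hypothetical names chosen for illustration; this is not the kubelet or driver code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeAfterUnmount stands in for the final cleanup step: os.Remove
// deletes a directory only when it is empty.
func removeAfterUnmount(dir string) error {
	return os.Remove(dir)
}

// demo creates a scratch "mount" directory holding a leftover metadata
// file. It returns the removal error with the file present, the removal
// error after the file is deleted, and any setup error.
func demo() (withFile, afterCleanup, setupErr error) {
	base, err := os.MkdirTemp("", "publish-path-")
	if err != nil {
		return nil, nil, err
	}
	defer os.RemoveAll(base)

	target := filepath.Join(base, "mount")
	if err := os.Mkdir(target, 0o750); err != nil {
		return nil, nil, err
	}
	// Hypothetical file name; the driver snippet uses VolName + ".json".
	meta := filepath.Join(target, "static-pv-name.json")
	if err := os.WriteFile(meta, []byte("{}"), 0o600); err != nil {
		return nil, nil, err
	}

	withFile = removeAfterUnmount(target)
	if err := os.Remove(meta); err != nil {
		return withFile, nil, err
	}
	afterCleanup = removeAfterUnmount(target)
	return withFile, afterCleanup, nil
}

func main() {
	withFile, afterCleanup, err := demo()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("with metadata file:", withFile)
	fmt.Println("after deleting it: ", afterCleanup)
}
```

With the file present the removal returns an error; once the file is gone the same call succeeds, which matches the symptom described here.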

In this case, the pod mount is done with bidirectional mount propagation:

        - name: pods-mount-dir
          hostPath:
            path: /var/lib/kubelet/pods
            type: Directory
    .......
        - name: pods-mount-dir
          mountPath: /var/lib/kubelet/pods
          mountPropagation: "Bidirectional"

I am trying to figure out why the unmount fails here. The shared metadata file was introduced to reconstruct the disconnect details for cleanup, i.e. for a reliable NodeUnpublish. But in the normal flow of events, when unmount is called from DetachDisk(), it fails saying that the unmount was attempted on a non-empty directory, as seen above, which is confusing. Or is the failure due to some other bind mount inside this path?

@j-griffith @msau42 @pohly @andyzhangx @jsafrane any thoughts/pointers on this ?

@humblec
Contributor Author

humblec commented Jun 9, 2021

I carried out some more experiments without the metadata file inside the path, by removing the persistConnector() calls from the driver, to rule out other possible causes of the unmount failure. In its absence, NodeUnpublish() completes successfully:

I0609 12:04:08.200136       8 utils.go:47] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0609 12:04:08.200153       8 utils.go:48] GRPC request: {"target_path":"/var/lib/kubelet/pods/57b8e330-ed76-4350-8fb6-d7a07fedd6b7/volumes/kubernetes.io~csi/static-pv-name/mount","volume_id":"hum-iscsi-share"}
I0609 12:04:08.200286       8 mount_linux.go:163] Detected OS without systemd
I0609 12:04:08.201834       8 mount_linux.go:238] Unmounting /var/lib/kubelet/pods/57b8e330-ed76-4350-8fb6-d7a07fedd6b7/volumes/kubernetes.io~csi/static-pv-name/mount
I0609 12:04:08.207536       8 utils.go:53] GRPC response: {}

So this more or less confirms that the unmount failure is caused by the presence of the metadata file and not by some other bind mount in the same path. However, we need this metadata to completely disconnect the share from the kubelet/node in a later code path.

We have seen this kind of behaviour with RWO and block-mode PVCs, but this case is an RWO + Filesystem-mode PVC, which confuses me. @gnufied has fixed this issue to an extent; however, I believe some of the corner cases (block-mode cases) were left out:

kubernetes/kubernetes#82190
kubernetes/kubernetes#82340
container-storage-interface/spec#385

Any help is appreciated as I see this is the final blocker for this driver to be in a consumable state.

@msau42
Collaborator

msau42 commented Jun 15, 2021

I'm not sure I completely understand your question, but if you want the metadata file for NodeUnstage, then it should go under the staging directory, not the publish path.
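
A minimal sketch of that suggestion, assuming the driver implements NodeStageVolume/NodeUnstageVolume (which this thread does not confirm): the connector JSON is written next to the staging directory contents rather than inside the publish path, so the publish directory stays empty after unmount. `metadataPath` and `insidePublishPath` are hypothetical helpers, and the staging path shown is only an example of the layout kubelet typically used at the time.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// metadataPath returns where the connector JSON would live if it were
// kept under the given directory (hypothetical helper).
func metadataPath(dir, volName string) string {
	return filepath.Join(dir, volName+".json")
}

// insidePublishPath reports whether file sits at or below targetPath.
func insidePublishPath(file, targetPath string) bool {
	rel, err := filepath.Rel(targetPath, file)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	// Example paths; the pod UID is a placeholder.
	target := "/var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/static-pv-name/mount"
	staging := "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/static-pv-name/globalmount"

	current := metadataPath(target, "hum-iscsi-share")
	proposed := metadataPath(staging, "hum-iscsi-share")

	fmt.Println(current, "inside publish path:", insidePublishPath(current, target))
	fmt.Println(proposed, "inside publish path:", insidePublishPath(proposed, target))
}
```

Whether the file should sit inside the staging mount point itself or in a sibling location is a separate design choice that this sketch does not settle.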

@humblec humblec mentioned this pull request Sep 9, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 13, 2021
@humblec
Contributor Author

humblec commented Sep 13, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 13, 2021
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 3, 2021
@k8s-ci-robot
Contributor

@humblec: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@humblec
Contributor Author

humblec commented Dec 9, 2021

This issue has been addressed via #70, so I am closing this PR.

@humblec humblec closed this Dec 9, 2021
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create initial release of csi-driver-iscsi
7 participants