Commit 4b6e7af

Added Fixes for CUDA 10.2 Support.
1 parent d0bb44f commit 4b6e7af

3 files changed: +21 -16 lines changed

Readme.md

Lines changed: 6 additions & 0 deletions
@@ -170,3 +170,9 @@ export KOPS_STATE_STORE=s3://${YOUR_CLUSTER_KOPS_STATE_STORE}
 
 - Once Done Executing :
 - Once the ``kubectl port-forward deployment/metaflow-metadata-service 8080:8080`` to port forward metatdata service for accesss on localmachine. Please note that because this is directly port forwarding to the pod were are taking the 8080 port for the service.
+
+
+# TODO
+
+- [ ] Integrate Minio Helm chart to this.
+-

gpu.md

Lines changed: 14 additions & 15 deletions
@@ -25,19 +25,18 @@ kubectl exec -it tf-gpu -- \
 
 ## Specs
 - Kops using the [gpu_setup/gpu_instance.yml](gpu_setup/gpu_instance.yml) file to Configure the GPU Instances on AWS joininig the Cluster.
-- Constraints :
-- Cuda Libraries v9.1
-- Docker 18.x on Machine
-- Kubernetes Version 1.15.x, 1.16.x
-
-- NO CUDA 10.2 Support :
-- KOPS Currently Only Support Kubernetes v1.16
-- K8s v1.16 which uses Docker v18.03.
-- K8s v1.17 Support Docker 19.03.
-- [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker#quickstart) Requires Docker 19.03 and supports CUDA 10.2.
-- The Older version of this was [nvidia-docker2](https://github.com/NVIDIA/nvidia-docker/wiki/Installation-(version-2.0)) which supported Docker 18.03 and 19.03
-- KOPS Supports NVIDIA-Device-Plugin deployments with [nvidia-docker2](https://github.com/NVIDIA/nvidia-docker/wiki/Installation-(version-2.0)) and hence currently has ongoing issues on support for New CUDA Versions.
-- KOPS Needs to move to v1.17 of kubernetes to start quick deployments Kubernetes versions which can support Docker 19.03 which inturn will support Latest Nvidia CUDA Toolkit.
+- Tested ON :
+- Kubernetes Version 1.15.x, 1.16.x
+
+- Cuda Libraries v10.2. [Credits](https://github.com/elevate/nvidia-device-plugin)
+
+- To use Cuda 9.1 change the below in [gpu_setup/gpu_instance.yml](gpu_setup/gpu_instance.yml)
+
+```yml
+hooks:
+- execContainer:
+image: dcwangmit01/nvidia-device-plugin:0.1.0
+```
 
 ## Cleanup Tasks
 
@@ -48,8 +47,8 @@ kops delete ig gpu-nodes
 
 
 ## TODO
-- [ ] Test the Base AMI for KOPS deployment with NVIDIA Provided AMI.
-- [ ] Test Cuda Support for v9.1 , v9.2
+- [x] Test Cuda Support for v10.2, 9.1
+
 
 ## References
 - https://docs.nvidia.com/datacenter/kubernetes/kubernetes-upstream/index.html

gpu_setup/gpu_instance.yml

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ spec:
 image: kope.io/k8s-1.10-debian-stretch-amd64-hvm-ebs-2018-05-27
 hooks:
 - execContainer:
-image: dcwangmit01/nvidia-device-plugin:0.1.0
+image: valaygaurang/nvidia-device-plugin:tesla-440.64.00
 machineType: p2.xlarge
 maxSize: 1
 minSize: 1
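
The image swap in gpu_setup/gpu_instance.yml, together with the CUDA 9.1 fallback described in gpu.md, amounts to a lookup from CUDA version to device-plugin image. A minimal Python sketch of that lookup, assuming the two image tags shown in this commit are the only choices. The names `DEVICE_PLUGIN_IMAGES` and `device_plugin_image` are hypothetical and not part of the repository.

```python
# Illustrative only: the image tags are copied from this commit's diff;
# the mapping and the function name are assumptions made for this sketch.
DEVICE_PLUGIN_IMAGES = {
    "10.2": "valaygaurang/nvidia-device-plugin:tesla-440.64.00",
    "9.1": "dcwangmit01/nvidia-device-plugin:0.1.0",
}


def device_plugin_image(cuda_version: str) -> str:
    """Return the execContainer image to set in gpu_instance.yml."""
    try:
        return DEVICE_PLUGIN_IMAGES[cuda_version]
    except KeyError:
        known = ", ".join(sorted(DEVICE_PLUGIN_IMAGES))
        raise ValueError(
            f"No image listed for CUDA {cuda_version}; known versions: {known}"
        ) from None


if __name__ == "__main__":
    print(device_plugin_image("10.2"))
```

Any version other than the two named in the commit raises `ValueError`, so the sketch never guesses at an image that has not been tested.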
