-
You will have to go to the CSE Dept. office in the new building to get access to the GPU cluster.
-
Tell them your requirements and they will email you the instructions along with your credentials in a day or two.
-
Mostly, the instructions provided in the mail are sufficient.
-
You will need to use SSH (and SFTP if you wish), protocols used to access remote machines.
You can google how to install SSH on your Linux system, depending on the distribution you are using.
Next, you can enter the remote system using ssh username@ip_address
from the terminal.
SFTP is used to get a user-friendly GUI interface in the form of a standard file browsing window.
On windows, you can download and use PuTTY
This is required when the default image does not contain a program that you want to use.
-
Make an account on Dockerhub.
-
Link your Dockerhub account to your Github account.
-
Make a new repo on Dockerhub and Github.
-
Open the new repo on Dockerhub and link it to the Github repo to enable the autobuild.
-
Write the Dockerfile and save locally: For this, you can learn how to write a Dockerfile by a simple google search, or else, simply download the Dockerfile from this repository and make required changes.
-
There are two methods to do this, but I doubt anyone uses method 2:
Push the Dockerfile to the Github repo: The corresponding Docker image will start building in the Dockerhub automatically. You can verify the same by opening the repo in Dockerhub.
In the directory where the Dockerfile is present in the local system, enter the following commands:
-
docker build -t <tag> .
-
docker login -u <docker_username>
-
docker tag <tag> <docker_username>/<docker_repo>:<tag>
-
docker push <docker_username>/<docker_repo>:<tag>
-
After step 6 is complete, in the Dockerhub repo, you can see the final docker image under "Tags".
-
On the top right corner of the image, there will be the text to be copied of the form
docker pull <docker_username>/<docker_repo>:latest
. -
<docker_username>/<docker_repo>:latest
is the link to the final image. Copy the same, and replace the default image link (abhipec/iitg_kubernetes:p3_master_image_latest
) in the .yaml file (probablypod-creation_direct_execution.yaml
) by your link. -
You are good to go!
-
One common mistake is that all file paths need to be updated. The file paths need to be absolute, and the default mount path is
/data/
. Thus, if I create a filetest.txt
in the same directory that opens after ssh, the location of the file in code should be/data/test.txt
-
Remember that the CSE Dept. GPU drivers are not updated currently, hence, some updated Docker images might not work. To tackle this issue, you may go to the Dockerhub repo of that particular program, and choose the docker image of an older version on your Dockerfile.
For eg. tensorflow/tensorflow:latest-gpu-py3
image doesn't seem to work. Hence, use tensorflow/tensorflow:custom-op-ubuntu16
instead.