- You’ll need to use this model and training technique (MNIST Hogwild): https://github.com/pytorch/examples/tree/main/mnist_hogwild
- Set the number of processes to 2 for MNIST Hogwild.
- Create three services in the Docker Compose file: train, evaluate, and infer.
- Use a shared volume called mnist for sharing data between the services.
- The train service should: Look for a checkpoint file in the volume. If found, resume training from that checkpoint. Train for ONLY 1 epoch and save the final checkpoint. Once done, exit.
- The evaluate service should: Look for the final checkpoint file in the volume. Evaluate the model using the checkpoint and save the evaluation metrics in a json file. Once done, exit.
- Share the model code by importing the model in eval.py instead of copy-pasting it.
- The infer service should: Run inference on any 5 random MNIST images and save the results (images whose file names are the predicted digits) in the results folder in the volume. Then exit.
- After running all the services, ensure that the model and results are available in the mnist volume.
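The train service's resume-or-start behaviour can be sketched as below. This is a minimal, stdlib-only sketch under stated assumptions: `pickle` stands in for `torch.save`/`torch.load`, the checkpoint path is any file inside the shared volume, and the `train_one_epoch` callable is a hypothetical placeholder for the Hogwild training loop. In the real train.py the saved state would hold the model's (and optionally the optimizer's) `state_dict()`.

```python
import pickle
from pathlib import Path
from typing import Callable


def load_or_init(ckpt_path: Path) -> dict:
    """Resume from an existing checkpoint, or start from a fresh state."""
    if ckpt_path.exists():
        with ckpt_path.open("rb") as f:
            return pickle.load(f)  # stand-in for torch.load
    return {"epoch": 0, "weights": None}


def run_train_service(ckpt_path: Path, train_one_epoch: Callable[[dict], dict]) -> dict:
    """Train for exactly one epoch, save the checkpoint, and return the new state."""
    state = load_or_init(ckpt_path)
    # Hypothetical placeholder: the real service would run one Hogwild epoch here.
    state = train_one_epoch(state)
    state["epoch"] += 1
    ckpt_path.parent.mkdir(parents=True, exist_ok=True)
    with ckpt_path.open("wb") as f:
        pickle.dump(state, f)  # stand-in for torch.save
    return state
```

Running the service twice against the same path resumes from the first run's checkpoint, so the epoch counter goes 1, then 2. Because the Hogwild example trains in several worker processes that share one model, the checkpoint is probably best written once by the parent process after all workers have joined, rather than by each worker.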
- Since we are going to use Docker Compose, it's better to create a common `model` folder in the root to store the model code, and to create separate folders for each service with their files placed in those folders.
- Write `train.py`, `eval.py`, and `infer.py`, and test each script on its own first.
- After the scripts are ready, mount the shared volume properly. Here we used `mnist` as the shared volume in Docker Compose. Make sure you name the volume explicitly; otherwise Compose prefixes the volume name with the project name (by default, the name of the project directory).
- Since each service needs both the `model` folder and the common `mnist` volume, we mount two volumes for each service when running it.
- Using `docker compose`, run the train service with the number of processes set to 2, then run the eval service, and then the infer service.
- If you have mounted everything properly, the output files will be available in the shared volume. You can verify this by mounting the volume in a throwaway container under the `/opt/mount` location: `docker run --rm -it -v mnist:/opt/mount/model alpine /bin/sh`
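For the evaluate service, the step of `eval.py` that writes the metrics to JSON might look like the sketch below. The metric names, the helper `save_metrics`, and the output path are hypothetical. The real script would obtain the summed test loss and the count of correct predictions by running the model loaded from the final checkpoint over the MNIST test set.

```python
import json
from pathlib import Path


def save_metrics(total_loss: float, correct: int, total: int, out_path: Path) -> dict:
    """Average the summed loss, compute accuracy (%), and write both to JSON."""
    metrics = {
        "test_loss": total_loss / total,
        "accuracy": 100.0 * correct / total,
    }
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w") as f:
        json.dump(metrics, f, indent=2)
    return metrics
```

For example, a summed loss of 250.0 with 9800 of 10000 test images correct gives an average loss of 0.025 and an accuracy of 98.0. Keeping the metrics in a flat dictionary makes the JSON file easy to read from the shared volume afterwards.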
- The default code named the inference images by their predicted class ID, so images with the same prediction overwrote each other; we changed the names to include index numbers so that all 5 images are produced as output.
- Name the volume explicitly and mount it correctly.
- We can mount more than one volume to a service, and each service's Dockerfile can be placed in a separate folder for better readability and management.
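The file-naming fix for the infer service can be sketched as follows. Prefixing each file with the sample's index keeps the predicted digit in the name while guaranteeing five distinct files, even when two images receive the same prediction. The `<index>_<prediction>.png` scheme and both helper names are assumptions for illustration. The real `infer.py` would save each image tensor to these paths, for example with `torchvision.utils.save_image`.

```python
import random
from pathlib import Path
from typing import List, Optional


def pick_samples(dataset_size: int, k: int = 5, seed: Optional[int] = None) -> List[int]:
    """Choose k distinct random indices from the dataset."""
    rng = random.Random(seed)
    return rng.sample(range(dataset_size), k)


def result_paths(predictions: List[int], results_dir: Path) -> List[Path]:
    """One file per sample, named '<index>_<predicted digit>.png'.

    Naming by prediction alone (e.g. '7.png') would collide whenever two
    samples get the same predicted digit, leaving fewer than k files.
    """
    return [results_dir / f"{i}_{pred}.png" for i, pred in enumerate(predictions)]
```

With predictions `[7, 7, 2, 1, 0]`, this produces `0_7.png`, `1_7.png`, `2_2.png`, `3_1.png` and `4_0.png`. Naming by class ID alone would leave only four files, because both sevens would map to `7.png`.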
- Ajith Kumar V (myself)
- Aakash Vardhan
- Anvesh Vankayala
- Manjunath Yelipeta
- Abhijith Kumar K P
- Sabitha Devarajulu