ESI-DCAFM-TACO-VDSP Summer School
11.07. - 22.07.2022 in Vienna, Austria; official website here.
Date: 12.07.2022, 14:00 - 17:00
Tutor: Pavol Harar (orcid)
Title: A tool a day keeps the bad review away
Slides: presentation.pdf
The main aim of this session is to get hands-on experience with (some of) the tools presented in the slides. The session does not focus on data or modeling, but rather on the technical aspects around your ML projects. Due to the number of participants, it is not possible to help everybody, so if you feel overwhelmed, do not feel guilty about just following the presentation.
The following material builds on a Jupyter notebook in which a model was previously trained. The pre-trained model(s) are assumed to be stored and ready to use. In total, there are 7 exercises. Each exercise contains a rough (intentionally incomplete) guide to completing the assignment. Often, the project files themselves are the solution, so you can help yourself by peeking into them whenever you feel stuck. All the commands below are written for the Ubuntu operating system, but links to relevant resources are provided for users of other operating systems.
By the end of the day, you should be able to:
- run a machine learning project within a virtual environment on your own computer
- put your code into a git repository like this one
- specify, build, and run a Docker container which runs your project
- track your experiments in a nice and useful web interface
- wrap your project into an interactive web application for common users
- configure your project to run on Binder, e.g. for reviewers of your paper
- deploy the web application such that it is available from the Internet
The assignment: Run a Jupyter notebook server from within a virtual environment. In case you are more comfortable with or prefer Conda, feel free to use it instead of virtualenv.
- Create a new `MLSummerSchoolVienna2022` folder (this will be the root of this project).
- Change directory to your newly created project root folder.
- Install `virtualenv` or `miniconda`.
- Download the `notebook.ipynb` file from this repository.
- (virtualenv) Create a new empty file called `requirements.txt`.
- (conda) Create a new empty file called `requirements.yml`.
- Fill the requirements file with the dependencies of `notebook.ipynb` + `jupyter` and `voila` (or just download it from the repository).
- (virtualenv) Create a virtual environment in the `venv` folder and install the dependencies from the requirements file.
- (conda) Recreate the virtual environment using the requirements file.
- Activate the virtual environment.
- Run the `jupyter notebook` server.
- Run cell 1 to see whether all imports work.
- Create a `readme.md` file with just the name of the project.
- Optionally, train the models and save them as files.
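As a rough sketch of what the two requirements files could look like: only `jupyter` and `voila` are named in the steps above. The environment name `mlssv2022` and the Python version pin are assumptions made for illustration, and the packages that `notebook.ipynb` actually imports still need to be appended.

```shell
# (virtualenv) requirements.txt: one package per line.
# Only jupyter and voila come from the exercise text; append whatever
# notebook.ipynb imports (not listed here, it depends on the notebook).
cat > requirements.txt <<'EOF'
jupyter
voila
EOF

# (conda) requirements.yml: the environment name and the Python version
# are assumptions made for this sketch.
cat > requirements.yml <<'EOF'
name: mlssv2022
dependencies:
  - python=3.8
  - pip
  - pip:
      - jupyter
      - voila
EOF

# Show what was written.
cat requirements.txt requirements.yml
```

With virtualenv, the environment would then typically be created with `python3 -m venv venv` (or `virtualenv venv`), activated with `source venv/bin/activate`, and filled with `pip install -r requirements.txt`. With conda, `conda env create -f requirements.yml` followed by `conda activate mlssv2022` should do the same.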
The assignment: Create a new repository for this project and push an initial commit into it.
- On Ubuntu, install git with `sudo apt install git-all`. For other operating systems, consult the user guide. We will use git without a graphical user interface (so on Windows, please use the Git Bash emulator, which should be installed automatically with git).
- Create an account on github.com.
- Make sure you have an SSH key generated. If not, generate it using this guide.
- Go to `github.com > settings > SSH and GPG keys`.
- Copy your public SSH key and add it to your GitHub keys. On Ubuntu you can print your key using `cat ~/.ssh/id_rsa.pub`.
- Verify that the SSH key authentication works with `ssh -T git@github.com`.
- In case of a different OS or any problems, consult the GitHub guide.
- Go to github.com and create a new repository called `MLSummerSchoolVienna2022` (change the name of the repository to your liking, but do not forget to change it in the commands below).
- Do not check the automatic creation of readme, license or other files.
- Go to your project root folder.
- Create a `.gitignore` file with `venv` and `.ipynb_checkpoints` in it.
- Run `git init`.
- Run `git remote add origin git@github.com:<your_username>/MLSummerSchoolVienna2022.git`.
- If you use `git` for the first time, you might be asked to configure your user name and commit email address with:
  - Run `git config --global user.name "Your Name"`.
  - Run `git config --global user.email "you@example.com"`.
- Run `git add .`.
- Run `git commit -m "Initial commit"`.
- Run `git push origin main`.
- Now your changes should be visible in your repository on github.com.
- In case your HEAD branch is not called `main` but `master`, change the commands accordingly to avoid problems.
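The local part of these steps can be tried out without touching a real project. The following is a minimal sketch under a few assumptions: the folder name `scratch-project`, the inline identity, and `YOUR_USERNAME` are placeholders, and the push is left out because it needs network access and a real repository.

```shell
# Scratch folder so the sketch does not touch a real project
# (the folder name is hypothetical).
mkdir -p scratch-project/venv
(
  cd scratch-project || exit 1
  touch venv/placeholder                 # stands in for a real virtual environment
  echo "# MLSummerSchoolVienna2022" > readme.md

  # Keep the virtual environment and notebook checkpoints out of the repository.
  printf 'venv\n.ipynb_checkpoints\n' > .gitignore

  git init -q
  git add .
  # Identity passed inline here; normally it comes from `git config --global`.
  git -c user.name="Your Name" -c user.email="you@example.com" \
      commit -q -m "Initial commit"
  git branch -M main                     # make sure the branch is called main

  # Placeholder remote: replace YOUR_USERNAME before pushing.
  git remote add origin "git@github.com:YOUR_USERNAME/MLSummerSchoolVienna2022.git"

  git ls-files                           # lists the tracked files; venv/ is ignored
)
```

Renaming the branch with `git branch -M main` after the first commit sidesteps the `main` versus `master` question, since the later `git push origin main` then matches the local branch name regardless of the git version's default.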
The assignment: Run a Jupyter notebook server in a Docker container.
- If you use a different OS than Ubuntu, check the Docker installation guide.
- On Ubuntu, install Docker with `sudo apt install docker.io`.
- Check that it is installed correctly with `sudo docker run hello-world`.
- Run `sudo docker pull intelliseqngs/ubuntu-minimal-20.04:3.0.5`.
- Add a file `.dockerfile` to your project. (We use a nonstandard name on purpose, because we do not want Binder and Heroku to use our Dockerfile.)
- Base your `.dockerfile` on `intelliseqngs/ubuntu-minimal-20.04:3.0.5`.
- Fill the `.dockerfile` with commands to copy and install your project.
- A reference on writing the Dockerfile is here.
- In case you have problems, consult the solution in `.dockerfile`.
- Run `sudo docker build -t mlssv2022:latest -f .dockerfile .`. If you have a problem with DNS, try restarting Docker with `sudo pkill docker` and `service docker restart`.
- In case the Docker build stops with "killed", increase the RAM available to Docker (Docker Preferences > Resources), e.g. to 4 or 6 GB.
- Run `sudo docker run --rm -p 8888:8888 mlssv2022:latest jupyter notebook --allow-root --ip 0.0.0.0`.
  - `-p` forwards port 8888 of the container to port 8888 on the host
  - `--allow-root` is needed since everything in the container runs as root
  - `--ip 0.0.0.0` exposes the Jupyter server so that the host can see it
- Visit `localhost:8888` in your browser and enter the token printed in the terminal; the Jupyter notebook should now be running.
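For orientation, a `.dockerfile` along these lines might look as follows. This is only a sketch and not necessarily the solution file from the repository: everything beyond the `FROM` line is an assumption, in particular that Python and pip have to be installed through `apt-get` and that the project lives in a hypothetical `/app` folder.

```shell
# Write a minimal .dockerfile. Everything beyond the FROM line is an assumption:
# the base image may already ship Python, and /app is a hypothetical location.
cat > .dockerfile <<'EOF'
FROM intelliseqngs/ubuntu-minimal-20.04:3.0.5

# Install Python and pip from the Ubuntu repositories.
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Copy the requirements first so the dependency layer is cached between builds.
WORKDIR /app
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

# Copy the rest of the project (notebook, models, ...).
COPY . .

# Port used by the Jupyter server.
EXPOSE 8888
EOF

cat .dockerfile
```

Copying `requirements.txt` before the rest of the project is a common layering choice: editing the notebook then does not invalidate the cached layer that installs the dependencies, which keeps rebuilds short.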
The assignment: Run and explore the wandb examples.
Optional assignment: Adjust notebook.ipynb such that training is tracked in wandb.
- Create an account at wandb.ai.
- Log in to your account and try the Example (wandb.me/intro) and run it until the "Run experiment" cell finishes.
- Check the results in the wandb.ai account.
- Check also these examples https://github.com/wandb/examples.
- If you feel motivated, open one of the Google Colab notebooks from Monday's tutorial and change it such that it tracks the training into wandb, and view the results in the web interface.
The assignment: Create a simple interactive webapp using ipywidgets and Voila.
Optional assignment: Make the app run in Docker container.
- Go back to the notebook which is running in the virtual environment.
- Create a new python3 notebook called `webapp.ipynb`.
- Make sure you trained the model or downloaded the pre-trained model(s) from the repository to the project root.
- Build a simple interactive webapp using ipywidgets which allows the user to input data (e.g. as a URL to a file), then loads a pre-trained model, and finally computes and displays the prediction to the user.
- Click on the Voila button in the Jupyter notebook menu to test whether everything runs as a web app.
- If you wish to have a functional Docker container with the webapp inside, update the `.dockerfile` to include the webapp-related files (`webapp.ipynb` and the pre-trained model(s)) and rebuild your Docker image.
- Push your changes to git.
The assignment: Make your project run for free using myBinder.org.
- Make your git repo public if it is not already.
- Go to mybinder.org and fill in the form:
  - Repository URL: `https://github.com/<your_username>/mlssv2022`
  - Git ref: `main` (or `master` depending on your repo)
  - Path to a URL (not a file): `/voila/render/webapp.ipynb`
- Copy the Binder markup badge into your `readme.md`.
- Wait for the app to start in Binder. It will take quite some time, but Binder is a free service, so...
- Check how to use Docker with binder if needed here.
- When the app runs, have fun... You can try this image for example.
*An example badge to run the webapp from this repository on Binder is below the title of this exercise. Try clicking it.
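The launch link and the badge that the mybinder.org form generates follow a regular pattern, so they can also be assembled by hand. A minimal sketch, assuming the `v2/gh` URL scheme and the standard badge image; the user name is a placeholder, and the leading slash of the path is dropped here, which is the form used in the Voila documentation.

```shell
# Placeholders mirroring the form fields above; replace them with your own values.
GH_USER="your_username"
REPO="mlssv2022"
REF="main"
URL_PATH="voila/render/webapp.ipynb"   # leading slash dropped for the query string

# Slashes inside the urlpath query parameter are percent-encoded as %2F.
ENCODED_PATH=$(printf '%s' "$URL_PATH" | sed 's|/|%2F|g')

BINDER_URL="https://mybinder.org/v2/gh/${GH_USER}/${REPO}/${REF}?urlpath=${ENCODED_PATH}"
BADGE="[![Binder](https://mybinder.org/badge_logo.svg)](${BINDER_URL})"

echo "$BINDER_URL"
echo "$BADGE"
```

The second printed line is the markdown that would go into `readme.md`.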
The assignment: Deploy your project to a free instance on Heroku.
- Create a free account on Heroku. It might still ask you to fill in your credit card though.
- Add a `Procfile` with `web: voila webapp.ipynb --no-browser --port $PORT`.
- Add `runtime.txt` to the project folder with `python-3.8.10`.
- Push the changes to the repo.
- Install the Heroku CLI by following the official guide.
- Deploy your app to Heroku using git:
  - Run `heroku update` to make sure the Heroku CLI is up to date.
  - Run `heroku create` to create a new Heroku app.
  - Run `git push heroku master` to deploy.
  - Optionally set `heroku ps:scale web=1`.
  - Open your app with `heroku open`.
- Or follow the deployment guide directly from Voila.
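The two files from the first steps of this exercise can be created straight from the shell. The sketch below uses exactly the contents given above; the only subtlety is that `$PORT` must reach the `Procfile` unexpanded.

```shell
# Procfile: tells Heroku how to start the web process.
# The quoted 'EOF' keeps $PORT literal; Heroku sets PORT when the app starts.
cat > Procfile <<'EOF'
web: voila webapp.ipynb --no-browser --port $PORT
EOF

# runtime.txt: pins the Python version used for the build.
echo "python-3.8.10" > runtime.txt

cat Procfile runtime.txt
```

If the heredoc delimiter were left unquoted, the local shell would expand `$PORT` (usually to an empty string) while writing the file, and the app would then start without a valid port.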