Deployment information.
Using the power of cookiecutter, this single command provides a pretty solid starting point for any new project.
cookiecutter gh:eliavw/cookiecutter-datascience
Version control goes without saying. For the local repository, do:
git init
git add .
git commit -m "First commit"
For the remote repository, do:
git remote add origin git@github.com:eliavw/also_anomaly_detector.git
git remote -v
git push origin master
And that's it for git.
We can summarize the above procedure in two one-liners, if you really want to do this fast.
git init; git add .; git commit -m "First commit";
and
git remote add origin git@github.com:eliavw/also_anomaly_detector.git; git remote -v; git push origin master
This cookiecutter is set up for optimal use with conda for local dependency management. The takeaway is this: for local dependency management, we rely on conda and nothing else.
Note that this has nothing to do with remote dependency management, which is what you need to take care of when preparing a release of your code via PyPi or alternatives. We treat that as an independent problem: mixing remote and local dependency management tends to add complexity instead of removing it.
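To make the distinction concrete, here is a minimal sketch in shell commands. The conda line matches the command used below; the pip line assumes the package is (eventually) published on PyPi under the name also_anomaly_detector.

```bash
# Local dependency management: you recreate the environment yourself,
# from the conda files shipped with the repository.
conda env create -f dependencies-deploy.yaml -n also_anomaly_detector

# Remote dependency management: a third party installs the released package,
# and pip resolves its dependencies from the package metadata (setup.py).
pip install also_anomaly_detector
```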
To create our default environment, do:
conda env create -f dependencies-deploy.yaml -n also_anomaly_detector
To additionally add the packages which are relevant for the development phase, do:
conda activate also_anomaly_detector
conda env update -n also_anomaly_detector -f dependencies-develop.yaml
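As a quick sanity check (not strictly necessary), you can verify that the environment exists and that you are actually using its Python interpreter:

```bash
conda env list                          # also_anomaly_detector should be listed
conda activate also_anomaly_detector
which python                            # should point inside the conda environment
```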
To add your isolated python installation (i.e., the one in your new conda environment) to the list of "kernels" found by Jupyter, execute the following.
conda activate also_anomaly_detector
python -m ipykernel install --user --name also_anomaly_detector --display-name "also_anomaly_detector"
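To verify that Jupyter picked up the new kernel, list the installed kernelspecs:

```bash
jupyter kernelspec list   # also_anomaly_detector should appear in this list
```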
One fundamental assumption is the following:
All code in this repository belongs to one of the two following categories: source code or executable scripts.
- Code in `src` is considered source code; it constitutes a Python package.
- Code in `scripts` or `note` acts as standalone scripts.
This means that even our own code has to be installed before we are able to use it. This seems a bit tedious but has some important advantages too:
- Our scripts will consider our own algorithm(s) and external competitors both as packages to be imported. Putting these on equal footing enforces code quality (e.g., modularity, API-design, ...) and reproducibility.
- If we build it like a package from the start on our local machine, the transition to an actual publishable package will be a lot smoother afterwards. In other words, we will try to get it right from the start.
To install, activate the conda environment and execute this line of code.
python setup.py develop # or `install`
What is the difference between `develop` and `install`? When you install the package with the `develop` flag, symlinks are created from your code to the Python installation. That means that every time you change something in your codebase, the installed package in your Python environment changes along with it. Typically, this is what you want: to see your changes reflected immediately.
The `install` option copies your code as it is at the time of installation and installs that copy in the Python environment. This mimics what a third party would do.
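A quick way to check which mode you ended up with, assuming the package is importable as also_anomaly_detector (a sanity check, not part of the workflow itself):

```bash
python -c "import also_anomaly_detector; print(also_anomaly_detector.__file__)"
# with `develop`, the printed path points into the repository itself;
# with `install`, it points into the environment's site-packages directory.
```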
Do not allow yourself to proceed without at least accumulating some tests. Therefore, we've set out to integrate CI (i.e., Travis) right from the start.
Follow these steps:
- Go to the Travis page of this repo.
- See if it ran.
Note: The tests depend on our local dependency management. Why? Because we have full control over the Travis servers running our tests, so we can simply treat them as computers we control. We only need to fall back on remote dependency management if other people need to get our code up and running without our intervention.
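To run the same tests locally before pushing, here is a minimal sketch; it assumes the tests live in a tests directory and are run with pytest (which would then have to be part of the develop dependencies):

```bash
conda activate also_anomaly_detector
python -m pytest tests/   # the same suite Travis runs on every push
```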
This part is about publishing your project on PyPi.
Make your project publicly available on the Python Package Index, PyPi. To achieve this, we need remote dependency management, since you want your software to run without forcing users to recreate your conda environments. All dependencies have to be managed automatically during installation, which requires some extra work.
We follow the steps as outlined in the most basic (and official) PyPi tutorial.
Generate distribution packages for the package. These are archives that are uploaded to the Package Index and can be installed by pip.
python setup.py sdist bdist_wheel
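If the build and upload tools (setuptools, wheel, twine) are missing from your environment, install them first; once the build has run, the archives show up in dist/:

```bash
python -m pip install --upgrade setuptools wheel twine   # build and upload tooling
ls dist/   # after building: a source archive (.tar.gz) and a built wheel (.whl)
```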
After this, your package can be uploaded to the Python Package Index. To see if it works on the PyPi test server, do
python -m twine upload --repository-url https://test.pypi.org/legacy/ dist/*
This will prompt for your credentials, after which your package ends up in the test index.
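To check the test upload end to end, you can install the package from the test index, assuming it is published there under the name also_anomaly_detector:

```bash
python -m pip install --index-url https://test.pypi.org/simple/ --no-deps also_anomaly_detector
# --no-deps because your dependencies may not be available on the test index
```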
To make your package end up in the actual PyPi, the procedure is almost as simple:
python -m twine upload --repository-url https://upload.pypi.org/legacy/ dist/*
Every good open source project includes at least a bit of documentation. Part of this documentation is generated from the decent docstrings you wrote alongside your code.
We will use Mkdocs, with its material theme. This generates very nice webpages and is, in my humble opinion, a bit more modern than Sphinx (which is also good!).
The main upside of mkdocs is that its source files are markdown, which is about the simplest formatted text format there is. Readmes and even this deployment document are written in markdown themselves. In that sense, we gain consistency: everything we want to communicate is written in markdown:
- readmes in the repo
- text cells in jupyter notebooks
- source files for the documentation site
This means we can write everything once and link it together; all the formats are the same, hence trivially compatible.
The cookiecutter already contains the mkdocs.yml file, which is, unsurprisingly, the configuration file for your mkdocs project. Using this cookiecutter, you can focus on content. Alongside this configuration file, we also included a demo page, index.md, which is the home page of the documentation website.
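If mkdocs and the material theme are missing from your environment, the place to add them is dependencies-develop.yaml (local dependency management stays with conda), after which you rerun the update command:

```bash
conda env update -n also_anomaly_detector -f dependencies-develop.yaml
mkdocs --version   # verify mkdocs is now available
```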
For a test drive, you need to know some commands. To build your website (i.e., generate HTML starting from your markdown sources), you do
mkdocs build
To preview your website locally, you do
mkdocs serve
and surf to localhost:8000. Also note that this server refreshes whenever you alter something on disk (which is nice!), so it runs the build step for you automatically.
Now, the last challenge is to make this website available over the internet. Luckily, mkdocs makes this extremely easy if you want to host it on GitHub Pages:
mkdocs gh-deploy
and your site should be online at https://eliavw.github.io/also_anomaly_detector/.
What happens under the hood is that a mkdocs build is executed, and then the resulting site directory is pushed to the gh-pages branch of your repository. From that point on, GitHub takes care of the rest.
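You can verify that the branch indeed ended up on the remote:

```bash
git ls-remote --heads origin gh-pages   # should list the gh-pages branch
```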
Often overlooked, but this sits right on top of your repository and is therefore the perfect place to link to your project website. A cookiecutter-generated sentence to put there would be:
Anomaly Detection according to ALSO, cf. https://eliavw.github.io/also_anomaly_detector
Reproducible containers in their most popular form.
Reproducible containers which can be run on HPC infrastructures.