This repository holds code to create Saturn Docker images.
A default image is defined as an image that, upon a fresh customer install, is immediately available to be attached to a Jupyter server or Dask cluster.
All default images should have at least the following packages with appropriate pins, floors, or ceilings. This ensures customers will be able to use Dask, Prefect, and Snowflake in every image.
name: saturn
channels:
  - defaults
  - conda-forge
dependencies:
  - blas=*=mkl
  - bokeh
  - dask-ml
  - dask
  - distributed
  - ipykernel
  - ipywidgets
  - matplotlib
  - numpy
  - pandas
  - pip
  - prefect
  - pyarrow
  - python=3.7
  - python-graphviz
  - s3fs
  - scikit-learn
  - scipy
  - voila
  - xgboost
  - pip:
    - dask-saturn
    - prefect-saturn
    - snowflake-connector-python
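One way to enforce the package list above is a small check run against each image's environment.yml. The following is a minimal sketch, not part of this repository: the names `REQUIRED_CONDA`, `REQUIRED_PIP`, `package_name`, and `missing_packages` are hypothetical, and it assumes the file has already been parsed into a dict (for example with a YAML loader).

```python
import re

# Package names copied from the environment above. These constants and the
# helper functions below are hypothetical, for illustration only.
REQUIRED_CONDA = {
    "blas", "bokeh", "dask-ml", "dask", "distributed", "ipykernel",
    "ipywidgets", "matplotlib", "numpy", "pandas", "pip", "prefect",
    "pyarrow", "python", "python-graphviz", "s3fs", "scikit-learn",
    "scipy", "voila", "xgboost",
}
REQUIRED_PIP = {"dask-saturn", "prefect-saturn", "snowflake-connector-python"}


def package_name(spec):
    """Strip any pin, floor, or ceiling from a spec such as 'python=3.7'."""
    return re.split(r"[=<>!~\s\[]", spec.strip(), maxsplit=1)[0].lower()


def missing_packages(env):
    """Return (missing_conda, missing_pip) for a parsed environment.yml dict."""
    conda, pip = set(), set()
    for dep in env.get("dependencies", []):
        if isinstance(dep, dict):
            # The nested "pip:" section is parsed as a one-key dict.
            pip.update(package_name(s) for s in dep.get("pip", []))
        else:
            conda.add(package_name(dep))
    return REQUIRED_CONDA - conda, REQUIRED_PIP - pip
```

Both returned sets are empty when an image's environment satisfies the minimum list; extra packages are ignored.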
We need to keep images as small as possible, because image size directly impacts instance spinup time.
This repository defines the following images:
- saturn: Data analysis, machine learning, and parallel processing with Dask
- saturn-rapids: GPU acceleration with RAPIDS (GPU instance recommended)
- saturn-tensorflow: Deep learning with TensorFlow (GPU instance recommended)
- saturn-pytorch: Deep learning with PyTorch (GPU instance recommended)
- examples-cpu: For running the examples-cpu project
- examples-gpu: For running the examples-gpu project (GPU instance recommended)
- saturn-geospatial: Geospatial IO, analysis, and visualization
Each image is stored in its own subdirectory. That subdirectory should have at least a Dockerfile and a .dockerignore.
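The layout rule above can be checked mechanically. This is a minimal sketch under the assumption that each image lives in a directory passed to the helper; `REQUIRED_FILES` and `missing_files` are hypothetical names, not something this repository provides.

```python
from pathlib import Path

# Files every image subdirectory is expected to contain (from the rule above).
REQUIRED_FILES = ("Dockerfile", ".dockerignore")


def missing_files(image_dir):
    """Return the required file names that are absent from image_dir."""
    image_dir = Path(image_dir)
    return [name for name in REQUIRED_FILES if not (image_dir / name).is_file()]
```

An empty list means the subdirectory has the minimum set of files.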
Dockerfile
A script that defines how to build the image.
For complete details on how to write a Dockerfile, see the official Docker documentation.
.dockerignore
Similar to .gitignore, .dockerignore is used to prevent unwanted files from being bundled in an image. For a good explanation of this, see "Do Not Ignore .dockerignore".
The images in this repository use .dockerignore files like this:
*
!app.py
!environment.yml
That syntax says "ignore everything EXCEPT app.py and environment.yml".
For complete details on how to write .dockerignore files, see the Docker documentation.
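To make the "ignore everything, then re-include" pattern concrete, here is a rough simulation of how such a file is evaluated. It is a simplified sketch, not Docker's actual matcher: it only handles top-level file names, uses Python's `fnmatch` rather than Docker's own matching rules, and the function name `is_ignored` is hypothetical. The one behavior it does mirror is that the last matching pattern decides the outcome.

```python
from fnmatch import fnmatchcase

# The same three patterns shown in the .dockerignore example above.
PATTERNS = ["*", "!app.py", "!environment.yml"]


def is_ignored(filename, patterns):
    """Approximate .dockerignore matching for a top-level file name.

    Patterns are applied in order and the last one that matches wins;
    a leading "!" turns a pattern into an exception (re-include).
    """
    ignored = False
    for pattern in patterns:
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        if fnmatchcase(filename, pattern):
            ignored = not negate
    return ignored
```

With these patterns, `is_ignored("README.md", PATTERNS)` is true, while `app.py` and `environment.yml` are kept in the build context.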
There are two R base images. Each installs R, RStudio, and Python, and sets the required environment variables, including those needed for reticulate support.
- saturnbase-rstudio: Built in the same manner as saturnbase but without JupyterLab.
- saturnbase-rstudio-gpu-11.1: Built using rocker/ml as the starting point so that GPUs can be supported. The rocker/ml image is copyright of the rocker project.
The R images that build from these two base images can then add R packages or Python packages as needed.