
reichlab/operational-models


operational-models

Repository for automating runs of operational disease forecasting models.

Docker instructions

This project supports containerizing its models via a reusable Dockerfile and run.sh. This works by passing build arguments to docker build and environment variables to docker run, as documented below. The basic steps for containerizing a model are building the image, running it locally, and publishing it, each described in its own section below.

To build the image

Build arguments: Building the image for a particular model uses the following build argument, passed via docker build's --build-arg:

  • (required) MODEL_DIR: specifies the directory name (not full path) of the model being built. Example: MODEL_DIR=flu_ar2.

Example build command:

cd "path-to-this-repo"
docker build --build-arg MODEL_DIR=flu_ar2 --tag=flu_ar2:1.0 --file=Dockerfile .
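The build command above can be wrapped in a small helper that catches the most common mistake, passing a path instead of a bare directory name as MODEL_DIR. This is a minimal sketch, not part of this repo: the function name build_model_image and the default version 1.0 are hypothetical, and it assumes you run it from the repo root.

```bash
# Hypothetical helper (not part of this repo): build the image for one model.
# Assumes the current directory is the repo root, containing Dockerfile and
# the model subdirectories (e.g. flu_ar2/).
build_model_image() {
  local model_dir="$1"        # bare directory name, e.g. flu_ar2
  local version="${2:-1.0}"   # image tag version (assumed default)

  # MODEL_DIR must be a directory name, not a path.
  if [[ -z "$model_dir" || "$model_dir" == */* ]]; then
    echo "error: MODEL_DIR must be a directory name like flu_ar2" >&2
    return 1
  fi
  if [[ ! -d "$model_dir" ]]; then
    echo "error: no model directory named '$model_dir' in $(pwd)" >&2
    return 1
  fi

  docker build --build-arg MODEL_DIR="$model_dir" \
    --tag="$model_dir:$version" --file=Dockerfile .
}
```

For example, build_model_image flu_ar2 1.0 runs the same docker build command shown above.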

To run the image locally

Environment variables: There are two sources of environment variables used by this repo's containerization approach:

  1. We use reichlab/container-utils to manage variables for GitHub credentials and Slack integration (messages and uploads). It requires the following variables (see that repo's README.md for details):
    • SLACK_API_TOKEN, CHANNEL_ID (required): used by slack.sh
    • GH_TOKEN, GIT_USER_NAME, GIT_USER_EMAIL, GIT_CREDENTIALS (required): used by load-env-vars.sh
    • DRY_RUN (optional): when set to any value, prevents git commits from being made (by default, commits are made).
  2. This repo's run.sh is parameterized to work with the repo's different models, so running the image for a particular model uses the following environment variables. These can be passed via docker run's --env or --env-file args.
    • MODEL_NAME (required): Hub name of the model (i.e., the name used in model outputs). Example: MODEL_NAME=UMass-AR2
    • REPO_URL (required): Full URL of the repository being cloned, excluding ".git". Example: REPO_URL=https://github.com/reichlab/FluSight-forecast-hub
    • REPO_UPSTREAM_URL (required): Full URL of the repository that REPO_URL was forked from, excluding ".git". Example: REPO_UPSTREAM_URL=https://github.com/cdcepi/FluSight-forecast-hub
    • MAIN_PY_ARGS (optional): Arguments passed through to run.sh's call to the model's main.py. These arguments are model-specific. For example, the flu_flusion model accepts two args, as in MAIN_PY_ARGS=--today_date=2024-11-27 --short_run=True, whereas the flu_ar2 model accepts only --today_date.
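The model-specific variables above can be kept in an env file and passed with --env-file. The sketch below writes such a file for flu_ar2 using the example values from the list; the function name write_model_env_file and the choice of values are illustrative assumptions, not something this repo provides.

```bash
# Hypothetical helper (not part of this repo): write the model-specific
# variables for flu_ar2 to an env file for `docker run --env-file`.
write_model_env_file() {
  local out="$1"
  # docker run --env-file reads each line as VAR=value verbatim, so values
  # are not quoted here (quotes would become part of the value).
  cat > "$out" <<'EOF'
MODEL_NAME=UMass-AR2
REPO_URL=https://github.com/reichlab/FluSight-forecast-hub
REPO_UPSTREAM_URL=https://github.com/cdcepi/FluSight-forecast-hub
MAIN_PY_ARGS=--today_date=2024-11-27
EOF
}
```

Usage: write_model_env_file model-flu_ar2.env, then add --env-file model-flu_ar2.env to the docker run command.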

Example run command:

docker run --rm \
  --env-file path_to_env_file/git-and-slack-credentials.env \
  --env MODEL_NAME="UMass-AR2" \
  ... \
  --env DRY_RUN=1 \
  flu_ar2:1.0
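Because a missing credential or model variable only surfaces once the container is running, it can help to check the env files beforehand. This is a minimal pre-flight sketch, not part of this repo or container-utils: check_env_files and REQUIRED_VARS are hypothetical names, and it assumes every required variable is defined in an env file rather than passed with --env.

```bash
# Hypothetical pre-flight check: confirm that the env files passed to
# `docker run --env-file` define every required variable listed above.
REQUIRED_VARS=(SLACK_API_TOKEN CHANNEL_ID GH_TOKEN GIT_USER_NAME GIT_USER_EMAIL
               GIT_CREDENTIALS MODEL_NAME REPO_URL REPO_UPSTREAM_URL)

check_env_files() {
  local var file found missing=0
  for var in "${REQUIRED_VARS[@]}"; do
    found=0
    for file in "$@"; do
      # Counts as defined only if some line starts with VAR= (bare "VAR"
      # lines, which docker fills from the host environment, are ignored).
      if grep -q "^${var}=" "$file"; then
        found=1
        break
      fi
    done
    if (( ! found )); then
      echo "missing required variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run check_env_files path_to_env_file/git-and-slack-credentials.env model-flu_ar2.env and start docker run (which accepts --env-file more than once) only if it succeeds.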

To publish the image

Use the following commands to build and push an image. These use the flu_ar2 model as an example.

Note: We build for the amd64 architecture because that's what most Linux-based servers (including AWS) use natively, as opposed to Apple silicon Macs, which use the arm64 architecture.

Note: As of this writing, on Macs with Apple silicon chips, specifying --platform=linux/amd64 causes the build to fail unless you disable Rosetta in Docker Desktop. For details, see Buildx throws Illegal Instruction installing ca-certificates when building for linux/amd64 on M2 #7255.
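Given the Rosetta caveat in the note above, a build script might check the host first. This is a sketch under the assumption that `uname -s` reports Darwin and `uname -m` reports arm64 on an Apple silicon Mac; the function name warn_if_apple_silicon is hypothetical. It returns non-zero so that a calling script can stop and let you confirm your Docker Desktop settings.

```bash
# Hypothetical helper: warn before a linux/amd64 build on an Apple silicon Mac,
# where the build may fail unless Rosetta is disabled in Docker Desktop.
# Arguments default to the host's `uname` values (overridable for testing).
warn_if_apple_silicon() {
  local os="${1:-$(uname -s)}" arch="${2:-$(uname -m)}"
  if [[ "$os" == "Darwin" && "$arch" == "arm64" ]]; then
    echo "warning: Apple silicon host; a --platform=linux/amd64 build may" \
         "fail unless Rosetta is disabled in Docker Desktop" >&2
    return 1
  fi
  return 0
}
```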

cd "path-to-this-repo"
docker login -u "reichlab" docker.io
docker build --platform=linux/amd64 --build-arg MODEL_DIR=flu_ar2 --tag=reichlab/flu_ar2:1.0 --file=Dockerfile .
docker push reichlab/flu_ar2:1.0
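Before pushing, you can confirm that the image really targets linux/amd64. The sketch below reads the image's Os and Architecture fields with `docker image inspect --format`; the function name check_image_platform is hypothetical.

```bash
# Hypothetical check: confirm a local image's platform before pushing it.
check_image_platform() {
  local image="$1" expected="${2:-linux/amd64}" actual
  # Go template over the image metadata, e.g. "linux/amd64".
  actual="$(docker image inspect --format '{{.Os}}/{{.Architecture}}' "$image")" || return 1
  if [[ "$actual" != "$expected" ]]; then
    echo "error: $image is $actual, expected $expected" >&2
    return 1
  fi
}
```

Usage: check_image_platform reichlab/flu_ar2:1.0 && docker push reichlab/flu_ar2:1.0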

requirements.txt and renv.lock details

Each model has different R and Python library requirements. These are captured in Python requirements.txt and renv renv.lock files stored in each model's subdirectory. The following sections describe how to create them.

requirements.txt

Python dependencies are managed with a two-file approach using pip-compile [1] (from pip-tools).

File              Purpose                                      Edit?
requirements.in   Direct dependencies only                     Yes, by hand
requirements.txt  Fully pinned lockfile (all transitive deps)  No, generated

  • requirements.txt is committed to the repo and used by Docker (pip install -r requirements.txt).
  • requirements.in uses the .in convention from pip-tools, indicating it is the input to a compile step.

Regenerating requirements.txt

First (one-time setup), create a virtual environment and install pip-tools into it. This assumes python3 resolves to the appropriate Python version (e.g., via pyenv and a .python-version file):

cd "path-to-this-repo"
python3 -m venv .venv
.venv/bin/python -m ensurepip --upgrade
.venv/bin/python -m pip install pip-tools

Then, after editing requirements.in, run the following command from the repo root, where <app> is one of covid_ar6_pooled, covid_gbqr, flu_ar2, flu_flusion, or flu_trends_ensemble. By default pip-compile keeps versions already pinned in the existing requirements.txt; add --upgrade to refresh them:

.venv/bin/pip-compile <app>/requirements.in --output-file <app>/requirements.txt

To regenerate all apps at once:

for app in covid_ar6_pooled covid_gbqr flu_ar2 flu_flusion flu_trends_ensemble; do
  .venv/bin/pip-compile "$app/requirements.in" --output-file "$app/requirements.txt"
done
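As a variant of the loop above, the app list could be discovered rather than hard-coded, by treating every top-level directory that contains a requirements.in as an app. This is a sketch; list_python_apps and regenerate_all are hypothetical names, and it assumes it is run from the repo root.

```bash
# Hypothetical helpers: find apps by their requirements.in files, then
# regenerate each app's requirements.txt with pip-compile.
list_python_apps() {
  local req_in
  for req_in in */requirements.in; do
    [[ -f "$req_in" ]] || continue   # no matches: the glob stays literal
    echo "${req_in%/requirements.in}"
  done
}

regenerate_all() {
  local app
  while IFS= read -r app; do
    .venv/bin/pip-compile "$app/requirements.in" --output-file "$app/requirements.txt"
  done < <(list_python_apps)
}
```

The trade-off is that a directory with a stray requirements.in would be picked up too, whereas the explicit list above makes the set of apps deliberate.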

Workflow:

  1. Add or change a direct dependency in requirements.in
  2. Regenerate requirements.txt with the command above
  3. Commit both files
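To catch a skipped step 2 before committing, a check can flag apps whose requirements.in is newer than their requirements.txt. This is a heuristic sketch, not part of this repo: find_stale_requirements is a hypothetical name, and because it relies on file modification times it is only meaningful in a working copy where you have been editing (a fresh git checkout does not preserve mtimes).

```bash
# Hypothetical pre-commit style check: print each app whose requirements.in
# was modified after its requirements.txt (or that has no requirements.txt).
# Returns non-zero if any app looks stale. Run from the repo root.
find_stale_requirements() {
  local req_in app stale=0
  for req_in in */requirements.in; do
    [[ -f "$req_in" ]] || continue
    app="${req_in%/requirements.in}"
    if [[ ! -f "$app/requirements.txt" || "$req_in" -nt "$app/requirements.txt" ]]; then
      echo "$app"
      stale=1
    fi
  done
  return "$stale"
}
```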

[1] A note re: tooling: We wanted to use https://github.com/astral-sh/uv to generate requirements.txt files from requirements.in ones, but found that its resolver's handling of VCS-based dependencies (mainly iddata via idmodels) caused problems compared to a workflow based on pip-compile from the pip-tools package.

renv.lock

A renv.lock file is generated via the following steps. As noted above, the "install required R libraries" step (step 6) varies with each model's needs. Below we show the commands for the flu_ar2 model; you will need to adapt them for yours.

  1. start a fresh temporary rocker/r-ver:4.4.1 container via:
    docker run --rm -it --name temp_container rocker/r-ver:4.4.1 /bin/bash
  2. install the required OS libraries and applications (see the "install general OS utilities", "install OS binaries required by R packages", and "install system libraries required by pyenv" sections of the Dockerfile). Copy only the apt-get commands and their args, not the "RUN" part of each line.
  3. pin the p3m (Posit Public Package Manager) repository to a snapshot from a particular date, which allows binary packages to be installed for faster builds (see the rocker-project guidance for switching the default CRAN mirror):
    /rocker_scripts/setup_R.sh https://p3m.dev/cran/__linux__/jammy/2025-09-11
  4. install renv via:
    Rscript -e "install.packages('renv')"
  5. create a project directory and initialize renv via:
    mkdir /proj ; cd /proj
    Rscript -e "renv::init(bare = TRUE)"
  6. install required R libraries. NB: these will vary depending on the model (see each model's README.md for the actual list). For example:
    Rscript -e "renv::install(c('lubridate', 'readr', 'remotes'))"
    Rscript -e "renv::install('arrow')"
    Rscript -e "renv::install('reichlab/zoltr')"
    Rscript -e "renv::install('hubverse-org/hubData@*release')"
    Rscript -e "renv::install('hubverse-org/hubVis@*release')"
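  7. start R and create renv.lock from within the R interpreter via the following (running it through Rscript -e in bash fails, likely because bash expands $snapshot inside the double-quoted string):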
    renv::settings$snapshot.type('all') ; renv::snapshot()
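  8. copy the new /proj/renv.lock file out of the container into the model's subdirectory, e.g., by running docker cp temp_container:/proj/renv.lock flu_ar2/renv.lock from another terminal on the host while the container is still running (it was started with --rm, so it is deleted on exit)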
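Step 3's snapshot URL encodes the Ubuntu release codename (jammy for rocker/r-ver:4.4.1) and the snapshot date. The sketch below builds such a URL; the pattern is inferred from the single example in step 3, the function name p3m_snapshot_url is hypothetical, and reading VERSION_CODENAME from /etc/os-release assumes an Ubuntu-based container.

```bash
# Hypothetical helper: build a p3m snapshot URL for setup_R.sh.
# Pattern assumed from step 3: https://p3m.dev/cran/__linux__/<codename>/<date>
p3m_snapshot_url() {
  local date="$1" codename="${2:-}"
  if [[ ! "$date" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
    echo "error: date must look like YYYY-MM-DD, got '$date'" >&2
    return 1
  fi
  if [[ -z "$codename" ]]; then
    # Default to the container's own Ubuntu release codename (e.g. jammy).
    codename="$(. /etc/os-release && echo "$VERSION_CODENAME")"
  fi
  echo "https://p3m.dev/cran/__linux__/${codename}/${date}"
}
```

Inside the container, step 3 would then become /rocker_scripts/setup_R.sh "$(p3m_snapshot_url 2025-09-11)".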
