From 89084722df617c69336b71487f87e8bc409bedb8 Mon Sep 17 00:00:00 2001
From: David McClosky <david.mcclosky@quant-aq.com>
Date: Sun, 18 Feb 2024 12:02:40 -0500
Subject: [PATCH 1/3] Revamp docs, add Quick Start doc

We now have a Quick Start based on a more complete template (https://github.com/quant-aq/aeromancy-project-template/pull/3). It provides a guided tour of the basic components, but the current draft is still missing instructions on how to actually run experiments.

Other docs have been reorganized a bit and "customizing.md" has been pulled out of "scaffolding.md".
---
 README.md                |  48 ++-----
 docs/docs/customizing.md |  46 +++++++
 docs/docs/index.md       |  41 +++++-
 docs/docs/quick_start.md | 289 +++++++++++++++++++++++++++++++++++++++
 docs/docs/scaffolding.md |  58 +-------
 docs/docs/setup.md       |  41 ++++++
 docs/mkdocs.yml          |  10 +-
 7 files changed, 436 insertions(+), 97 deletions(-)
 create mode 100644 docs/docs/customizing.md
 create mode 100644 docs/docs/quick_start.md
 create mode 100644 docs/docs/setup.md

diff --git a/README.md b/README.md
index 1be50ee..fa25915 100644
--- a/README.md
+++ b/README.md
@@ -18,54 +18,32 @@ by providing both new infrastructure (a more comprehensive versioning scheme
 including both system runtimes and external datasets) and a corresponding set of
 best practices to ensure experiments are maximally trackable.
 
-In its current form, Aeromancy requires a fairly specific software stack:
+In its current form, Aeromancy requires a fairly specific software stack: (hey,
+we said it was opinionated)
 
 - **Experiment tracker**: [Weights and Biases](https://wandb.ai)
 - **Object storage** (artifacts): S3-compatible, e.g.,
   [Ceph](https://github.com/ceph/ceph)
 - **Virtualization**: [Docker](https://www.docker.com/)
+- **Python Package Manager**: [pdm](https://pdm.fming.dev)
+- **Revision Control**: [Git](https://git-scm.com/)
 
 **Note:** As is likely obvious, Aeromancy documentation is in a very early
 state. As this is a pre-release support may be limited. For now, we include a
 couple pointers for how to setup your environment for Aeromancy.
 
-## Getting started
+## Documentation overview
 
-**Coming soon**: A proper Getting Started section.
+- If you're new to Aeromancy, [start here](docs/docs/quick_start.md)!
+- In the Developer Reference section of the documentation, we include some
+  design docs which provide an [architectural overview](docs/docs/scaffolding.md) and a
+  [glossary](docs/docs/tasks.md) of terms.
+- To see autogenerated docs for code from this repo, you'll need to start a
+  local doc server (`pdm doc`).
 
-To quickly set up an Aeromancy project, we've created a
-[Copier](https://copier.readthedocs.io/en/stable/) template. See instructions at
-the
-[quant-aq/aeromancy-project-template](https://github.com/quant-aq/aeromancy-project-template?tab=readme-ov-file#quick-start).
-
-## Requirements
-
-- Python 3.10.5 or higher
-- [`pdm`](https://pdm.fming.dev): Install via `pip install --user pdm` then
-  install Aeromancy packages with `pdm install`.
-- **Environment variables**:
-  - S3 backend location and credentials:
-    - `AEROMANCY_AWS_ACCESS_KEY_ID`
-    - `AEROMANCY_AWS_SECRET_ACCESS_KEY`
-    - `AEROMANCY_AWS_S3_ENDPOINT_URL`
-    - `AEROMANCY_AWS_REGION`
-  - `WANDB_API_KEY` (from [Weights and Biases](https://wandb.ai))
-- **SSH Authentication**: You'll want `ssh-agent` setup if you need to access
-  private GitHub repositories. Check out these
-  [instructions](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent).
-
-### Mac OS
-
-- Use [Homebrew](https://brew.sh/) to install the following:
-  - `brew install apache-arrow@13.0.0_5 bat@0.23.0 graphviz@8.1.0
-    openblas@0.3.24 pre-commit@3.3.3`
-- Install Docker Desktop from [docker.com](https://www.docker.com/) (not Brew
-  since it has a trickier upgrade story)
-
-## Common commands
+## Common development commands
 
 - `pdm lint`: Run pre-commit linters
 - `pdm test`: Run test suite
 - `pdm doc`: Start doc server (see also the [public
-  version](https://quant-aq.github.io/aeromancy/) for the latest checked in
-  version)
+  version](https://quant-aq.github.io/aeromancy/) for the latest release)
diff --git a/docs/docs/customizing.md b/docs/docs/customizing.md
new file mode 100644
index 0000000..ff847e9
--- /dev/null
+++ b/docs/docs/customizing.md
@@ -0,0 +1,46 @@
+
+# Customizing Aeromancy projects
+
+To quickly set up an Aeromancy project, we've created a
+[Copier](https://copier.readthedocs.io/en/stable/) template. See instructions at
+the
+[quant-aq/aeromancy-project-template](https://github.com/quant-aq/aeromancy-project-template?tab=readme-ov-file#quick-start).
+
+In the generated Python project setup (`pyproject.toml`), you may also want to
+adjust:
+
+- **Extra Python packages:** Add them with `pdm add <pkgname>`. See [PDM
+  docs](https://pdm.fming.dev/latest/usage/dependency/) for more information on
+  this.
+- **`pdm` [scripts](https://pdm.fming.dev/latest/usage/scripts/)**: Some of
+  these are necessary for running Aeromancy (like `pdm go`), but you can add
+  more if there are common tasks for your project.
+- **Extra `docker run` arguments**: (E.g., mounting
+  [volumes](https://docs.docker.com/engine/reference/commandline/run/#mount)).
+  These can be baked into the `pdm go` script with `--extra-docker-run-args='...'`. The
+  [template](https://github.com/quant-aq/aeromancy-project-template) includes a
+  standard volume mapping (`data/`) for ingesting datasets.
+- **Extra Debian packages:** If you need packages beyond those included by
+  Aeromancy, you can bake them into the `pdm go` script with
+  `--extra-debian-package='...'` (specify the flag once per package name).
+
+## Filesystem layout
+
+Ultimately, the structure of an Aeromancy project should look something like
+this:
+
+```text
+<projectroot>/
+  pyproject.toml
+  pdm.lock
+  main.py  # AeroMain
+  src/
+    <projectname>/
+      <youractions>.py
+      <youractionbuilder>.py
+```
+
+The structure of the classes containing your
+[`Action`][aeromancy.action.Action](s) and
+[`ActionBuilder`][aeromancy.action_builder.ActionBuilder] is flexible -- they
+just need to be importable in AeroMain.
diff --git a/docs/docs/index.md b/docs/docs/index.md
index 612c7a5..fa1c06a 100644
--- a/docs/docs/index.md
+++ b/docs/docs/index.md
@@ -1 +1,40 @@
---8<-- "README.md"
+# Aeromancy
+
+[![Tests](https://github.com/quant-aq/aeromancy/actions/workflows/ci.yml/badge.svg)](https://github.com/quant-aq/aeromancy/actions/workflows/ci.yml)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![pdm-managed](https://img.shields.io/badge/pdm-managed-blueviolet)](https://pdm.fming.dev)
+[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
+[![pre-commit enabled](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://pre-commit.com/)
+![Apache 2.0 licensed](https://img.shields.io/github/license/quant-aq/aeromancy)
+
+**Aeromancy** is an opinionated philosophy and open-sourced framework that
+closely tracks experimental runtime environments for more reproducible machine
+learning. In existing experiment trackers, it’s easy to miss important details
+about how an experiment was run, e.g., which version of a dataset was used as
+input or the exact versions of library dependencies. Missing these details can
+make replicability more difficult. Aeromancy aims to make this process smoother
+by providing both new infrastructure (a more comprehensive versioning scheme
+including both system runtimes and external datasets) and a corresponding set of
+best practices to ensure experiments are maximally trackable.
+
+In its current form, Aeromancy requires a fairly specific software stack: (hey,
+we said it was opinionated)
+
+- **Experiment tracker**: [Weights and Biases](https://wandb.ai)
+- **Object storage** (artifacts): S3-compatible, e.g.,
+  [Ceph](https://github.com/ceph/ceph)
+- **Virtualization**: [Docker](https://www.docker.com/)
+- **Python Package Manager**: [pdm](https://pdm.fming.dev)
+- **Revision Control**: [Git](https://git-scm.com/)
+
+**Note:** As is likely obvious, Aeromancy documentation is in a very early
+state. As this is a pre-release support may be limited.
+
+## Documentation overview
+
+- If you're new to Aeromancy, [start here](quick_start.md)!
+- In the Developer Reference section of the documentation, we include some
+  design docs which provide an [architectural overview](scaffolding.md) and a
+  [glossary](tasks.md) of terms.
+- Lastly, we have autogenerated documentation in [Code
+  Reference](reference/aeromancy/index.md).
diff --git a/docs/docs/quick_start.md b/docs/docs/quick_start.md
new file mode 100644
index 0000000..881f3ee
--- /dev/null
+++ b/docs/docs/quick_start.md
@@ -0,0 +1,289 @@
+# Quick start
+
+This guide will walk you through some of the basic Aeromancy workflows. We'll be
+using Aeromancy in "development" mode which lets us focus on key Aeromancy
+concepts.
+
+## Creating a project
+
+To quickly set up an Aeromancy project, we've created a
+[Copier](https://copier.readthedocs.io/en/stable/) template at
+[quant-aq/aeromancy-project-template](https://github.com/quant-aq/aeromancy-project-template?tab=readme-ov-file#quick-start).
+Let's start by creating a new project called `aerodemo`:
+
+1. Install [PDM](https://pdm.fming.dev) with
+   [Copier](https://copier.readthedocs.io/en/stable/) support:
+
+    ```bash
+    pip install --user "pdm[copier]"
+    ```
+
+2. Set up a new Aeromancy-managed project with the template. This will create
+   the project directory `aerodemo` for you:
+
+    ```bash
+    copier copy --trust "gh:quant-aq/aeromancy-project-template" aerodemo
+    ```
+
+    The template will various questions. For the purpose of this Quick Start,
+    it's fine to fill in `aerodemo` or defaults for all fields.
+
+3. Install project dependencies:
+
+    ```bash
+    cd aerodemo
+    git init
+    pdm install --dev --no-self
+    ```
+
+## What's in an Aeromancy project?
+
+Aeromancy projects contain several different components. For now, we'll start
+with the three most important: (see [Tasks, Trackers, and Actions](tasks.md) for
+more details on these and the other main classes)
+
+### Actions
+
+[`Action`][aeromancy.action.Action]s define a specific data transformation you'd
+like to track with Aeromancy (e.g., training a model or performing a step in a
+data processing pipeline). If you're familiar with
+[Luigi](https://luigi.readthedocs.io/en/stable/) and other pipeline builders,
+this may be familiar. [`Action`][aeromancy.action.Action]s roughly correspond to
+a run on [Weights and Biases](https://docs.wandb.ai/quickstart) (Aeromancy will
+help you create the runs on the Weights and Biases side).
+
+In `src/aerodemo/actions.py`, we include three example
+[`Action`][aeromancy.action.Action]s: `ExampleStoreAction`,`ExampleTrainAction`,
+and `ExampleEvaluationAction`. Let's walk through these.
+
+#### Creating `Artifact`s with `ExampleStoreAction`
+
+```python
+class ExampleStoreAction(Action):
+    """Example Aeromancy `Action` to store an existing dataset."""
+```
+
+[`Action`][aeromancy.action.Action]s have class attributes that help you organize
+your Actions and will be exposed later in experiment trackers like [Weights and
+Biases](https://docs.wandb.ai/quickstart). From most general to most specific, here are the three organizational levels Weights and Biases (and thus Aeromancy) provides:
+
+- `project_name` (defined by [`ActionBuilder`][aeromancy.action_builder.ActionBuilder])
+    - `job_group`
+        - `job_type`
+            - individual [`Action`][aeromancy.action.Action]s
+
+ Our example represents a typical ML flow with three
+[`Action`][aeromancy.action.Action]s:
+
+1. `job_group=model, job_type=store-dataset`: Store the dataset as a tracked
+   artifact in Aeromancy (more on artifacts soon!)
+2. `job_group=model, job_type=train-model`: Train a model from the dataset
+3. `job_group=model, job_type=eval-emodel`: Evaluate a model on the dataset
+
+```python
+    job_type = "store-dataset"
+    job_group = "model"
+```
+
+`outputs()` tells Aeromancy what artifacts this Action produces. Most
+[`Action`][aeromancy.action.Action]s only create a single thing (e.g., a
+training action creates a model, an evaluation action could output its
+predictions over the dataset) but multiple outputs are allowed. Also note that
+these can be dynamically generated based on the configuration of the
+[`Action`][aeromancy.action.Action].
+
+```python
+    @override
+    def outputs(self) -> list[str]:
+        return ["example-dataset"]
+```
+
+`run()` defines the actual logic that should be tracked (train a model,
+transform a dataset, etc.). Within `run()`, we're responsible for declaring
+input and output artifacts with the provided
+[`Tracker`][aeromancy.tracker.Tracker]. Much of the work in this example centers
+around configuring an output artifact with `tracker.declare_output()`. Why is
+this so complicated? Declaring an output artifact has several effects which
+Aeromancy will bind together:
+
+1. It creates a tracked (versioned) artifact from a set of local files.
+2. This makes the artifact usable in downstream
+    [`Action`][aeromancy.action.Action] -- we'll access the files through
+    Aeromancy rather than directly from disk, in fact, since it will ensure that
+    we're using the correct version of it.
+3. It will store the artifact to an S3-compatible blob store, creating a
+    permanent and versioned reference to the contents (well, as permanent
+    as the blob store).
+4. It will create a corresponding Weights and Biases artifact which will
+    be associated with the corresponding Weights and Biases run and the
+    Aeromancy Artifact.
+
+```python
+    @override
+    def run(self, tracker: Tracker) -> None:
+        print("Hello world from ExampleStoreAction.")
+```
+
+Our dataset already exists on disk in a special directory (`data/`) which is
+accessible both inside and outside the Docker container. This should generally
+only be used for initial dataset ingestion -- downstream
+[`Action`][aeromancy.action.Action]s should not use this path.
+
+```python
+        dataset_paths = [
+            Path("data/example_train_data.txt"),
+            Path("data/example_test_data.txt"),
+        ]
+```
+
+We can associate arbitrary metrics with the dataset:
+
+```python
+        dataset_metadata = {
+            "num_train_records": len(dataset_paths[0].read_text().splitlines()),
+            "num_test_records": len(dataset_paths[1].read_text().splitlines()),
+        }
+```
+
+We'll use `outputs()` from above to keep artifact names in sync.
+
+```python
+        [dataset_artifact_name] = self.outputs()
+```
+
+Now we're ready to declare `dataset_artifact_name` as an output dependency with
+`tracker.declare_output`. We'll go over each argument:
+
+- `name`: This is the name of the artifact we're declaring. This name is used in
+  many places:
+
+    1. It needs to match one of the names in the list of artifact names returned by
+        `outputs()`, so it will be part of the name of any jobs that run this
+        Action.
+    2. Downstream [`Action`][aeromancy.action.Action]s will be able to refer to
+       this artifact by this name.
+    3. This is also the name of the corresponding Weights and Biases artifact.
+- `local_filenames`: A list of files that should be included in the artifact.
+- `s3_destination`: Where to store the artifact in the blob store -- this
+  includes the bucket and key (a path prefix). This is purely for organization
+  purposes -- naming destinations clearly could also aid with debugging but in
+  general, you won't need to know or use S3 paths.
+- `artifact_type`: This is purely for organization purposes and will be exposed
+  in Weights and Biases. We recommend a human-readable version of the file type.
+- `metadata`: This is an optional property for any extra metadata that you'd
+  like to associate with the artifact (it will also be exposed in Weights and
+  Biases). It can also include nested data and store a wide range of types.
+- `strip_prefix`: This is the portion of the `local_filenames` paths that we
+  don't want to include in our artifact names on the blob store. In this
+  case, this means we'll store `data/example_train_data.txt` as
+  `dataset/example_train_data.txt` in the `example-bucket` bucket (the
+  `dataset/` comes from our `s3_destination` key).
+
+```python
+        tracker.declare_output(
+            name=dataset_artifact_name,
+            local_filenames=dataset_paths,
+            s3_destination=S3Object("example-bucket", "dataset/"),
+            artifact_type="dataset",
+            metadata=dataset_metadata,
+            strip_prefix="data/",
+        )
+```
+
+We've created our first [`Action`][aeromancy.action.Action]. Next, let's look at
+`ExampleTrainAction` which will use the dataset stored by `ExampleStoreAction`.
+
+#### Using configuration options and `Artifact`s with `ExampleTrainAction`
+
+We'll focus on the novel parts of `ExampleTrainAction` (see the generated code
+for some additional commentary). First, we'll introduce a configuration
+parameter. Parameters can be anything that changes behavior or helps you
+organize your experiments -- these include hyperparameters, toggling features,
+or your own metadata. Let's look at `__init__` where `learning_rate` is our
+example configuration parameter. Also note that we take a reference to an
+`ExampleStoreAction`. This will indicate a dependency and help Aeromancy know
+that it needs to run first. You might also be wondering about where
+`store_dataset` and `learning_rate` are set -- this will happen later in our
+[`ActionBuilder`][aeromancy.action_builder.ActionBuilder].
+
+```python
+    def __init__(
+        self,
+        store_dataset: ExampleStoreAction,
+        learning_rate: float,
+    ):
+        self.learning_rate = learning_rate
+```
+
+We need to call our superconstructor, which includes `store_dataset` as a parent
+Action as well as our configuration parameter:
+
+```python
+        Action.__init__(self, parents=[store_dataset], learning_rate=learning_rate)
+```
+
+In our `run()` method, now we'll be able to use the artifact from our parent:
+
+```python
+    @override
+    def run(self, tracker: Tracker) -> None:
+        print("Hello world from ExampleTrainAction.")
+```
+
+This demonstrates `get_io()`, a helper method to simultaneously provide input
+and output artifact names. Most [`Action`][aeromancy.action.Action]s include a
+call to this. Note that inputs and outputs are each lists which is why we're
+using brackets to unpack these. Also note that the order of the input artifact
+names will follow the order of parent [`Action`][aeromancy.action.Action]s (see
+`ExampleEvaluationAction` for an example of an
+[`Action`][aeromancy.action.Action] with multiple parents and thus multiple
+input artifacts).
+
+```python
+        [dataset_artifact_name], [model_artifact_name] = self.get_io()
+```
+
+Once we know the name of our input artifact, we need to declare it as a
+dependency. This is the counterpart of `tracker.declare_output()` from
+`ExampleStoreAction`. It will resolve the artifact to the appropriate version
+and return the paths we should use to read the dataset.
+
+```python
+        dataset_paths = tracker.declare_input(dataset_artifact_name)
+
+        train_data = dataset_paths[0].read_text()
+        print(f"Training data: {train_data!r}")
+```
+
+### ActionBuilder
+
+An [`ActionBuilder`][aeromancy.action_builder.ActionBuilder]
+(`src/aerodemo/action_builder.py`) is responsible for
+constructing a dependency graph of [`Action`][aeromancy.action.Action]s.
+
+TODO: code walkthrough
+
+### AeroMain
+
+`src/main.py`, typically referred to as AeroMain, is the main entry point to an
+Aeromancy project, responsible for determining configuration options,
+constructing an [`ActionBuilder`][aeromancy.action_builder.ActionBuilder], and
+launching it.
+
+TODO: code walkthrough
+
+## Running our first experiments
+
+TODO: `pdm go` etc.
+
+## What's next?
+
+We've gone through all the main components you'll need to define to run
+experiments in Aeromancy. Next up, you might want to:
+
+- [Configure](setup.md) Aeromancy to work with Weights and Biases and
+  S3-compatible blob stores
+- TODO: Developing and Debugging in Aeromancy (`bailout`, `--debug`, common
+  pitfalls, `aeroset`, `aeroview`, `rerun` commands)
+- [Customizing](customizing.md) your Aeromancy project
+- TODO: best practices
diff --git a/docs/docs/scaffolding.md b/docs/docs/scaffolding.md
index b55ebfc..540d0b1 100644
--- a/docs/docs/scaffolding.md
+++ b/docs/docs/scaffolding.md
@@ -6,7 +6,7 @@ doing the cross-references. -->
 In order to enable tracking, Aeromancy is rather opinionated about how projects
 are set up. A "project" in this case means a pipeline of tasks, potentially
 configurable through CLI flags. This document provides an overview of the
-components involved and how to set up a new Aeromancy project.
+components involved.
 
 This diagram roughly shows the flow:
 
@@ -81,59 +81,3 @@ generated [`Action`][aeromancy.action.Action]s.
 
 See [Tasks, Trackers, and Actions](tasks.md) for more information on these
 objects.
-
-## Creating a new Aeromancy project
-
-In order to set up a new project, you'll need a Git repository with these
-components:
-
-- Actions (subclasses of [`Action`][aeromancy.action.Action] with specific logic
-  for your tasks)
-- An [`ActionBuilder`][aeromancy.action_builder.ActionBuilder] to instantiate
-  the [`Action`][aeromancy.action.Action] objects and describe their
-  dependencies
-- An "AeroMain" script to parse any project-specific options and bring it all
-  together
-
-To quickly set up an Aeromancy project, we've created a
-[Copier](https://copier.readthedocs.io/en/stable/) template. See instructions at
-the
-[quant-aq/aeromancy-project-template](https://github.com/quant-aq/aeromancy-project-template?tab=readme-ov-file#quick-start).
-In the generated Python project setup (`pyproject.toml`), you may also want to
-adjust:
-
-- **Extra Python packages:** Add them with `pdm add <pkgname>`. See [PDM
-  docs](https://pdm.fming.dev/latest/usage/dependency/) for more information on
-  this.
-- **`pdm` [scripts](https://pdm.fming.dev/latest/usage/scripts/)**: Some of
-  these are necessary for running Aeromancy (like `pdm go`), but you can add
-  more if there are common tasks for your project.
-- **Extra `docker run` arguments**: E.g., mounting
-  [volumes](https://docs.docker.com/engine/reference/commandline/run/#mount)).
-  These can be baked `pdm go` script with `--extra-docker-run-args='...'`.
-- **Extra Debian packages:** (outside of those included by Aeromancy), you may
-  want to bake them into the `pdm go` script with `--extra-debian-package='...'`
-  (specify the flag once per package name).
-- **Development environment (linters, etc.):** Aeromancy encourages the use of
-  the `ruff` linter and `Black` formatter, but these are customizable.
-
-### Filesystem layout
-
-Ultimately, the structure of an Aeromancy project should look something like
-this:
-
-```text
-<projectroot>/
-  pyproject.toml
-  pdm.lock
-  main.py  # AeroMain
-  src/
-    <projectname>/
-      <youractions>.py
-      <youractionbuilder>.py
-```
-
-The structure of the classes containing your
-[`Action`][aeromancy.action.Action](s) and
-[`ActionBuilder`][aeromancy.action_builder.ActionBuilder] is flexible -- they
-just need to be importable in AeroMain.
diff --git a/docs/docs/setup.md b/docs/docs/setup.md
new file mode 100644
index 0000000..fd3ef95
--- /dev/null
+++ b/docs/docs/setup.md
@@ -0,0 +1,41 @@
+# Installing and setting up Aeromancy
+
+The easiest way to set up Aeromancy is to follow the [Quick
+Start](quick_start.md) guide. This document includes additional setup
+instructions for running Aeromancy in "production" mode.
+
+- **Python**: Aeromancy works with Python 3.10.5 or higher
+- **Python package manager**: Aeromancy currently requires [`pdm`](https://pdm.fming.dev).
+
+    - Install via `pip install --user pdm`
+
+- **Environment variables**:
+
+    - To use an S3-compatible backend (e.g.,
+      [Ceph](https://github.com/ceph/ceph)), you'll need to set these
+      environment variables:
+
+        - `AEROMANCY_AWS_ACCESS_KEY_ID`
+        - `AEROMANCY_AWS_SECRET_ACCESS_KEY`
+        - `AEROMANCY_AWS_S3_ENDPOINT_URL`
+        - `AEROMANCY_AWS_REGION` (can be left empty if it doesn't apply)
+
+    - You'll also need to set `WANDB_API_KEY` (from [Weights and Biases](https://wandb.ai))
+
+- **SSH Authentication**: You'll want `ssh-agent` set up if you need to access
+  private GitHub repositories. Check out these
+  [instructions](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent).
+
+## Linux
+
+You'll want to install some packages. On Debian, you can use:
+
+- `apt install bat graphviz libopenblas-dev pre-commit docker.io`
+
+## Mac OS
+
+- We recommend using [Homebrew](https://brew.sh/) to install the following:
+    - `brew install apache-arrow@13.0.0_5 bat@0.23.0 graphviz@8.1.0
+       openblas@0.3.24 pre-commit@3.3.3`
+- Install Docker Desktop from [docker.com](https://www.docker.com/) (not Brew
+  since it has a trickier upgrade story)
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
index cab935a..599d6d5 100644
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@@ -7,8 +7,12 @@ site_dir: "site"
 
 nav:
   - Home:
-    - Overview: index.md
-    - Scaffolding and new projects: scaffolding.md
+    - Introduction: index.md
+    - Quick Start: quick_start.md
+    - Setting up Aeromancy: setup.md
+    - Customizing your project: customizing.md
+  - Developer Reference:
+    - Scaffolding: scaffolding.md
     - Tasks, Trackers, and Actions: tasks.md
   - Code Reference: reference/
 
@@ -28,7 +32,6 @@ markdown_extensions:
   - pymdownx.snippets:
       base_path:
       - docs
-      - ../README.md
       check_paths: true
   - pymdownx.superfences:
       custom_fences:
@@ -59,4 +62,3 @@ plugins:
 
 watch:
   - "../src"
-  - "../README.md"

From 975271fc07f951d979951c9e01844bf04a67ce30 Mon Sep 17 00:00:00 2001
From: David McClosky <david.mcclosky@quant-aq.com>
Date: Fri, 23 Feb 2024 11:16:12 -0500
Subject: [PATCH 2/3] Match renames in the template, lots of new text

(see https://github.com/quant-aq/aeromancy-project-template/pull/3/commits/eab1af7ebb1125a79a531dfbda558afea3b640af)
---
 docs/docs/index.md       |   5 +-
 docs/docs/quick_start.md | 291 ++++++++++++++++++++++++++++++++-------
 docs/mkdocs.yml          |   2 +
 3 files changed, 247 insertions(+), 51 deletions(-)

diff --git a/docs/docs/index.md b/docs/docs/index.md
index fa1c06a..7078628 100644
--- a/docs/docs/index.md
+++ b/docs/docs/index.md
@@ -27,8 +27,9 @@ we said it was opinionated)
 - **Python Package Manager**: [pdm](https://pdm.fming.dev)
 - **Revision Control**: [Git](https://git-scm.com/)
 
-**Note:** As is likely obvious, Aeromancy documentation is in a very early
-state. As this is a pre-release support may be limited.
+!!! note
+    Aeromancy documentation is still in a very early state. As this is a
+    pre-release, support may be limited.
 
 ## Documentation overview
 
diff --git a/docs/docs/quick_start.md b/docs/docs/quick_start.md
index 881f3ee..8756fa0 100644
--- a/docs/docs/quick_start.md
+++ b/docs/docs/quick_start.md
@@ -1,8 +1,6 @@
 # Quick start
 
-This guide will walk you through some of the basic Aeromancy workflows. We'll be
-using Aeromancy in "development" mode which lets us focus on key Aeromancy
-concepts.
+This guide will walk you through some of the basic Aeromancy workflows.
 
 ## Creating a project
 
@@ -25,8 +23,8 @@ Let's start by creating a new project called `aerodemo`:
     copier copy --trust "gh:quant-aq/aeromancy-project-template" aerodemo
     ```
 
-    The template will various questions. For the purpose of this Quick Start,
-    it's fine to fill in `aerodemo` or defaults for all fields.
+    The template will ask a lot of questions. For the purpose of this Quick
+    Start, it's fine to fill in `aerodemo` or defaults for all fields.
 
 3. Install project dependencies:
 
@@ -53,14 +51,18 @@ a run on [Weights and Biases](https://docs.wandb.ai/quickstart) (Aeromancy will
 help you create the runs on the Weights and Biases side).
 
 In `src/aerodemo/actions.py`, we include three example
-[`Action`][aeromancy.action.Action]s: `ExampleStoreAction`,`ExampleTrainAction`,
+[`Action`][aeromancy.action.Action]s: `ExampleIngestAction`, `ExampleTrainAction`,
 and `ExampleEvaluationAction`. Let's walk through these.
 
-#### Creating `Artifact`s with `ExampleStoreAction`
+!!! note
+    We'll likely be simplifying the [`Action`][aeromancy.action.Action] API in
+    the near future. We hope to streamline it significantly.
+
+#### Creating `Artifact`s with `ExampleIngestAction`
 
 ```python
-class ExampleStoreAction(Action):
-    """Example Aeromancy `Action` to store an existing dataset."""
+class ExampleIngestAction(Action):
+    """Example Aeromancy `Action` to ingest an existing dataset."""
 ```
 
 [`Action`][aeromancy.action.Action]s have class attributes that help you organize
@@ -75,13 +77,13 @@ Biases](https://docs.wandb.ai/quickstart). From most general to most specific, h
  Our example represents a typical ML flow with three
 [`Action`][aeromancy.action.Action]s:
 
-1. `job_group=model, job_type=store-dataset`: Store the dataset as a tracked
+1. `job_group=model, job_type=ingest-dataset`: Store the dataset as a tracked
    artifact in Aeromancy (more on artifacts soon!)
 2. `job_group=model, job_type=train-model`: Train a model from the dataset
 3. `job_group=model, job_type=eval-emodel`: Evaluate a model on the dataset
 
 ```python
-    job_type = "store-dataset"
+    job_type = "ingest-dataset"
     job_group = "model"
 ```
 
@@ -102,26 +104,29 @@ these can be dynamically generated based on the configuration of the
 transform a dataset, etc.). Within `run()`, we're responsible for declaring
 input and output artifacts with the provided
 [`Tracker`][aeromancy.tracker.Tracker]. Much of the work in this example centers
-around configuring an output artifact with `tracker.declare_output()`. Why is
-this so complicated? Declaring an output artifact has several effects which
-Aeromancy will bind together:
-
-1. It creates a tracked (versioned) artifact from a set of local files.
-2. This makes the artifact usable in downstream
-    [`Action`][aeromancy.action.Action] -- we'll access the files through
-    Aeromancy rather than directly from disk, in fact, since it will ensure that
-    we're using the correct version of it.
-3. It will store the artifact to an S3-compatible blob store, creating a
-    permanent and versioned reference to the contents (well, as permanent
-    as the blob store).
-4. It will create a corresponding Weights and Biases artifact which will
-    be associated with the corresponding Weights and Biases run and the
-    Aeromancy Artifact.
+around configuring an output artifact with
+[`tracker.declare_output`][aeromancy.Tracker.declare_output].
+
+!!! question
+    Why is this so complicated? Declaring an output artifact has several effects
+    which Aeromancy will bind together:
+
+    1. It creates a tracked (versioned) artifact from a set of local files.
+    2. This makes the artifact usable in downstream
+        [`Action`][aeromancy.action.Action]s. We'll access the files through
+        Aeromancy rather than directly from disk, since it ensures that we're
+        using the correct version.
+    3. It will store the artifact to an S3-compatible blob store, creating a
+        permanent and versioned reference to the contents (well, as permanent
+        as the blob store).
+    4. It will create a corresponding Weights and Biases artifact which will
+        be associated with the corresponding Weights and Biases run and the
+        Aeromancy Artifact.
 
 ```python
     @override
     def run(self, tracker: Tracker) -> None:
-        print("Hello world from ExampleStoreAction.")
+        print("Hello world from ExampleIngestAction.")
 ```
 
 Our dataset already exists on disk in a special directory (`data/`) which is
@@ -152,7 +157,8 @@ We'll use `outputs()` from above to keep artifact names in sync.
 ```
 
 Now we're ready to declare `dataset_artifact_name` as an output dependency with
-`tracker.declare_output`. We'll go over each argument:
+[`tracker.declare_output`][aeromancy.Tracker.declare_output]. We'll go over each
+argument:
 
 - `name`: This is the name of the artifact we're declaring. This name is used in
   many places:
@@ -191,7 +197,7 @@ Now we're ready to declare `dataset_artifact_name` as an output dependency with
 ```
 
 We've created our first [`Action`][aeromancy.action.Action]. Next, let's look at
-`ExampleTrainAction` which will use the dataset stored by `ExampleStoreAction`.
+`ExampleTrainAction` which will use the dataset stored by `ExampleIngestAction`.
 
 #### Using configuration options and `Artifact`s with `ExampleTrainAction`
 
@@ -201,25 +207,25 @@ parameter. Parameters can be anything that changes behavior or helps you
 organize your experiments -- these include hyperparameters, toggling features,
 or your own metadata. Let's look at `__init__` where `learning_rate` is our
 example configuration parameter. Also note that we take a reference to a
-`ExampleStoreAction`. This will indicate a dependency and help Aeromancy know
+`ExampleIngestAction`. This will indicate a dependency and help Aeromancy know
 that it needs to run first. You might also be wondering about where
-`store_dataset` and `learning_rate` are set -- this will happen later in our
+`ingest_dataset` and `learning_rate` are set -- this will happen later in our
 [`ActionBuilder`][aeromancy.action_builder.ActionBuilder].
 
 ```python
     def __init__(
         self,
-        store_dataset: ExampleStoreAction,
+        ingest_dataset: ExampleIngestAction,
         learning_rate: float,
     ):
         self.learning_rate = learning_rate
 ```
 
-We need to call our superconstructor which include `store_dataset` as a parent
+We need to call our superconstructor which includes `ingest_dataset` as a parent
 Action as well as our configuration parameter:
 
 ```python
-        Action.__init__(self, parents=[store_dataset], learning_rate=learning_rate)
+        Action.__init__(self, parents=[ingest_dataset], learning_rate=learning_rate)
 ```
 
 In our `run()` method, now we'll be able to use the artifact from our parent:
@@ -244,8 +250,9 @@ input artifacts).
 ```
 
 Once we know the name of our input artifact, we need to declare it as a
-dependency. This is the counterpart of `tracker.declare_output()` from
-`ExampleStoreAction`. It will resolve the artifact to the appropriate version
+dependency. This is the counterpart of
+[`tracker.declare_output`][aeromancy.Tracker.declare_output] from
+`ExampleIngestAction`. It will resolve the artifact to the appropriate version
 and return the paths we should use to read the dataset.
 
 ```python
@@ -255,35 +262,221 @@ and return the paths we should use to read the dataset.
         print(f"Training data: {train_data!r}")
 ```
 
+#### Logging metrics
+
+As we've already seen, we can associate arbitrary metadata/metrics with
+artifacts as part of
+[`tracker.declare_output`][aeromancy.Tracker.declare_output]. We can also log
+metrics about the status of an `Action` with
+[`tracker.log`][aeromancy.Tracker.log]. Returning to the `run()` method in
+`ExampleTrainAction`:
+
+```python
+        # Now we pretend to train a model.
+        num_iterations = 10
+        # Seeding your RNG is always a good idea for better reproducibility.
+        rng = random.Random(x=7)
+        for step in range(num_iterations):
+            # We can store information about the experiment while it's being
+            # run.
+            tracker.log(
+                {
+                    "step": step,
+                    "train_error": rng.random(),
+                },
+            )
+```
+
 ### ActionBuilder
 
 An [`ActionBuilder`][aeromancy.action_builder.ActionBuilder]
-(`src/aerodemo/action_builder.py`) is responsible for
-constructing a dependency graph of [`Action`][aeromancy.action.Action]s.
+(`src/aerodemo/action_builder.py`) is responsible for constructing a dependency
+graph of [`Action`][aeromancy.action.Action]s. It will be able to receive
+options from the command-line in `__init__`:
 
-TODO: code walkthrough
+```python
+    def __init__(
+        self,
+        learning_rate: float,
+    ):
+        """Create an `ActionBuilder` for aerodemo."""
+        # The project name is for organizational purposes and will be the
+        # project name in Weights and Biases.
+        ActionBuilder.__init__(self, project_name="aerodemo")
+
+        self.learning_rate = learning_rate
+```
+
+The main logic here happens in
+[`build_actions`][aeromancy.ActionBuilder.build_actions], which constructs the
+[`Action`][aeromancy.action.Action] objects we defined above. When we construct
+an [`Action`][aeromancy.action.Action], we need to add it to a list using
+[`self.add_action`][aeromancy.ActionBuilder.add_action]:
+
+!!! note
+    This API is likely to be simplified in the near future.
+
+```python
+    @override
+    def build_actions(self) -> list[Action]:
+        actions = []
+
+        # Build each Action in sequence. Note that we use the helper method
+        # add_action rather than appending to the list directly, since
+        # add_action needs to do some work behind the scenes.
+        ingest_action = self.add_action(actions, ExampleIngestAction(parents=[]))
+        train_action = self.add_action(
+            actions,
+            ExampleTrainAction(
+                ingest_dataset=ingest_action,
+                learning_rate=self.learning_rate,
+            ),
+        )
+        self.add_action(
+            actions,
+            ExampleEvaluationAction(
+                ingest_dataset=ingest_action,
+                train_model=train_action,
+            ),
+        )
+        return actions
+```
 
 ### AeroMain
 
-`src/main.py`, typically referred to as AeroMain is the main entry point to an
-Aeromancy project, responsible for determining configuration options,
-constructing an [`ActionBuilder`][aeromancy.action_builder.ActionBuilder], and
-launching it.
+`src/main.py`, typically referred to as **AeroMain**, is the command-line entry
+point to an Aeromancy project, responsible for determining configuration
+options, constructing an
+[`ActionBuilder`][aeromancy.action_builder.ActionBuilder], and launching it. By
+default, Aeromancy will always look for AeroMain in `src/main.py`.
 
-TODO: code walkthrough
+It uses [Click](https://click.palletsprojects.com/) for option parsing, and
+Aeromancy provides a bundle of its own options in
+[`@aeromancy_click_options`][aeromancy.click_options.aeromancy_click_options].
+Using [`rich.console`](https://rich.readthedocs.io/en/stable/console.html) for
+console logging is optional.
+
+```python
+@click.command()
+@click.option(
+    "-l",
+    "--learning-rate",
+    metavar="FLOAT",
+    default=1e-3,
+    type=float,
+    help="Learning rate in optimizer.",
+)
+# We also need to include a list of standard Aeromancy options.
+@aeromancy_click_options
+# Make sure to include any new options we created as arguments to aeromain.
+def aeromain(
+    learning_rate: float,
+    **aeromancy_options,
+):
+    """CLI application for controlling aerodemo."""
+```
+
+Within the `aeromain()` function, we construct an
+[`ActionBuilder`][aeromancy.action_builder.ActionBuilder] (you can use more than
+one if you have several similar pipelines in the same experiment), then convert
+it to an [`ActionRunner`][aeromancy.action_runner.ActionRunner] and run the
+actions:
+
+```python
+    config = {"learning_rate": learning_rate}
+    console.log("Config parameters from CLI:", config)
+
+    # This builds our Action dependency graph given the configuration passed in.
+    action_builder = ExampleActionBuilder(**config)
+    # We create a corresponding runner to execute the dependency graph and kick
+    # it off.
+    action_runner = action_builder.to_runner()
+    action_runner.run_actions(**aeromancy_options)
+```
 
 ## Running our first experiments
 
-TODO: `pdm go` etc.
+Aeromancy projects all include standard scripts for running Aeromancy. The main
+script is called `go` which runs AeroMain. For the Quick Start, we'll use
+development mode with the `--dev` flag.
+
+!!! info
+    **Development mode** makes it easy to test and develop pipelines quickly.
+    It lets you run uncommitted code outside of a Docker container and Weights
+    and Biases to speed up the developer loop. It will attempt to read artifacts
+    from S3, so it doesn't work completely offline (unless you already have the
+    artifacts cached from previous development mode runs). Its behavior is very
+    close to "production" mode with the main exception that it is not
+    necessarily using the same artifact versions.
+
+### Listing available [`Action`][aeromancy.action.Action]s
+
+Let's start by listing all the
+[`Action`][aeromancy.action.Action]s with `--list`:
+
+```bash
+pdm go --dev --list
+```
+
+You should see something like this:
+
+```bash
+[12:00:00] Running 'pdm run python src/main.py --list'
+[12:00:01] Config parameters from CLI:
+           {'learning_rate': 0.001}
+[ingest-dataset] example-dataset
+[train-model] example-model
+[eval-model] example-model-predictions
+```
+
+We can see the results of our `console.log` statement with the default value for
+the learning rate parameter. This is followed by a list of all
+[`Action`][aeromancy.action.Action]s our
+[`ActionBuilder`][aeromancy.action_builder.ActionBuilder] built. The `job_type`
+is shown in brackets, followed by a list of output artifacts.
+
+### Running the pipeline
+
+Assuming we're happy with the [`Action`][aeromancy.action.Action]s, we can run
+them all by omitting `--list`:
+
+```bash
+pdm go --dev
+```
+
+You should see it run each [`Action`][aeromancy.action.Action] in sequence.
+Don't worry if it's overwhelming at first. Because we're running in development
+mode, we're using a [fake tracker][aeromancy.fake_tracker.FakeTracker] instead
+of the production Weights and Biases tracker, so you'll see a lot of messages
+from it about what would happen if we were running in production mode.
+
+### Job selection
+
+Sometimes (in our experience, often) we don't want to run the entire pipeline.
+To run just some of the jobs, pass the `--only` flag with a substring. Aeromancy
+will then only run jobs whose names include that substring. You can also pass a
+comma-separated list of substrings. Note that names include the `job_type`.
+
+!!! example
+
+    - If you pass `--only train`, it will just run `ExampleTrainAction`
+
+    - If you pass `--only model`, it will run `ExampleTrainAction` then
+    `ExampleEvaluationAction` (since the latter depends on the former)
+
+    - If you pass `--only dataset,train`, it will run `ExampleIngestAction` then
+    `ExampleTrainAction`
 
 ## What's next?
 
 We've gone through all the main components you'll need to define to run
-experiments in Aeromancy. Next up, you might want to:
+experiments in Aeromancy and how to run them in development mode. Next up, you
+might want to:
 
 - [Configure](setup.md) Aeromancy to work with Weights and Biases and
-  S3-compatible blob stores
-- TODO: Developing and Debugging in Aeromancy (`bailout`, `--debug`, common
+  S3-compatible blob stores (production mode)
+- (To be documented) Developing and Debugging (`bailout`, `--debug`, common
   pitfalls, `aeroset`, `aeroview`, `rerun` commands)
 - [Customizing](customizing.md) your Aeromancy project
-- TODO: best practices
+- (To be documented) Best practices and FAQ
+- (To be documented) Debugging Aeromancy itself (for Aeromancy developers)
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
index 599d6d5..bc54f9e 100644
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@@ -24,6 +24,8 @@ theme:
   font:
     text: Open Sans
     code: Fira Code
+  features:
+    - content.code.copy
 
 markdown_extensions:
   - admonition

From 86bff9f1c86181838d238bfc66864df3940ae83e Mon Sep 17 00:00:00 2001
From: David McClosky <david.mcclosky@quant-aq.com>
Date: Fri, 23 Feb 2024 11:16:26 -0500
Subject: [PATCH 3/3] Allow AEROMANCY_AWS_REGION to be unset

---
 src/aeromancy/s3.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
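
Reviewer note: a minimal sketch of the lookup change in this patch. The
`read_region` helper is hypothetical and exists only for illustration; it is
not part of `aeromancy.s3`. It mirrors just the `region_name` argument, which
now falls back to an empty string when `AEROMANCY_AWS_REGION` is unset. The
other three variables are still read with `os.environ[...]` and so still raise
`KeyError` when missing.

```python
import os
from collections.abc import Mapping


def read_region(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper mirroring the new region lookup."""
    # .get() returns the default ("") instead of raising KeyError when the
    # variable is not set.
    return env.get("AEROMANCY_AWS_REGION", "")


# Unset variable: empty string rather than an exception.
print(repr(read_region({})))
# Set variable: the value is passed through unchanged.
print(repr(read_region({"AEROMANCY_AWS_REGION": "us-east-1"})))
```

Whether an empty region is acceptable depends on the S3-compatible endpoint in
use; this sketch only shows the lookup behavior.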

diff --git a/src/aeromancy/s3.py b/src/aeromancy/s3.py
index c2dbc63..85094d2 100644
--- a/src/aeromancy/s3.py
+++ b/src/aeromancy/s3.py
@@ -361,7 +361,7 @@ def from_env_variables(cls):
             _S3_CLIENT = cls(
                 aws_access_key_id=os.environ["AEROMANCY_AWS_ACCESS_KEY_ID"],
                 aws_secret_access_key=os.environ["AEROMANCY_AWS_SECRET_ACCESS_KEY"],
-                region_name=os.environ["AEROMANCY_AWS_REGION"],
+                region_name=os.environ.get("AEROMANCY_AWS_REGION", ""),
                 endpoint_url=os.environ["AEROMANCY_AWS_S3_ENDPOINT_URL"],
             )
         return _S3_CLIENT