diff --git a/README.md b/README.md index 1be50ee..fa25915 100644 --- a/README.md +++ b/README.md @@ -18,54 +18,32 @@ by providing both new infrastructure (a more comprehensive versioning scheme including both system runtimes and external datasets) and a corresponding set of best practices to ensure experiments are maximally trackable. -In its current form, Aeromancy requires a fairly specific software stack: +In its current form, Aeromancy requires a fairly specific software stack: (hey, +we said it was opinionated) - **Experiment tracker**: [Weights and Biases](https://wandb.ai) - **Object storage** (artifacts): S3-compatible, e.g., [Ceph](https://github.com/ceph/ceph) - **Virtualization**: [Docker](https://www.docker.com/) +- **Python Package Manager**: [pdm](https://pdm.fming.dev) +- **Revision Control**: [Git](https://git-scm.com/) **Note:** As is likely obvious, Aeromancy documentation is in a very early state. As this is a pre-release support may be limited. For now, we include a couple pointers for how to setup your environment for Aeromancy. -## Getting started +## Documentation overview -**Coming soon**: A proper Getting Started section. +- If you're new to Aeromancy, [start here](docs/docs/quick_start.md)! +- In the Developer Reference section of the documentation, we include some + design docs which provide an [architectural overview](docs/docs/scaffolding.md) and a + [glossary](docs/docs/tasks.md) of terms. +- To see autogenerated docs for code from this repo, you'll need to start a + local doc server (`pdm doc`). -To quickly set up an Aeromancy project, we've created a -[Copier](https://copier.readthedocs.io/en/stable/) template. See instructions at -the -[quant-aq/aeromancy-project-template](https://github.com/quant-aq/aeromancy-project-template?tab=readme-ov-file#quick-start). - -## Requirements - -- Python 3.10.5 or higher -- [`pdm`](https://pdm.fming.dev): Install via `pip install --user pdm` then - install Aeromancy packages with `pdm install`. 
-- **Environment variables**: - - S3 backend location and credentials: - - `AEROMANCY_AWS_ACCESS_KEY_ID` - - `AEROMANCY_AWS_SECRET_ACCESS_KEY` - - `AEROMANCY_AWS_S3_ENDPOINT_URL` - - `AEROMANCY_AWS_REGION` - - `WANDB_API_KEY` (from [Weights and Biases](https://wandb.ai)) -- **SSH Authentication**: You'll want `ssh-agent` setup if you need to access - private GitHub repositories. Check out these - [instructions](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent). - -### Mac OS - -- Use [Homebrew](https://brew.sh/) to install the following: - - `brew install apache-arrow@13.0.0_5 bat@0.23.0 graphviz@8.1.0 - openblas@0.3.24 pre-commit@3.3.3` -- Install Docker Desktop from [docker.com](https://www.docker.com/) (not Brew - since it has a trickier upgrade story) - -## Common commands +## Common development commands - `pdm lint`: Run pre-commit linters - `pdm test`: Run test suite - `pdm doc`: Start doc server (see also the [public - version](https://quant-aq.github.io/aeromancy/) for the latest checked in - version) + version](https://quant-aq.github.io/aeromancy/) for the latest release) diff --git a/docs/docs/customizing.md b/docs/docs/customizing.md new file mode 100644 index 0000000..ff847e9 --- /dev/null +++ b/docs/docs/customizing.md @@ -0,0 +1,46 @@ + +# Customizing Aeromancy projects + +To quickly set up an Aeromancy project, we've created a +[Copier](https://copier.readthedocs.io/en/stable/) template. See instructions at +the +[quant-aq/aeromancy-project-template](https://github.com/quant-aq/aeromancy-project-template?tab=readme-ov-file#quick-start). + +In the generated Python project setup (`pyproject.toml`), you may also want to +adjust: + +- **Extra Python packages:** Add them with `pdm add `. See [PDM + docs](https://pdm.fming.dev/latest/usage/dependency/) for more information on + this. 
+- **`pdm` [scripts](https://pdm.fming.dev/latest/usage/scripts/)**: Some of + these are necessary for running Aeromancy (like `pdm go`), but you can add + more if there are common tasks for your project. +- **Extra `docker run` arguments**: (E.g., mounting + [volumes](https://docs.docker.com/engine/reference/commandline/run/#mount)). + These can be baked into the `pdm go` script with `--extra-docker-run-args='...'`. The + [template](https://github.com/quant-aq/aeromancy-project-template) includes a + standard volume mapping (`data/`) for ingesting datasets. +- **Extra Debian packages:** (outside of those included by Aeromancy), you may + want to bake them into the `pdm go` script with `--extra-debian-package='...'` + (specify the flag once per package name). + +## Filesystem layout + +Ultimately, the structure of an Aeromancy project should look something like +this: + +```text +/ + pyproject.toml + pdm.lock + main.py # AeroMain + src/ + / + .py + .py +``` + +The structure of the classes containing your +[`Action`][aeromancy.action.Action](s) and +[`ActionBuilder`][aeromancy.action_builder.ActionBuilder] is flexible -- they +just need to be importable in AeroMain. 
diff --git a/docs/docs/index.md b/docs/docs/index.md index 612c7a5..7078628 100644 --- a/docs/docs/index.md +++ b/docs/docs/index.md @@ -1 +1,41 @@ ---8<-- "README.md" +# Aeromancy + +[![Tests](https://github.com/quant-aq/aeromancy/actions/workflows/ci.yml/badge.svg)](https://github.com/quant-aq/aeromancy/actions/workflows/ci.yml) +[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) +[![pdm-managed](https://img.shields.io/badge/pdm-managed-blueviolet)](https://pdm.fming.dev) +[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff) +[![pre-commit enabled](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://pre-commit.com/) +![Apache 2.0 licensed](https://img.shields.io/github/license/quant-aq/aeromancy) + +**Aeromancy** is an opinionated philosophy and open-sourced framework that +closely tracks experimental runtime environments for more reproducible machine +learning. In existing experiment trackers, it’s easy to miss important details +about how an experiment was run, e.g., which version of a dataset was used as +input or the exact versions of library dependencies. Missing these details can +make replicability more difficult. Aeromancy aims to make this process smoother +by providing both new infrastructure (a more comprehensive versioning scheme +including both system runtimes and external datasets) and a corresponding set of +best practices to ensure experiments are maximally trackable. 
+ +In its current form, Aeromancy requires a fairly specific software stack: (hey, +we said it was opinionated) + +- **Experiment tracker**: [Weights and Biases](https://wandb.ai) +- **Object storage** (artifacts): S3-compatible, e.g., + [Ceph](https://github.com/ceph/ceph) +- **Virtualization**: [Docker](https://www.docker.com/) +- **Python Package Manager**: [pdm](https://pdm.fming.dev) +- **Revision Control**: [Git](https://git-scm.com/) + +!!! note + Aeromancy documentation is still in a very early state. As this is a + pre-release, support may be limited. + +## Documentation overview + +- If you're new to Aeromancy, [start here](quick_start.md)! +- In the Developer Reference section of the documentation, we include some + design docs which provide an [architectural overview](scaffolding.md) and a + [glossary](tasks.md) of terms. +- Lastly, we have autogenerated documentation in [Code + Reference](reference/aeromancy/index.md). diff --git a/docs/docs/quick_start.md b/docs/docs/quick_start.md new file mode 100644 index 0000000..8756fa0 --- /dev/null +++ b/docs/docs/quick_start.md @@ -0,0 +1,482 @@ +# Quick start + +This guide will walk you through some of the basic Aeromancy workflows. + +## Creating a project + +To quickly set up an Aeromancy project, we've created a +[Copier](https://copier.readthedocs.io/en/stable/) template at +[quant-aq/aeromancy-project-template](https://github.com/quant-aq/aeromancy-project-template?tab=readme-ov-file#quick-start). +Let's start by creating a new project called `aerodemo`: + +1. Install [PDM](https://pdm.fming.dev) with + [Copier](https://copier.readthedocs.io/en/stable/) support: + + ```bash + pip install --user "pdm[copier]" + ``` + +2. Set up a new Aeromancy-managed project with the template. This will create + the project directory `aerodemo` for you: + + ```bash + copier copy --trust "gh:quant-aq/aeromancy-project-template" aerodemo + ``` + + The template will ask a lot of questions. 
For the purpose of this Quick + Start, it's fine to fill in `aerodemo` or defaults for all fields. + +3. Install project dependencies: + + ```bash + cd aerodemo + git init + pdm install --dev --no-self + ``` + +## What's in an Aeromancy project? + +Aeromancy projects contain several different components. For now, we'll start +with the three most important: (see [Tasks, Trackers, and Actions](tasks.md) for +more details on these and the other main classes) + +### Actions + +[`Action`][aeromancy.action.Action]s define a specific data transformation you'd +like to track with Aeromancy (e.g., training a model or performing a step in a +data processing pipeline). If you're familiar with +[Luigi](https://luigi.readthedocs.io/en/stable/) and other pipeline builders, +this may be familiar. [`Action`][aeromancy.action.Action]s roughly correspond to +a run on [Weights and Biases](https://docs.wandb.ai/quickstart) (Aeromancy will +help you create the runs on the Weights and Biases side). + +In `src/aerodemo/actions.py`, we include three example +[`Action`][aeromancy.action.Action]s: `ExampleIngestAction`, `ExampleTrainAction`, +and `ExampleEvaluationAction`. Let's walk through these. + +!!! note + We'll likely be simplifying the [`Action`][aeromancy.action.Action] API in + the near future. We hope to streamline it significantly. + +#### Creating `Artifact`s with `ExampleIngestAction` + +```python +class ExampleIngestAction(Action): + """Example Aeromancy `Action` to ingest an existing dataset.""" +``` + +[`Action`][aeromancy.action.Action]s have class attributes that help you organize +your Actions and will be exposed later in experiment trackers like [Weights and +Biases](https://docs.wandb.ai/quickstart). 
From most general to most specific, here are the three organizational levels Weights and Biases (and thus Aeromancy) provides: + +- `project_name` (defined by [`ActionBuilder`][aeromancy.action_builder.ActionBuilder]) + - `job_group` + - `job_type` + - individual [`Action`][aeromancy.action.Action]s + + Our example represents a typical ML flow with three +[`Action`][aeromancy.action.Action]s: + +1. `job_group=model, job_type=ingest-dataset`: Store the dataset as a tracked + artifact in Aeromancy (more on artifacts soon!) +2. `job_group=model, job_type=train-model`: Train a model from the dataset +3. `job_group=model, job_type=eval-model`: Evaluate a model on the dataset + +```python + job_type = "ingest-dataset" + job_group = "model" +``` + +`outputs()` tells Aeromancy what artifacts this Action produces. Most +[`Action`][aeromancy.action.Action]s only create a single thing (e.g., a +training action creates a model, an evaluation action could output its +predictions over the dataset) but multiple outputs are allowed. Also note that +these can be dynamically generated based on the configuration of the +[`Action`][aeromancy.action.Action]. + +```python + @override + def outputs(self) -> list[str]: + return ["example-dataset"] +``` + +`run()` defines the actual logic that should be tracked (train a model, +transform a dataset, etc.). Within `run()`, we're responsible for declaring +input and output artifacts with the provided +[`Tracker`][aeromancy.tracker.Tracker]. Much of the work in this example centers +around configuring an output artifact with +[`tracker.declare_output`][aeromancy.Tracker.declare_output]. + +!!! question + Why is this so complicated? Declaring an output artifact has several effects + which Aeromancy will bind together: + + 1. It creates a tracked (versioned) artifact from a set of local files. + 2. 
This makes the artifact usable in downstream + [`Action`][aeromancy.action.Action]s -- in fact, we'll access the files through + Aeromancy rather than directly from disk, since that ensures + we're using the correct version. + 3. It will store the artifact to an S3-compatible blob store, creating a + permanent and versioned reference to the contents (well, as permanent + as the blob store). + 4. It will create a corresponding Weights and Biases artifact which will + be associated with the corresponding Weights and Biases run and the + Aeromancy Artifact. + +```python + @override + def run(self, tracker: Tracker) -> None: + print("Hello world from ExampleIngestAction.") +``` + +Our dataset already exists on disk in a special directory (`data/`) which is +accessible both inside and outside the Docker container. This should generally +only be used for initial dataset ingestion -- downstream +[`Action`][aeromancy.action.Action]s should not use this path. + +```python + dataset_paths = [ + Path("data/example_train_data.txt"), + Path("data/example_test_data.txt"), + ] +``` + +We can associate arbitrary metrics with the dataset: + +```python + dataset_metadata = { + "num_train_records": len(dataset_paths[0].read_text().splitlines()), + "num_test_records": len(dataset_paths[1].read_text().splitlines()), + } +``` + +We'll use `outputs()` from above to keep artifact names in sync. + +```python + [dataset_artifact_name] = self.outputs() +``` + +Now we're ready to declare `dataset_artifact_name` as an output dependency with +[`tracker.declare_output`][aeromancy.Tracker.declare_output]. We'll go over each +argument: + +- `name`: This is the name of the artifact we're declaring. This name is used in + many places: + + 1. It needs to match one of the names in the list of artifact names returned by + `outputs()`, so it will be part of the name of any jobs that run this + Action. + 2. 
Downstream [`Action`][aeromancy.action.Action]s will be able to refer to + this artifact by this name. + 3. This is also the name of the corresponding Weights and Biases artifact. +- `local_filenames`: A list of files that should be included in the artifact. +- `s3_destination`: Where to store the artifact in the blob store -- this + includes the bucket and key (a path prefix). This is purely for organization + purposes -- naming destinations clearly could also aid with debugging but in + general, you won't need to know or use S3 paths. +- `artifact_type`: This is purely for organization purposes and will be exposed + in Weights and Biases. We recommend a human-readable version of the file type. +- `metadata`: This is an optional property for any extra metadata that you'd + like to associate with the artifact (it will also be exposed in Weights and + Biases). It can also include nested data and store a wide range of types. +- `strip_prefix`: This is the portion of the `local_filenames` paths that we + don't want to include in our artifact names on the blob store. In this + case, this means we'll store `data/example_train_data.txt` as + `dataset/bogus-example_train_data.txt` in the `example-bucket` bucket (the + `dataset/` comes from our `s3_destination` key). + +```python + tracker.declare_output( + name=dataset_artifact_name, + local_filenames=dataset_paths, + s3_destination=S3Object("example-bucket", "dataset/"), + artifact_type="dataset", + metadata=dataset_metadata, + strip_prefix="data/", + ) +``` + +We've created our first [`Action`][aeromancy.action.Action]. Next, let's look at +`ExampleTrainAction` which will use the dataset stored by `ExampleIngestAction`. + +#### Using configuration options and `Artifact`s with `ExampleTrainAction` + +We'll focus on the novel parts of `ExampleTrainAction` (see the generated code +for some additional commentary). First, we'll introduce a configuration +parameter. 
Parameters can be anything that changes behavior or helps you +organize your experiments -- these include hyperparameters, toggling features, +or your own metadata. Let's look at `__init__` where `learning_rate` is our +example configuration parameter. Also note that we take a reference to an +`ExampleIngestAction`. This will indicate a dependency and help Aeromancy know +that it needs to run first. You might also be wondering about where +`ingest_dataset` and `learning_rate` are set -- this will happen later in our +[`ActionBuilder`][aeromancy.action_builder.ActionBuilder]. + +```python + def __init__( + self, + ingest_dataset: ExampleIngestAction, + learning_rate: float, + ): + self.learning_rate = learning_rate +``` + +We need to call our superconstructor, which includes `ingest_dataset` as a parent +Action as well as our configuration parameter: + +```python + Action.__init__(self, parents=[ingest_dataset], learning_rate=learning_rate) +``` + +In our `run()` method, now we'll be able to use the artifact from our parent: + +```python + @override + def run(self, tracker: Tracker) -> None: + print("Hello world from ExampleTrainAction.") +``` + +The next line demonstrates `get_io()`, a helper method to simultaneously provide input +and output artifact names. Most [`Action`][aeromancy.action.Action]s include a +call to this. Note that inputs and outputs are each lists, which is why we're +using brackets to unpack these. Also note that the order of the input artifact +names will follow the order of parent [`Action`][aeromancy.action.Action]s (see +`ExampleEvaluationAction` for an example of an +[`Action`][aeromancy.action.Action] with multiple parents and thus multiple +input artifacts). + +```python + [dataset_artifact_name], [model_artifact_name] = self.get_io() +``` + +Once we know the name of our input artifact, we need to declare it as a +dependency. This is the counterpart of +[`tracker.declare_output`][aeromancy.Tracker.declare_output] from +`ExampleIngestAction`. 
It will resolve the artifact to the appropriate version +and return the paths we should use to read the dataset. + +```python + dataset_paths = tracker.declare_input(dataset_artifact_name) + + train_data = dataset_paths[0].read_text() + print(f"Training data: {train_data!r}") +``` + +#### Logging metrics + +As we've already seen, we can associate arbitrary metadata/metrics with +artifacts as part of +[`tracker.declare_output`][aeromancy.Tracker.declare_output]. We can also log +metrics about the status of an `Action` with +[`tracker.log`][aeromancy.Tracker.log]. Returning to the `run()` method in +`ExampleTrainAction`: + +```python + # Now we pretend to train a model. + num_iterations = 10 + # Seeding your RNG is always a good idea for better reproducibility. + rng = random.Random(x=7) + for step in range(num_iterations): + # We can store information about the experiment while it's being + # run. + tracker.log( + { + "step": step, + "train_error": rng.random(), + }, + ) +``` + +### ActionBuilder + +An [`ActionBuilder`][aeromancy.action_builder.ActionBuilder] +(`src/aerodemo/action_builder.py`) is responsible for constructing a dependency +graph of [`Action`][aeromancy.action.Action]s. It will be able to receive +options from the command-line in `__init__`: + +```python + def __init__( + self, + learning_rate: float, + ): + """Create an `ActionBuilder` for aerodemo.""" + # The project name is for organizational purposes and will be the + # project name in Weights and Biases. + ActionBuilder.__init__(self, project_name="aerodemo") + + self.learning_rate = learning_rate +``` + +The main logic here happens in +[`build_actions`][aeromancy.ActionBuilder.build_actions], which constructs the +[`Action`][aeromancy.action.Action] objects we defined above. When we construct +an [`Action`][aeromancy.action.Action], we need to add it to a list using +[`self.add_action`][aeromancy.ActionBuilder.add_action]: + +!!! note + This API is likely to be simplified in the near future. 
+ +```python + @override + def build_actions(self) -> list[Action]: + actions = [] + + # Build each Action in sequence. Note that we use the helper method + # add_action rather than appending to the list directly, since + # add_action needs to do some work behind the scenes. + ingest_action = self.add_action(actions, ExampleIngestAction(parents=[])) + train_action = self.add_action( + actions, + ExampleTrainAction( + ingest_dataset=ingest_action, + learning_rate=self.learning_rate, + ), + ) + self.add_action( + actions, + ExampleEvaluationAction( + ingest_dataset=ingest_action, + train_model=train_action, + ), + ) + return actions +``` + +### AeroMain + +`src/main.py`, typically referred to as **AeroMain**, is the command-line entry +point to an Aeromancy project, responsible for determining configuration +options, constructing an +[`ActionBuilder`][aeromancy.action_builder.ActionBuilder], and launching it. By +default, Aeromancy will always look for AeroMain in `src/main.py`. + +It uses [Click](https://click.palletsprojects.com/) for option parsing and +Aeromancy provides a bundle of its own options in +[`@aeromancy_click_options`][aeromancy.click_options.aeromancy_click_options]. +Using [`rich.console`](https://rich.readthedocs.io/en/stable/console.html) for +console logging is optional. + +```python +@click.command() +@click.option( + "-l", + "--learning-rate", + metavar="FLOAT", + default=1e-3, + type=float, + help="Learning rate in optimizer.", +) +# We also need to include a list of standard Aeromancy options. +@aeromancy_click_options +# Make sure to include any new options we created as arguments to aeromain. 
+def aeromain( + learning_rate: float, + **aeromancy_options, +): + """CLI application for controlling aerodemo.""" +``` + +Within the `aeromain()` function, we construct an +[`ActionBuilder`][aeromancy.action_builder.ActionBuilder] (you can use more than +one if you have several similar pipelines in the same experiment), then convert +it to an [`ActionRunner`][aeromancy.action_runner.ActionRunner] and run the +actions: + +```python + config = {"learning_rate": learning_rate} + console.log("Config parameters from CLI:", config) + + # This builds our Action dependency graph given the configuration passed in. + action_builder = ExampleActionBuilder(**config) + # We create a corresponding runner to execute the dependency graph and kick + # it off. + action_runner = action_builder.to_runner() + action_runner.run_actions(**aeromancy_options) +``` + +## Running our first experiments + +Aeromancy projects all include standard scripts for running Aeromancy. The main +script is called `go`, which runs AeroMain. For the Quick Start, we'll use +development mode with the `--dev` flag. + +!!! info + **Development mode** makes it easy to test and develop pipelines quickly. + It lets you run uncommitted code outside of a Docker container and Weights + and Biases to speed up the developer loop. It will attempt to read artifacts + from S3, so it doesn't work completely offline (unless you already have the + artifacts cached from previous development mode runs). Its behavior is very + close to "production" mode with the main exception that it is not + necessarily using the same artifact versions. 
+### Listing available [`Action`][aeromancy.action.Action]s + +Let's start by listing all the +[`Action`][aeromancy.action.Action]s with `--list`: + +```bash +pdm go --dev --list +``` + +You should see something like this: + +```bash +[12:00:00] Running 'pdm run python src/main.py --list' +[12:00:01] Config parameters from CLI: + {'learning_rate': 0.001} +[ingest-dataset] example-dataset +[train-model] example-model +[eval-model] example-model-predictions +``` + +We can see the results of our `console.log` statement with the default value for + the learning rate parameter. This is followed by a list of all +[`Action`][aeromancy.action.Action]s our +[`ActionBuilder`][aeromancy.action_builder.ActionBuilder] built. The `job_type` +is shown in brackets, followed by a list of output artifacts. + +### Running the pipeline + +Assuming we're happy with the [`Action`][aeromancy.action.Action]s, we can run +them all by omitting `--list`: + +```bash +pdm go --dev +``` + +You should see it run each [`Action`][aeromancy.action.Action] in sequence. +Don't worry if it's overwhelming at first. Because we're running in development +mode, we're using a [fake tracker][aeromancy.fake_tracker.FakeTracker] instead +of the production Weights and Biases tracker, so you'll see a lot of messages +from it about what would happen if we were running in production mode. + +### Job selection + +Sometimes (in our experience, often) we don't want to run the entire pipeline. +To run just some of the jobs, pass the `--only` flag with a substring. Aeromancy will then only +run jobs whose names include that substring. You can also pass it a +comma-separated list. Note that names include the `job_type` as well. + +!!!
example + + - If you pass `--only train`, it will just run `ExampleTrainAction` + + - If you pass `--only model`, it will run `ExampleTrainAction` then + `ExampleEvaluationAction` (since the latter depends on the former) + + - If you pass `--only dataset,train`, it will run `ExampleIngestAction` then + `ExampleTrainAction` + +## What's next? + +We've gone through all the main components you'll need to define to run +experiments in Aeromancy and how to run them in development mode. Next up, you +might want to: + +- [Configure](setup.md) Aeromancy to work with Weights and Biases and + S3-compatible blob stores (production mode) +- (To be documented) Developing and Debugging (`bailout`, `--debug`, common + pitfalls, `aeroset`, `aeroview`, `rerun` commands) +- [Customizing](customizing.md) your Aeromancy project +- (To be documented) Best practices and FAQ +- (To be documented) Debugging Aeromancy itself (for Aeromancy developers) diff --git a/docs/docs/scaffolding.md b/docs/docs/scaffolding.md index b55ebfc..540d0b1 100644 --- a/docs/docs/scaffolding.md +++ b/docs/docs/scaffolding.md @@ -6,7 +6,7 @@ doing the cross-references. --> In order to enable tracking, Aeromancy is rather opinionated about how projects are set up. A "project" in this case means a pipeline of tasks, potentially configurable through CLI flags. This document provides an overview of the -components involved and how to set up a new Aeromancy project. +components involved. This diagram roughly shows the flow: @@ -81,59 +81,3 @@ generated [`Action`][aeromancy.action.Action]s. See [Tasks, Trackers, and Actions](tasks.md) for more information on these objects. 
- -## Creating a new Aeromancy project - -In order to set up a new project, you'll need a Git repository with these -components: - -- Actions (subclasses of [`Action`][aeromancy.action.Action] with specific logic - for your tasks) -- An [`ActionBuilder`][aeromancy.action_builder.ActionBuilder] to instantiate - the [`Action`][aeromancy.action.Action] objects and describe their - dependencies -- An "AeroMain" script to parse any project-specific options and bring it all - together - -To quickly set up an Aeromancy project, we've created a -[Copier](https://copier.readthedocs.io/en/stable/) template. See instructions at -the -[quant-aq/aeromancy-project-template](https://github.com/quant-aq/aeromancy-project-template?tab=readme-ov-file#quick-start). -In the generated Python project setup (`pyproject.toml`), you may also want to -adjust: - -- **Extra Python packages:** Add them with `pdm add `. See [PDM - docs](https://pdm.fming.dev/latest/usage/dependency/) for more information on - this. -- **`pdm` [scripts](https://pdm.fming.dev/latest/usage/scripts/)**: Some of - these are necessary for running Aeromancy (like `pdm go`), but you can add - more if there are common tasks for your project. -- **Extra `docker run` arguments**: E.g., mounting - [volumes](https://docs.docker.com/engine/reference/commandline/run/#mount)). - These can be baked `pdm go` script with `--extra-docker-run-args='...'`. -- **Extra Debian packages:** (outside of those included by Aeromancy), you may - want to bake them into the `pdm go` script with `--extra-debian-package='...'` - (specify the flag once per package name). -- **Development environment (linters, etc.):** Aeromancy encourages the use of - the `ruff` linter and `Black` formatter, but these are customizable. 
- -### Filesystem layout - -Ultimately, the structure of an Aeromancy project should look something like -this: - -```text -/ - pyproject.toml - pdm.lock - main.py # AeroMain - src/ - / - .py - .py -``` - -The structure of the classes containing your -[`Action`][aeromancy.action.Action](s) and -[`ActionBuilder`][aeromancy.action_builder.ActionBuilder] is flexible -- they -just need to be importable in AeroMain. diff --git a/docs/docs/setup.md b/docs/docs/setup.md new file mode 100644 index 0000000..fd3ef95 --- /dev/null +++ b/docs/docs/setup.md @@ -0,0 +1,41 @@ +# Installing and setting up Aeromancy + +The easiest way to set up Aeromancy is to follow the [Quick +Start](quick_start.md) guide. This document includes additional setup +instructions for running Aeromancy in "production" mode. + +- **Python**: Aeromancy works with Python 3.10.5 or higher +- **Python package manager**: Aeromancy currently requires [`pdm`](https://pdm.fming.dev). + + - Install via `pip install --user pdm` + +- **Environment variables**: + + - To use an S3-compatible backend (e.g., + [Ceph](https://github.com/ceph/ceph)), you'll need to set these + environment variables: + + - `AEROMANCY_AWS_ACCESS_KEY_ID` + - `AEROMANCY_AWS_SECRET_ACCESS_KEY` + - `AEROMANCY_AWS_S3_ENDPOINT_URL` + - `AEROMANCY_AWS_REGION` (can be left empty if it doesn't apply) + + - You'll also need to set `WANDB_API_KEY` (from [Weights and Biases](https://wandb.ai)) + +- **SSH Authentication**: You'll want `ssh-agent` set up if you need to access + private GitHub repositories. Check out these + [instructions](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent). + +## Linux + +You'll want to install some packages. 
On Debian, you can use: + +- `apt install bat graphviz libopenblas-dev pre-commit docker.io` + +## Mac OS + +- We recommend using [Homebrew](https://brew.sh/) to install the following: + - `brew install apache-arrow@13.0.0_5 bat@0.23.0 graphviz@8.1.0 + openblas@0.3.24 pre-commit@3.3.3` +- Install Docker Desktop from [docker.com](https://www.docker.com/) (not Brew + since it has a trickier upgrade story) diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml index cab935a..bc54f9e 100644 --- a/docs/mkdocs.yml +++ b/docs/mkdocs.yml @@ -7,8 +7,12 @@ site_dir: "site" nav: - Home: - - Overview: index.md - - Scaffolding and new projects: scaffolding.md + - Introduction: index.md + - Quick Start: quick_start.md + - Setting up Aeromancy: setup.md + - Customizing your project: customizing.md + - Developer Reference: + - Scaffolding: scaffolding.md - Tasks, Trackers, and Actions: tasks.md - Code Reference: reference/ @@ -20,6 +24,8 @@ theme: font: text: Open Sans code: Fira Code + features: + - content.code.copy markdown_extensions: - admonition @@ -28,7 +34,6 @@ markdown_extensions: - pymdownx.snippets: base_path: - docs - - ../README.md check_paths: true - pymdownx.superfences: custom_fences: @@ -59,4 +64,3 @@ plugins: watch: - "../src" - - "../README.md" diff --git a/src/aeromancy/s3.py b/src/aeromancy/s3.py index c2dbc63..85094d2 100644 --- a/src/aeromancy/s3.py +++ b/src/aeromancy/s3.py @@ -361,7 +361,7 @@ def from_env_variables(cls): _S3_CLIENT = cls( aws_access_key_id=os.environ["AEROMANCY_AWS_ACCESS_KEY_ID"], aws_secret_access_key=os.environ["AEROMANCY_AWS_SECRET_ACCESS_KEY"], - region_name=os.environ["AEROMANCY_AWS_REGION"], + region_name=os.environ.get("AEROMANCY_AWS_REGION", ""), endpoint_url=os.environ["AEROMANCY_AWS_S3_ENDPOINT_URL"], ) return _S3_CLIENT
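
Note on the `s3.py` hunk: the change makes `AEROMANCY_AWS_REGION` optional (falling back to an empty string) while the other three variables stay required. A minimal sketch of that lookup pattern is below. `read_s3_settings` and `REQUIRED_VARS` are hypothetical names used only for illustration, not part of Aeromancy's API, and the explicit missing-variable check is an assumption added here to make the required/optional split visible:

```python
import os
from collections.abc import Mapping

# Variables that must be present (per the setup docs in this diff).
REQUIRED_VARS = (
    "AEROMANCY_AWS_ACCESS_KEY_ID",
    "AEROMANCY_AWS_SECRET_ACCESS_KEY",
    "AEROMANCY_AWS_S3_ENDPOINT_URL",
)


def read_s3_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical helper: collect S3 settings from an environment mapping."""
    # Report every missing required variable at once instead of failing on
    # the first lookup.
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise KeyError(f"Missing required variables: {', '.join(missing)}")
    return {
        "aws_access_key_id": env["AEROMANCY_AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AEROMANCY_AWS_SECRET_ACCESS_KEY"],
        "endpoint_url": env["AEROMANCY_AWS_S3_ENDPOINT_URL"],
        # Optional: mirrors os.environ.get("AEROMANCY_AWS_REGION", "") in the
        # hunk above, so an unset region becomes an empty string.
        "region_name": env.get("AEROMANCY_AWS_REGION", ""),
    }
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to exercise in a test without touching the real environment.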