
Commit 225d9d7

Update our oumi launch documentation. (#1239)
1 parent 2527a76 commit 225d9d7

3 files changed: +362 -122 lines changed


docs/user_guides/launch/custom_cluster.md

+7 -21
Original file line number | Diff line number | Diff line change
@@ -3,21 +3,7 @@ Similar to custom dataset and model classes, you can register a class for your o
33

44
This guide is specifically geared towards individuals who have access to a compute cluster that's not hosted on a common cloud provider (e.g. University/personal compute clusters).
55

6-
We'll cover the following topics:
7-
1. Prerequisites
8-
1. The Oumi Launcher Hierarchy
9-
1. Creating a CustomClient Class
10-
1. Creating a CustomCluster Class
11-
1. Creating a CustomCloud Class
12-
1. Registering Your CustomCloud
13-
1. Running a Job on Your Cloud
14-
15-
# Prerequisites
16-
17-
## Oumi Installation
18-
First, let's install Oumi. You can find detailed instructions [here](/get_started/installation.md).
19-
20-
# The Oumi Launcher Hierarchy
6+
## The Oumi Launcher Hierarchy
217

228
### Preface
239
Before diving into this tutorial, let's discuss the hierarchy of the Oumi Launcher. At this point, it's worth reading through our tutorial on {doc}`/user_guides/launch/deploy` to better understand the end-to-end flow of the launcher. Already read it? Great!
@@ -40,7 +26,7 @@ Clients are a completely optional but highly encouraged class. Clients should en
4026

4127
You can find several implementations of Clients [here](https://github.com/oumi-ai/oumi/tree/main/src/oumi/launcher/clients).
4228

43-
# Creating a CustomClient Class
29+
## Creating a CustomClient Class
4430
Let's get started by creating a client for our new cloud, `CustomCloud`. Let's create a simple client that randomly sets the state of the job on submission. It also supports canceling jobs, and turning down clusters:
4531

4632
``` {code-block} python
@@ -127,7 +113,7 @@ class CustomClient:
127113
pass
128114
```
129115

130-
# Creating a CustomCluster Class
116+
## Creating a CustomCluster Class
131117
Now that we have a client that talks to our API, we can use the Client to build a Cluster!
132118

133119
``` {code-block} python
@@ -200,7 +186,7 @@ class CustomCluster(BaseCluster):
200186
self.down()
201187
```
202188

203-
# Creating a CustomCloud Class
189+
## Creating a CustomCloud Class
204190
Let's create a CustomCloud to manage our clusters:
205191

206192
``` {code-block} python
@@ -256,7 +242,7 @@ class CustomCloud(BaseCloud):
256242

257243
Now all that's left to do is register your CustomCloud!
258244

259-
# Registering Your CustomCloud
245+
## Registering Your CustomCloud
260246
By implementing the BaseCloud class, you are now ready to register your cloud with Oumi. First, let's take a look at the clouds that are already registered:
261247

262248
``` {code-block} python
@@ -285,7 +271,7 @@ print(launcher.which_clouds())
285271

286272
Great, our CustomCloud is there!
287273

288-
## Using Your CustomCloud via the CLI
274+
### Using Your CustomCloud via the CLI
289275

290276
**‼️ Important ‼️** A few extra steps are needed to use your cloud from the CLI.
291277

@@ -317,7 +303,7 @@ You can verify that your cloud is now installed by running:
317303
oumi launch which
318304
```
319305

320-
# Running a Job on Your Cloud
306+
## Running a Job on Your Cloud
321307

322308
Let's take our new Cloud for a spin:
323309

docs/user_guides/launch/deploy.md

+80 -101
Original file line number | Diff line number | Diff line change
@@ -4,105 +4,32 @@ In this tutorial we'll take a working {py:class}`~oumi.core.configs.JobConfig` a
44

55
This guide dovetails nicely with our [Finetuning Tutorial](https://github.com/oumi-ai/oumi/blob/main/notebooks/Oumi%20-%20Finetuning%20Tutorial.ipynb) where you create your own TrainingConfig and run it locally. Give it a try if you haven't already!
66

7-
We'll cover the following topics:
8-
1. Prerequisites
9-
1. Choosing a Cloud
10-
1. Preparing Your JobConfig
11-
1. Launching Your Job
12-
1. \[Advanced\] Deploying a Training Config
137

14-
## Prerequisites
15-
16-
### Oumi Installation
17-
First, let's install Oumi. You can find detailed instructions [here](/get_started/installation.md).
18-
19-
### Creating a working directory
20-
For this tutorial, we'll use the following folder to save our configs.
21-
22-
``` {code-block} python
23-
from pathlib import Path
24-
25-
tutorial_dir = "deploy_training_tutorial"
26-
27-
Path(tutorial_dir).mkdir(parents=True, exist_ok=True)
28-
```
29-
30-
## Choosing a Cloud
31-
We'll be using the Oumi Launcher to run remote training. To use the launcher, you need to specify which cloud you'd like to run training on.
32-
We'll list the clouds below:
33-
34-
::::{tab-set}
35-
:::{tab-item} CLI
36-
``` {code-block} shell
37-
oumi launch which
38-
```
39-
:::
40-
41-
:::{tab-item} Python
42-
``` {code-block} python
43-
import oumi.launcher as launcher
44-
45-
# Print all available clouds
46-
print(launcher.which_clouds())
47-
```
48-
:::
49-
::::
50-
51-
#### Local Cloud
52-
If you don't have any clouds set up yet, feel free to use the `local` cloud. This will simply execute your job on your current device as if it's a remote cluster. Hardware requirements are ignored for the `local` cloud.
53-
54-
#### Other Providers
55-
Note that to use a cloud you must already have an account registered with that cloud provider.
56-
57-
For example, GCP, RunPod, and Lambda require accounts with billing enabled.
58-
59-
Once you've picked a cloud, move on to the next step.
60-
61-
## Preparing Your JobConfig
62-
Let's get started by creating your {py:class}`~oumi.core.configs.JobConfig`. In the config below, feel free to change `cloud: local` to the cloud you chose in the previous step.
63-
64-
A sample job is provided below:
65-
````{dropdown} deploy_training_tutorial/job.yaml
66-
```{code-block} yaml
67-
name: job-tutorial
68-
resources:
69-
cloud: local
70-
# Accelerators is ignored for the local cloud.
71-
# This is required for other clouds like GCP, AWS, etc.
72-
accelerators: A100
73-
74-
# Upload working directory to remote.
75-
# If on the local cloud, we CD into the working directory before running the job.
76-
working_dir: .
77-
78-
envs:
79-
TEST_ENV_VARIABLE: '"Hello, World!"'
80-
OUMI_LOGGING_DIR: "deploy_training_tutorial/logs"
81-
82-
# `setup` will always be executed once when a cluster is created
83-
setup: |
84-
echo "Running setup..."
85-
86-
run: |
87-
set -e # Exit if any command failed.
8+
## Launching Your Job
889

89-
echo "$TEST_ENV_VARIABLE"
10+
`````{note}
11+
Try using our sample hello-world job for this tutorial:
12+
````{dropdown} configs/examples/misc/hello_world_gcp_job.yaml
13+
```{literalinclude} ../../../configs/examples/misc/hello_world_gcp_job.yaml
14+
:language: yaml
9015
```
9116
````
17+
`````
9218

93-
## Launching Your Job
19+
Let's get started with launching a job! Don't worry about the nitty-gritty—we'll
20+
address configuring your job in the following sections.
9421

9522
::::{tab-set}
9623
:::{tab-item} CLI
9724
You can easily kick off a job directly from the CLI:
9825
```{code-block} shell
99-
oumi launch up --cluster my-cluster -c deploy_training_tutorial/job.yaml
26+
oumi launch up --cluster my-cluster -c configs/examples/misc/hello_world_gcp_job.yaml
10027
```
10128

10229
At any point you can easily change the cloud where your job will run by modifying the job's `resources.cloud` parameter:
10330

10431
```{code-block} shell
105-
oumi launch up --cluster my-cluster -c deploy_training_tutorial/job.yaml --resources.cloud local
32+
oumi launch up --cluster my-cluster -c configs/examples/misc/hello_world_gcp_job.yaml --resources.cloud local
10633
```
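
The `--resources.cloud local` override above addresses a nested field by its dotted path. The following is a minimal sketch of that idea only, not how Oumi implements it: `apply_override` is a hypothetical helper, and the `job` dictionary is an illustrative stand-in for a parsed job YAML.

```python
import copy
from typing import Any


def apply_override(config: dict[str, Any], dotted_key: str, value: Any) -> dict[str, Any]:
    """Return a copy of `config` with the nested `dotted_key` set to `value`.

    Hypothetical helper for illustration; not part of the Oumi API.
    """
    updated = copy.deepcopy(config)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        # Create intermediate mappings as needed.
        node = node.setdefault(key, {})
    node[leaf] = value
    return updated


# Stand-in for a parsed job YAML; the values are illustrative only.
job = {"name": "job-tutorial", "resources": {"cloud": "gcp", "accelerators": "A100"}}
local_job = apply_override(job, "resources.cloud", "local")
print(local_job["resources"]["cloud"])  # local
print(job["resources"]["cloud"])        # gcp (the original is untouched)
```

Returning a copy keeps the loaded config reusable, so the same base job can be retargeted at several clouds without reloading the file.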
10734
:::
10835

@@ -111,7 +38,8 @@ First let's load your {py:class}`~oumi.core.configs.JobConfig`:
11138
``` {code-block} python
11239
import oumi.launcher as launcher
11340
# Read our JobConfig from the YAML file.
114-
job_config = launcher.JobConfig.from_yaml(str(Path(tutorial_dir) / "job.yaml"))
41+
working_dir = "YOUR_WORKING_DIRECTORY" # Specify this value
42+
job_config = launcher.JobConfig.from_yaml(str(Path(working_dir) / "job.yaml"))
11543
```
11644

11745
At any point you can easily change the cloud where your job will run by modifying the job's `resources.cloud` parameter:
@@ -146,8 +74,6 @@ We can quickly check on the status of our job using the `cluster` returned in th
14674
``` {code-block} shell
14775
oumi launch status
14876
```
149-
150-
If the job was run on the local cluster, we can view the logs at `deploy_training_tutorial/logs/...`
15177
:::
15278

15379
:::{tab-item} Python
@@ -159,16 +85,6 @@ while job_status and not job_status.done:
15985
16086
print("Job is done!")
16187
```
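
A polling loop like the one above runs until the job reports that it is done. As a hedged variation, the sketch below adds a timeout so the wait cannot hang forever. `FakeStatus` and `wait_until_done` are hypothetical names, and the only thing assumed about a real status object is the `done` attribute used above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FakeStatus:
    """Stand-in for a job status object; only `done` is assumed."""

    done: bool


def wait_until_done(
    get_status: Callable[[], Optional[FakeStatus]],
    interval_s: float = 15.0,
    timeout_s: float = 3600.0,
) -> bool:
    """Poll `get_status` until the job is done; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        # Mirror the loop condition above: a missing status also ends the wait.
        if status is None or status.done:
            return True
        time.sleep(interval_s)
    return False


# Simulate a job that finishes on the third poll.
polls = iter([FakeStatus(False), FakeStatus(False), FakeStatus(True)])
finished = wait_until_done(lambda: next(polls), interval_s=0.01, timeout_s=5.0)
print("Job is done!" if finished else "Timed out waiting for job.")
```

In real use, `get_status` would wrap whatever call you already use to refresh the job's status.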
162-
163-
If the job was run on the local cluster, we can view the logs below:
164-
165-
``` {code-block} python
166-
logs_dir = Path(tutorial_dir) / "logs"
167-
for log_file in logs_dir.iterdir():
168-
print(f"Log file: {log_file}")
169-
with open(log_file) as f:
170-
print(f.read())
171-
```
17288
:::
17389
::::
17490

@@ -190,8 +106,70 @@ cluster.down()
190106
:::
191107
::::
192108

109+
## Choosing a Cloud
110+
We'll be using the Oumi Launcher to run remote training. To use the launcher, you need to specify which cloud you'd like to run training on.
111+
We'll list the clouds below:
112+
113+
::::{tab-set}
114+
:::{tab-item} CLI
115+
``` {code-block} shell
116+
oumi launch which
117+
```
118+
:::
119+
120+
:::{tab-item} Python
121+
``` {code-block} python
122+
import oumi.launcher as launcher
123+
124+
# Print all available clouds
125+
print(launcher.which_clouds())
126+
```
127+
:::
128+
::::
129+
130+
#### Local Cloud
131+
If you don't have any clouds set up yet, feel free to use the `local` cloud. This will simply execute your job on your current device as if it's a remote cluster. Hardware requirements are ignored for the `local` cloud.
132+
133+
#### Other Providers
134+
Note that to use a cloud you must already have an account registered with that cloud provider.
135+
136+
For example, GCP, RunPod, and Lambda require accounts with billing enabled.
137+
138+
Once you've picked a cloud, move on to the next step.
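
The selection step above can be sketched as a small helper that falls back to `local` when the preferred cloud is not registered. This is a minimal sketch, not part of Oumi: `choose_cloud` is a hypothetical function, and its input is assumed to be a list of cloud names such as the one `launcher.which_clouds()` prints.

```python
from typing import Sequence


def choose_cloud(available: Sequence[str], preferred: str = "local") -> str:
    """Return `preferred` if it is registered, else fall back to `local`.

    Hypothetical helper for illustration; not part of the Oumi API.
    """
    if preferred in available:
        return preferred
    if "local" in available:
        return "local"
    raise ValueError(f"No usable cloud among: {list(available)}")


# With Oumi installed, `available_clouds` would come from `launcher.which_clouds()`.
# A hard-coded list stands in for it here so the sketch runs anywhere.
available_clouds = ["gcp", "local", "runpod"]
chosen = choose_cloud(available_clouds, preferred="lambda")
print(chosen)  # local
```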
139+
140+
## Preparing Your JobConfig
141+
Let's get started by creating your {py:class}`~oumi.core.configs.JobConfig`. In the config below, feel free to change `cloud: local` to the cloud you chose in the previous step.
142+
143+
A sample job is provided below:
144+
````{dropdown} job.yaml
145+
```{code-block} yaml
146+
name: job-tutorial
147+
resources:
148+
cloud: local
149+
# Accelerators is ignored for the local cloud.
150+
# This is required for other clouds like GCP, AWS, etc.
151+
accelerators: A100
152+
153+
# Upload working directory to remote.
154+
# If on the local cloud, we CD into the working directory before running the job.
155+
working_dir: .
156+
157+
envs:
158+
TEST_ENV_VARIABLE: '"Hello, World!"'
159+
OUMI_LOGGING_DIR: "deploy_tutorial/logs"
160+
161+
# `setup` will always be executed once when a cluster is created
162+
setup: |
163+
echo "Running setup..."
164+
165+
run: |
166+
set -e # Exit if any command failed.
167+
168+
echo "$TEST_ENV_VARIABLE"
169+
```
170+
````
193171

194-
## \[Advanced\] Deploying a Training Config
172+
## Deploying a Training Config
195173

196174
In our [Finetuning Tutorial](https://github.com/oumi-ai/oumi/blob/main/notebooks/Oumi%20-%20Finetuning%20Tutorial.ipynb), we created and saved a TrainingConfig. We then invoked training by running
197175
```shell
@@ -204,14 +182,15 @@ You can also run that command as a job! Simply update the "run" section of the {
204182
::::{tab-set}
205183
:::{tab-item} CLI
206184
``` {code-block} shell
207-
export PATH_TO_YOUR_TRAIN_CONFIG="deploy_training_tutorial/train.yaml" # Make sure this exists!
208-
oumi launch up --cluster my-new-cluster -c deploy_training_tutorial/job.yaml --run "oumi train -c $PATH_TO_YOUR_TRAIN_CONFIG" --setup "pip install oumi"
185+
export PATH_TO_YOUR_TRAIN_CONFIG="deploy_tutorial/train.yaml" # Make sure this exists!
186+
oumi launch up --cluster my-new-cluster -c deploy_tutorial/job.yaml --run "oumi train -c $PATH_TO_YOUR_TRAIN_CONFIG" --setup "pip install oumi"
209187
```
210188
:::
211189

212190
:::{tab-item} Python
213191
``` {code-block} python
214-
path_to_your_train_config = Path(tutorial_dir) / "train.yaml" # Make sure this exists!
192+
working_dir = "YOUR_WORKING_DIRECTORY" # Specify this value
193+
path_to_your_train_config = Path(working_dir) / "train.yaml" # Make sure this exists!
215194
216195
# Set the `run` command to run your training script.
217196
job_config.run = f'oumi train -c "{path_to_your_train_config}"'
