docs/user_guides/launch/custom_cluster.md (+7 -21)
@@ -3,21 +3,7 @@ Similar to custom dataset and model classes, you can register a class for your o

 This guide is specifically geared towards individuals who have access to a compute cluster that's not hosted on a common cloud provider (e.g. University/personal compute clusters).

-We'll cover the following topics:
-1. Prerequisites
-1. The Oumi Launcher Hierarchy
-1. Creating a CustomClient Class
-1. Creating a CustomCluster Class
-1. Creating a CustomCloud Class
-1. Registering Your CustomCloud
-1. Running a Job on Your Cloud
-
-# Prerequisites
-
-## Oumi Installation
-First, let's install Oumi. You can find detailed instructions [here](/get_started/installation.md).
-
-# The Oumi Launcher Hierarchy
+## The Oumi Launcher Hierarchy

 ### Preface
 Before diving into this tutorial, let's discuss the hierarchy of the Oumi Launcher. At this point, it's worth reading through our tutorial on {doc}`/user_guides/launch/deploy` to better understand the end-to-end flow of the launcher. Already read it? Great!
@@ -40,7 +26,7 @@ Clients are a completely optional but highly encouraged class. Clients should en

 You can find several implementations of Clients [here](https://github.com/oumi-ai/oumi/tree/main/src/oumi/launcher/clients).

-# Creating a CustomClient Class
+## Creating a CustomClient Class
 Let's get started by creating a client for our new cloud, `CustomCloud`. Let's create a simple client that randomly sets the state of the job on submission. It also supports canceling jobs, and turning down clusters:

 ```{code-block} python
@@ -127,7 +113,7 @@ class CustomClient:
 pass
 ```

-# Creating a CustomCluster Class
+## Creating a CustomCluster Class
 Now that we have a client that talks to our API, we can use the Client to build a Cluster!

 ```{code-block} python
@@ -200,7 +186,7 @@ class CustomCluster(BaseCluster):
 self.down()
 ```

-# Creating a CustomCloud Class
+## Creating a CustomCloud Class
 Let's create a CustomCloud to manage our clusters:

 ```{code-block} python
@@ -256,7 +242,7 @@ class CustomCloud(BaseCloud):

 Now all that's left to do is register your CustomCloud!

-# Registering Your CustomCloud
+## Registering Your CustomCloud
 By implementing the BaseCloud class, you are now ready to register your cloud with Oumi. First, let's take a look at the clouds that are already registered:
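The hunks above describe a simple client that "randomly sets the state of the job on submission" and also "supports canceling jobs, and turning down clusters". Its body falls outside the visible diff context. The following is a minimal, self-contained sketch of that behavior only. `JobState`, the method names, and the status strings are hypothetical. They are neither Oumi's client interface nor the guide's actual `CustomClient`, and they only illustrate the in-memory bookkeeping such a client might do.

```python
from __future__ import annotations

import itertools
import random
from dataclasses import dataclass


@dataclass
class JobState:
    """Hypothetical job record; not an Oumi type."""

    job_id: str
    cluster: str
    status: str


class CustomClient:
    """In-memory sketch of a client for a hypothetical `CustomCloud` API."""

    # Statuses a freshly submitted job may be assigned at random (assumed names).
    INITIAL_STATUSES = ("QUEUED", "RUNNING", "COMPLETED", "FAILED")

    def __init__(self, seed: int | None = None) -> None:
        self._rng = random.Random(seed)  # seedable so runs are reproducible
        self._ids = itertools.count(1)  # sequential job ids: job-1, job-2, ...
        self._jobs: dict[str, JobState] = {}
        self._clusters: set[str] = set()

    def submit_job(self, cluster: str) -> JobState:
        """Submit a job with a random initial status; creates the cluster on first use."""
        self._clusters.add(cluster)
        job = JobState(
            job_id=f"job-{next(self._ids)}",
            cluster=cluster,
            status=self._rng.choice(self.INITIAL_STATUSES),
        )
        self._jobs[job.job_id] = job
        return job

    def get_job(self, job_id: str) -> JobState | None:
        return self._jobs.get(job_id)

    def list_jobs(self, cluster: str) -> list[JobState]:
        return [job for job in self._jobs.values() if job.cluster == cluster]

    def cancel(self, job_id: str) -> JobState:
        """Mark a job as canceled; raises KeyError for unknown ids."""
        job = self._jobs[job_id]
        job.status = "CANCELED"
        return job

    def clusters(self) -> set[str]:
        return set(self._clusters)

    def turndown_cluster(self, cluster: str) -> None:
        """Delete a cluster together with every job that ran on it."""
        self._clusters.discard(cluster)
        self._jobs = {
            job_id: job for job_id, job in self._jobs.items() if job.cluster != cluster
        }


client = CustomClient(seed=42)
job = client.submit_job("my-cluster")
print(job.status)  # one of INITIAL_STATUSES, chosen at random
```

Seeding a private `random.Random` keeps the "random" state deterministic in tests without touching the global RNG.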
docs/user_guides/launch/deploy.md (+80 -101)
@@ -4,105 +4,32 @@ In this tutorial we'll take a working {py:class}`~oumi.core.configs.JobConfig` a

 This guide dovetails nicely with our [Finetuning Tutorial](https://github.com/oumi-ai/oumi/blob/main/notebooks/Oumi%20-%20Finetuning%20Tutorial.ipynb) where you create your own TrainingConfig and run it locally. Give it a try if you haven't already!

-We'll cover the following topics:
-1. Prerequisites
-1. Choosing a Cloud
-1. Preparing Your JobConfig
-1. Launching Your Job
-1.\[Advanced\] Deploying a Training Config

-## Prerequisites
-
-### Oumi Installation
-First, let's install Oumi. You can find detailed instructions [here](/get_started/installation.md).
-
-### Creating a working directory
-For this tutorial, we'll use the following folder to save our configs.
-We'll be using the Oumi Launcher to run remote training. To use the launcher, you need to specify which cloud you'd like to run training on.
-We'll list the clouds below:
-
-::::{tab-set}
-:::{tab-item} CLI
-```{code-block} shell
-oumi launch which
-```
-:::
-
-:::{tab-item} Python
-```{code-block} python
-import oumi.launcher as launcher
-
-# Print all available clouds
-print(launcher.which_clouds())
-```
-:::
-::::
-
-#### Local Cloud
-If you don't have any clouds set up yet, feel free to use the `local` cloud. This will simply execute your job on your current device as if it's a remote cluster. Hardware requirements are ignored for the `local` cloud.
-
-#### Other Providers
-Note that to use a cloud you must already have an account registered with that cloud provider.
-
-For example, GCP, RunPod, and Lambda require accounts with billing enabled.
-
-Once you've picked a cloud, move on to the next step.
-
-## Preparing Your JobConfig
-Let's get started by creating your {py:class}`~oumi.core.configs.JobConfig`. In the config below, feel free to change `cloud: local` to the cloud you chose in the previous step.
-
-A sample job is provided below:
-````{dropdown} deploy_training_tutorial/job.yaml
-```{code-block} yaml
-name: job-tutorial
-resources:
-  cloud: local
-  # Accelerators is ignored for the local cloud.
-  # This is required for other clouds like GCP, AWS, etc.
-  accelerators: A100
-
-# Upload working directory to remote.
-# If on the local cloud, we CD into the working directory before running the job.
-working_dir: .
-
-envs:
-  TEST_ENV_VARIABLE: '"Hello, World!"'
-  OUMI_LOGGING_DIR: "deploy_training_tutorial/logs"
-
-# `setup` will always be executed once when a cluster is created
-setup: |
-  echo "Running setup..."
-
-run: |
-  set -e # Exit if any command failed.
+## Launching Your Job

-  echo "$TEST_ENV_VARIABLE"
+`````{note}
+Try using our sample helloworld job for this tutorial:
At any point you can easily change the cloud where your job will run by modifying the job's `resources.cloud` parameter:
@@ -146,8 +74,6 @@ We can quickly check on the status of our job using the `cluster` returned in th
 ```{code-block} shell
 oumi launch status
 ```
-
-If the job was run on the local cluster, we can view the logs at `deploy_training_tutorial/logs/...`
 :::

 :::{tab-item} Python
@@ -159,16 +85,6 @@ while job_status and not job_status.done:

 print("Job is done!")
 ```
-
-If the job was run on the local cluster, we can view the logs below:
-
-```{code-block} python
-logs_dir = Path(tutorial_dir) / "logs"
-for log_file in logs_dir.iterdir():
-    print(f"Log file: {log_file}")
-    with open(log_file) as f:
-        print(f.read())
-```
 :::
 ::::

@@ -190,8 +106,70 @@ cluster.down()
 :::
 ::::

+## Choosing a Cloud
+We'll be using the Oumi Launcher to run remote training. To use the launcher, you need to specify which cloud you'd like to run training on.
+We'll list the clouds below:
+
+::::{tab-set}
+:::{tab-item} CLI
+```{code-block} shell
+oumi launch which
+```
+:::
+
+:::{tab-item} Python
+```{code-block} python
+import oumi.launcher as launcher
+
+# Print all available clouds
+print(launcher.which_clouds())
+```
+:::
+::::
+
+#### Local Cloud
+If you don't have any clouds set up yet, feel free to use the `local` cloud. This will simply execute your job on your current device as if it's a remote cluster. Hardware requirements are ignored for the `local` cloud.
+
+#### Other Providers
+Note that to use a cloud you must already have an account registered with that cloud provider.
+
+For example, GCP, RunPod, and Lambda require accounts with billing enabled.
+
+Once you've picked a cloud, move on to the next step.
+
+## Preparing Your JobConfig
+Let's get started by creating your {py:class}`~oumi.core.configs.JobConfig`. In the config below, feel free to change `cloud: local` to the cloud you chose in the previous step.
+
+A sample job is provided below:
+````{dropdown} job.yaml
+```{code-block} yaml
+name: job-tutorial
+resources:
+  cloud: local
+  # Accelerators is ignored for the local cloud.
+  # This is required for other clouds like GCP, AWS, etc.
+  accelerators: A100
+
+# Upload working directory to remote.
+# If on the local cloud, we CD into the working directory before running the job.
+working_dir: .
+
+envs:
+  TEST_ENV_VARIABLE: '"Hello, World!"'
+  OUMI_LOGGING_DIR: "deploy_tutorial/logs"
+
+# `setup` will always be executed once when a cluster is created
+setup: |
+  echo "Running setup..."
+
+run: |
+  set -e # Exit if any command failed.
+
+  echo "$TEST_ENV_VARIABLE"
+```
+````

-## \[Advanced\]Deploying a Training Config
+## Deploying a Training Config

 In our [Finetuning Tutorial](https://github.com/oumi-ai/oumi/blob/main/notebooks/Oumi%20-%20Finetuning%20Tutorial.ipynb), we created and saved a TrainingConfig. We then invoked training by running
 ```shell
@@ -204,14 +182,15 @@ You can also run that command as a job! Simply update the "run" section of the {
 ::::{tab-set}
 :::{tab-item} CLI
 ```{code-block} shell
-export PATH_TO_YOUR_TRAIN_CONFIG="deploy_training_tutorial/train.yaml" # Make sure this exists!
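The Python tab in deploy.md polls `while job_status and not job_status.done:` before printing "Job is done!". The loop body and the call that fetches the status are elided from this page. The following is a generic sketch of that polling pattern. It assumes only a zero-argument callable that returns `None` or an object with a boolean `done` attribute. `wait_for_job`, `FakeJobStatus`, and the timeout parameter are hypothetical and are not part of `oumi.launcher`.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FakeJobStatus:
    """Hypothetical stand-in for a launcher status object with a `done` flag."""

    status: str
    done: bool


def wait_for_job(
    get_status: Callable[[], Optional[FakeJobStatus]],
    poll_interval_s: float = 5.0,
    timeout_s: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[FakeJobStatus]:
    """Poll `get_status` until the job reports done or is no longer found.

    Uses the same loop condition as the tutorial: keep polling while a
    status exists and it is not done. Raises TimeoutError if `timeout_s`
    elapses first.
    """
    start = time.monotonic()
    job_status = get_status()
    while job_status and not job_status.done:
        if timeout_s is not None and time.monotonic() - start > timeout_s:
            raise TimeoutError(
                f"job still {job_status.status!r} after {timeout_s} seconds"
            )
        sleep(poll_interval_s)
        job_status = get_status()
    return job_status


# Simulated status sequence standing in for real launcher calls.
statuses = iter(
    [
        FakeJobStatus("QUEUED", done=False),
        FakeJobStatus("RUNNING", done=False),
        FakeJobStatus("COMPLETED", done=True),
    ]
)
final = wait_for_job(lambda: next(statuses), poll_interval_s=0.0)
print(final.status)  # COMPLETED
```

Injecting `sleep` keeps the sketch testable without real waits. With an actual launcher, `get_status` would be a closure that re-queries the cluster for the job's current status, in whatever form the real API provides.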