[Docs] Minor improvements #2766

Merged

2 changes: 1 addition & 1 deletion docs/blog/posts/efa.md
@@ -117,7 +117,7 @@ name: efa-task
# The size of the cluster
nodes: 2

python: "3.12"
python: 3.12

# Commands to run on each node
commands:
173 changes: 100 additions & 73 deletions docs/docs/concepts/dev-environments.md
@@ -99,66 +99,6 @@ init:

</div>

### Inactivity duration

Set [`inactivity_duration`](../reference/dstack.yml/dev-environment.md#inactivity_duration)
to automatically stop the dev environment after a configured period of inactivity.

<div editor-title=".dstack.yml">

```yaml
type: dev-environment
name: vscode
ide: vscode

# Stop if inactive for 2 hours
inactivity_duration: 2h
```

</div>

The dev environment becomes inactive when you close the remote VS Code window,
close any `ssh <run name>` shells, and stop the `dstack apply` or `dstack attach` command.
If you go offline without stopping anything manually, the dev environment will also become inactive
within about 3 minutes.

If `inactivity_duration` is configured for your dev environment, you can see how long
it has been inactive in `dstack ps --verbose`.

<div class="termy">

```shell
$ dstack ps --verbose
NAME BACKEND RESOURCES PRICE STATUS SUBMITTED
vscode cudo 2xCPU, 8GB, $0.0286 running 8 mins ago
100.0GB (disk) (inactive for 2m 34s)
```

</div>

If you reattach to the dev environment using [`dstack attach`](../reference/cli/dstack/attach.md),
the inactivity timer will be reset within a few seconds.

??? info "In-place update"
As long as the configuration defines the `name` property, the value of `inactivity_duration`
can be changed for a running dev environment without a restart.
Just change the value in the configuration and run `dstack apply` again.

<div class="termy">

```shell
$ dstack apply -f .dstack.yml

Detected configuration changes that can be updated in-place: ['inactivity_duration']
Update the run? [y/n]:
```

</div>

> `inactivity_duration` is not to be confused with [`idle_duration`](#idle-duration).
> The latter determines how soon the underlying cloud instance will be terminated
> _after_ the dev environment is stopped.

### Resources

When you specify a resource value like `cpu` or `memory`,
@@ -307,19 +247,6 @@ If you don't assign a value to an environment variable (see `HF_TOKEN` above),
| `DSTACK_REPO_ID` | The ID of the repo |
| `DSTACK_GPUS_NUM` | The total number of GPUs in the run |

### Spot policy

By default, `dstack` uses on-demand instances. However, you can change that
via the [`spot_policy`](../reference/dstack.yml/dev-environment.md#spot_policy) property. It accepts `spot`, `on-demand`, and `auto`.

!!! info "Reference"
Dev environments support many more configuration options,
incl. [`backends`](../reference/dstack.yml/dev-environment.md#backends),
[`regions`](../reference/dstack.yml/dev-environment.md#regions),
[`max_price`](../reference/dstack.yml/dev-environment.md#max_price), and
[`max_duration`](../reference/dstack.yml/dev-environment.md#max_duration),
among [others](../reference/dstack.yml/dev-environment.md).

### Retry policy

By default, if `dstack` can't find capacity or the instance is interrupted, the run will fail.
@@ -345,8 +272,108 @@ retry:

</div>
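
For reference, a minimal retry sketch — assuming the `retry` block accepts an `on_events` list and a `duration`, as described in the reference — might look like this:

<div editor-title=".dstack.yml">

```yaml
type: dev-environment
name: vscode
ide: vscode

# Retry for up to 1 hour if there is no capacity or the instance is interrupted
# (field names assumed from the reference, not taken from this page)
retry:
  on_events: [no-capacity, interrupted]
  duration: 1h
```

</div>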

### Inactivity duration

Set [`inactivity_duration`](../reference/dstack.yml/dev-environment.md#inactivity_duration)
to automatically stop the dev environment after a configured period of inactivity.

<div editor-title=".dstack.yml">

```yaml
type: dev-environment
name: vscode
ide: vscode

# Stop if inactive for 2 hours
inactivity_duration: 2h
```

</div>

The dev environment becomes inactive when you close the remote VS Code window,
close any `ssh <run name>` shells, and stop the `dstack apply` or `dstack attach` command.
If you go offline without stopping anything manually, the dev environment will also become inactive
within about 3 minutes.

If `inactivity_duration` is configured for your dev environment, you can see how long
it has been inactive in `dstack ps --verbose`.

<div class="termy">

```shell
$ dstack ps --verbose
NAME BACKEND RESOURCES PRICE STATUS SUBMITTED
vscode cudo 2xCPU, 8GB, $0.0286 running 8 mins ago
100.0GB (disk) (inactive for 2m 34s)
```

</div>

If you reattach to the dev environment using [`dstack attach`](../reference/cli/dstack/attach.md),
the inactivity timer will be reset within a few seconds.

??? info "In-place update"
As long as the configuration defines the `name` property, the value of `inactivity_duration`
can be changed for a running dev environment without a restart.
Just change the value in the configuration and run `dstack apply` again.

<div class="termy">

```shell
$ dstack apply -f .dstack.yml

Detected configuration changes that can be updated in-place: ['inactivity_duration']
Update the run? [y/n]:
```

</div>

> `inactivity_duration` is not to be confused with [`idle_duration`](#idle-duration).
> The latter determines how soon the underlying cloud instance will be terminated
> _after_ the dev environment is stopped.
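
To make the distinction concrete, here is a sketch that sets both properties in one configuration — assuming `idle_duration` can be set at the top level of a dev environment configuration:

<div editor-title=".dstack.yml">

```yaml
type: dev-environment
name: vscode
ide: vscode

# Stop the dev environment after 2 hours of inactivity
inactivity_duration: 2h

# Terminate the underlying instance 30 minutes after the run is stopped
# (assumes `idle_duration` is available as a top-level property)
idle_duration: 30m
```

</div>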

### Utilization policy

Sometimes it’s useful to track whether a dev environment is fully utilizing all GPUs. While you can check this with
[`dstack metrics`](../reference/cli/dstack/metrics.md), `dstack` also lets you set a policy to auto-terminate the run if any GPU is underutilized.

Below is an example of a dev environment that auto-terminates if any GPU stays below 10% utilization for 1 hour.

<div editor-title=".dstack.yml">

```yaml
type: dev-environment
name: my-dev

python: 3.12
ide: cursor

resources:
gpu: H100:8

utilization_policy:
min_gpu_utilization: 10
time_window: 1h
```

</div>

### Spot policy

By default, `dstack` uses on-demand instances. However, you can change that
via the [`spot_policy`](../reference/dstack.yml/dev-environment.md#spot_policy) property. It accepts `spot`, `on-demand`, and `auto`.
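
As a minimal sketch, the policy is set directly in the configuration — here using `auto`, which lets `dstack` use spot capacity when available and fall back to on-demand otherwise:

<div editor-title=".dstack.yml">

```yaml
type: dev-environment
name: vscode
ide: vscode

# Use spot instances when available, otherwise on-demand
spot_policy: auto
```

</div>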

--8<-- "docs/concepts/snippets/manage-fleets.ext"

!!! info "Reference"
Dev environments support many more configuration options,
incl. [`backends`](../reference/dstack.yml/dev-environment.md#backends),
[`regions`](../reference/dstack.yml/dev-environment.md#regions),
[`max_price`](../reference/dstack.yml/dev-environment.md#max_price), and
[`max_duration`](../reference/dstack.yml/dev-environment.md#max_duration),
among [others](../reference/dstack.yml/dev-environment.md).


--8<-- "docs/concepts/snippets/manage-runs.ext"

!!! info "What's next?"
80 changes: 56 additions & 24 deletions docs/docs/concepts/services.md
@@ -14,13 +14,13 @@ type: service
name: llama31

# If `image` is not specified, dstack uses its default image
python: "3.11"
python: 3.12
env:
- HF_TOKEN
- MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
- MAX_MODEL_LEN=4096
commands:
- pip install vllm
- uv pip install vllm
- vllm serve $MODEL_ID
--max-model-len $MAX_MODEL_LEN
--tensor-parallel-size $DSTACK_GPUS_NUM
@@ -128,13 +128,13 @@ type: service
# The name is optional; if not specified, it's generated randomly
name: llama31-service

python: "3.10"
python: 3.12

# Required environment variables
env:
- HF_TOKEN
commands:
- pip install vllm
- uv pip install vllm
- vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096
# Expose the port of the service
port: 8000
@@ -184,7 +184,7 @@ name: http-server-service
# Disable authorization
auth: false

python: "3.10"
python: 3.12

# Commands of the service
commands:
@@ -220,7 +220,7 @@ env:
- DASH_ROUTES_PATHNAME_PREFIX=/proxy/services/main/dash/

commands:
- pip install dash
- uv pip install dash
# Assuming the Dash app is in your repo at app.py
- python app.py

@@ -303,11 +303,11 @@ type: service
# The name is optional; if not specified, it's generated randomly
name: llama31-service

python: "3.10"
python: 3.12

# Commands of the service
commands:
- pip install vllm
- uv pip install vllm
- python -m vllm.entrypoints.openai.api_server
--model mistralai/Mixtral-8X7B-Instruct-v0.1
--host 0.0.0.0
@@ -384,7 +384,7 @@ type: service
name: http-server-service

# If `image` is not specified, dstack uses its base image
python: "3.10"
python: 3.12

# Commands of the service
commands:
@@ -407,7 +407,7 @@ port: 8000
name: http-server-service

# If `image` is not specified, dstack uses its base image
python: "3.10"
python: 3.12
# Ensure nvcc is installed (req. for Flash Attention)
nvcc: true

@@ -480,15 +480,15 @@ type: service
# The name is optional; if not specified, it's generated randomly
name: llama-2-7b-service

python: "3.10"
python: 3.12

# Environment variables
env:
- HF_TOKEN
- MODEL=NousResearch/Llama-2-7b-chat-hf
# Commands of the service
commands:
- pip install vllm
- uv pip install vllm
- python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
# The port of the service
port: 8000
@@ -512,18 +512,6 @@ resources:
| `DSTACK_REPO_ID` | The ID of the repo |
| `DSTACK_GPUS_NUM` | The total number of GPUs in the run |

### Spot policy

By default, `dstack` uses on-demand instances. However, you can change that
via the [`spot_policy`](../reference/dstack.yml/service.md#spot_policy) property. It accepts `spot`, `on-demand`, and `auto`.

!!! info "Reference"
Services support many more configuration options,
incl. [`backends`](../reference/dstack.yml/service.md#backends),
[`regions`](../reference/dstack.yml/service.md#regions),
[`max_price`](../reference/dstack.yml/service.md#max_price), and
among [others](../reference/dstack.yml/service.md).

### Retry policy

By default, if `dstack` can't find capacity, or the service exits with an error, or the instance is interrupted, the run will fail.
@@ -550,8 +538,52 @@ retry:
If one replica of a multi-replica service fails with retry enabled,
`dstack` will resubmit only the failed replica while keeping active replicas running.
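
As a sketch — assuming the `retry` block accepts an `on_events` list and a `duration`, and that the service defines `replicas` — a multi-replica service with retry enabled might look like this:

<div editor-title=".dstack.yml">

```yaml
type: service
name: llama31-service

python: 3.12
env:
  - HF_TOKEN
commands:
  - uv pip install vllm
  - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096
port: 8000

# Run two replicas; if one fails, only that replica is resubmitted
replicas: 2

# Retry for up to 1 hour on these events (field names assumed from the reference)
retry:
  on_events: [no-capacity, error, interrupted]
  duration: 1h

resources:
  gpu: 24GB
```

</div>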

### Spot policy

By default, `dstack` uses on-demand instances. However, you can change that
via the [`spot_policy`](../reference/dstack.yml/service.md#spot_policy) property. It accepts `spot`, `on-demand`, and `auto`.
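
A minimal sketch restricting a service to spot capacity (one of the accepted values above) might look like this:

<div editor-title=".dstack.yml">

```yaml
type: service
name: llama31-service

python: 3.12
env:
  - HF_TOKEN
commands:
  - uv pip install vllm
  - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096
port: 8000

# Only provision spot instances (accepted values: spot, on-demand, auto)
spot_policy: spot

resources:
  gpu: 24GB
```

</div>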

### Utilization policy

Sometimes it’s useful to track whether a service is fully utilizing all GPUs. While you can check this with
[`dstack metrics`](../reference/cli/dstack/metrics.md), `dstack` also lets you set a policy to auto-terminate the run if any GPU is underutilized.

Below is an example of a service that auto-terminates if any GPU stays below 10% utilization for 1 hour.

<div editor-title=".dstack.yml">

```yaml
type: service
name: llama-2-7b-service

python: 3.12
env:
- HF_TOKEN
- MODEL=NousResearch/Llama-2-7b-chat-hf
commands:
- uv pip install vllm
- python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
port: 8000

resources:
gpu: 24GB

utilization_policy:
min_gpu_utilization: 10
time_window: 1h
```

</div>

--8<-- "docs/concepts/snippets/manage-fleets.ext"

!!! info "Reference"
Services support many more configuration options,
incl. [`backends`](../reference/dstack.yml/service.md#backends),
[`regions`](../reference/dstack.yml/service.md#regions),
and [`max_price`](../reference/dstack.yml/service.md#max_price),
among [others](../reference/dstack.yml/service.md).

--8<-- "docs/concepts/snippets/manage-runs.ext"

!!! info "What's next?"