Releases: dstackai/dstack-enterprise
0.19.25-v1
Migration guide
Warning
This update requires stopping all `dstack` server replicas before deploying, due to database schema changes.
Make sure no replicas from the previous version and the new version run at the same time.
CLI
dstack offer --group-by
The `dstack offer` command can now display aggregated information about available offers. For example, to see which GPUs are available in different clouds, use `--group-by gpu`.
> dstack offer --group-by gpu
# GPU SPOT $/GPU BACKENDS
1 T4:16GB:1..8 spot, on-demand 0.1037..1.3797 gcp, aws
2 L4:24GB:1..8 spot, on-demand 0.1829..2.1183 gcp, aws
3 P100:16GB:1..4 spot, on-demand 0.2115..2.4043 gcp, oci
4 V100:16GB:1..8 spot, on-demand 0.3152..4.234 gcp, aws, oci, lambda
5 A10G:22GB:1..8 spot, on-demand 0.3623..2.5845 aws
6 L40S:44GB:1..8 spot, on-demand 0.6392..4.7095 aws
7 A100:40GB:1..16 spot, on-demand 0.6441..4.0496 gcp, aws, oci, lambda
8 A10:24GB:1..4 on-demand 0.75..2 oci, lambda
9 H100:80GB:1..8 spot, on-demand 1.079..15.7236 gcp, aws, lambda
10 A100:80GB:1..8 spot, on-demand 1.2942..5.7077 gcp, aws, lambda
Refer to the docs for information about the available aggregations.
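If you post-process this output in a script, the GPU column is the part that needs parsing: it packs the model name, the memory per GPU, and the range of GPU counts into one string. The sketch below assumes the NAME:MEMORY:COUNT layout inferred from the sample rows above. `parse_gpu_spec` and `GpuSpec` are hypothetical helpers for illustration, not part of dstack.

```python
import re
from typing import NamedTuple


class GpuSpec(NamedTuple):
    name: str
    memory_gb: int
    min_count: int
    max_count: int


# Layout inferred from the sample output, e.g. "T4:16GB:1..8" or "H100:80GB:1".
_SPEC_RE = re.compile(
    r"^(?P<name>[^:]+):(?P<mem>\d+)GB:(?P<lo>\d+)(?:\.\.(?P<hi>\d+))?$"
)


def parse_gpu_spec(spec: str) -> GpuSpec:
    """Split a GPU spec string into name, memory, and count range."""
    match = _SPEC_RE.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognized GPU spec: {spec!r}")
    lo = int(match["lo"])
    # A single count such as ":1" means the range is 1..1.
    hi = int(match["hi"]) if match["hi"] is not None else lo
    return GpuSpec(match["name"], int(match["mem"]), lo, hi)


print(parse_gpu_spec("A100:40GB:1..16"))
```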
Deprecations
- Local repos are now deprecated. If you need to deliver a local directory or file to a run, use `files` instead. If the run doesn't require a repo, use `dstack apply --no-repo`. Remote repos remain the recommended way to deliver Git repos to runs.
What's changed
- [Internal] Replace enums with strings in the DB, `JobSubmission.termination_reason`, and `Run.termination_reason` by @r4victor in dstackai/dstack#2949
- [Internal] Fix macOS build for shim by @un-def in dstackai/dstack#2958
- [Bug] Increase the secrets max character length by @james-boydell in dstackai/dstack#2971
- [Internal] Introduce `InstanceAvailability.NO_BALANCE` (for external integrations) by @peterschmidt85 in dstackai/dstack#2975
- [Bug]: Cannot manage secrets in UI as project admin by @olgenn in dstackai/dstack#2972
- [Bug] Fix `DCGMWrapperInterface` nil check in shim by @un-def in dstackai/dstack#2980
- Document Deployment-compatible migrations by @r4victor in dstackai/dstack#2987
- [Bug]: Server Docker image fails because of Unable to locate package … by @peterschmidt85 in dstackai/dstack#2983
- Only register service replicas after probes pass by @jvstme in dstackai/dstack#2986
- [Changelog] Introducing service probes by @peterschmidt85 in dstackai/dstack#2988
- Deprecate local repos by @un-def in dstackai/dstack#2984
- Support elastic fleets by @r4victor in dstackai/dstack#2967
- fix typo config.yml.md by @jspablo in dstackai/dstack#2991
- Check if kapa.ai can also be integrated into dstack Sky #296 by @olgenn in dstackai/dstack#2990
- Typo in URLs by @mashcroft3 in dstackai/dstack#2995
- [shim] Fix `DCGMWrapperInterface` nil check (bis) by @un-def in dstackai/dstack#3001
- The logs section is too short in the UI by @olgenn in dstackai/dstack#2989
- [Feature]: Allow `dstack offer` to aggregate GPU information by @peterschmidt85 in dstackai/dstack#2992
- [Internal]: CI refactoring by @jvstme in dstackai/dstack#3006
- Update examples by @un-def in dstackai/dstack#3007
- Minor CLI fixes by @peterschmidt85 in dstackai/dstack#3008
New Contributors
- @mashcroft3 made their first contribution in dstackai/dstack#2995
Full changelog: dstackai/dstack@0.19.23...0.19.25
0.19.23-v1
Major bug-fixes
- This release resolves an issue introduced in 0.19.22 that caused instance provisioning to fail consistently for certain instance types.
Backends
Nebius
The `nebius` backend now supports spot instances and the NVIDIA B200 GPU.
> dstack offer -b nebius --spot
# BACKEND RESOURCES PRICE
1 nebius (eu-north1) cpu=16 mem=200GB disk=100GB H100:80GB:1 (spot) $1.25
2 nebius (eu-north1) cpu=16 mem=200GB disk=100GB H200:141GB:1 (spot) $1.45
3 nebius (eu-west1) cpu=16 mem=200GB disk=100GB H200:141GB:1 (spot) $1.45
4 nebius (us-central1) cpu=16 mem=200GB disk=100GB H200:141GB:1 (spot) $1.45
5 nebius (eu-north1) cpu=128 mem=1600GB disk=100GB H100:80GB:8 (spot) $10
6 nebius (eu-north1) cpu=128 mem=1600GB disk=100GB H200:141GB:8 (spot) $11.6
7 nebius (eu-west1) cpu=128 mem=1600GB disk=100GB H200:141GB:8 (spot) $11.6
8 nebius (us-central1) cpu=128 mem=1600GB disk=100GB H200:141GB:8 (spot) $11.6
> dstack offer -b nebius --gpu 8:b200
# BACKEND RESOURCES PRICE
1 nebius (us-central1) cpu=160 mem=1792GB disk=100GB B200:180GB:8 $44
What's changed
- Fix `dstack-shim` release build by @jvstme in dstackai/dstack#2964
- [Nebius] Support spot instances and B200 by @peterschmidt85 in dstackai/dstack#2965
Full Changelog: dstackai/dstack@0.19.22...0.19.23
0.19.22-v1
Warning
When updating, make sure to install 0.19.23-v1, the latest bug-fix release.
Services
Probes
You can now configure HTTP probes to check the health of your service.
type: service
name: my-service
port: 80
image: my-app:latest
probes:
  - type: http
    url: /health
    interval: 15s
Probe statuses are displayed in `dstack ps --verbose` and are considered during rolling deployments. This enables you to deploy new versions of your service with zero downtime.
> dstack ps --verbose
NAME                            BACKEND          STATUS   PROBES  SUBMITTED
my-service deployment=1                          running          11 mins ago
  replica=0 job=0 deployment=0  aws (us-west-2)  running  ✓       11 mins ago
  replica=1 job=0 deployment=1  aws (us-west-2)  running  ×       1 min ago
Learn more about probes in the docs.
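For the probe configured above to pass, the service itself has to answer on `/health`. Below is a minimal sketch of such an endpoint using only the Python standard library. It assumes that a probe counts as passing when the endpoint returns a 2xx status (check the docs for the exact rule). `health_status`, `HealthHandler`, and `serve` are hypothetical names for illustration, not part of dstack.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_status(path: str) -> int:
    """Return the HTTP status code to send for a request path."""
    return 200 if path == "/health" else 404


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status = health_status(self.path)
        body = b"ok\n" if status == 200 else b"not found\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 80) -> None:
    # Port 80 matches `port: 80` in the service configuration above.
    # Call serve() from your entrypoint; it blocks until interrupted.
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

In a real service, the handler would typically also check its dependencies (model loaded, database reachable) before returning 200.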
Accelerators
NVIDIA GPU health checks
`dstack` now monitors NVIDIA GPU health using DCGM background health checks:
> dstack fleet
FLEET     INSTANCE  BACKEND          RESOURCES  PRICE   STATUS          CREATED
my-fleet  0         aws (us-east-1)  T4:16GB:1  $0.526  idle            11 mins ago
          1         aws (us-east-1)  T4:16GB:1  $0.526  idle (warning)  11 mins ago
          2         aws (us-east-1)  T4:16GB:1  $0.526  idle (failure)  11 mins ago
In this example, the first instance is healthy, the second has a non-fatal issue and can still be used, and the last has a fatal error that makes it inoperable.
Note
GPU health checks are supported on AWS (except with custom `os_images`), Azure (except for A10 GPUs), GCP, and OCI, as well as SSH fleet instances with DCGM installed and configured for background health checks. To use GPU health checks, re-create the fleets that were created before 0.19.22.
Tenstorrent Galaxy
`dstack` now supports Tenstorrent Galaxy cards via SSH fleets.
Backends
Hot Aisle
This release features an integration with Hot Aisle, a cloud provider that offers on-demand access to AMD MI300X GPUs at competitive prices.
> dstack offer -b hotaisle
# BACKEND RESOURCES INSTANCE TYPE PRICE
1 hotaisle (us-michigan-1) cpu=13 mem=224GB disk=12288GB MI300X:192GB:1 1x MI300X 13x Xeon Platinum 8470 $1.99
2 hotaisle (us-michigan-1) cpu=8 mem=224GB disk=12288GB MI300X:192GB:1 1x MI300X 8x Xeon Platinum 8470 $1.99
Refer to the docs for instructions on configuring the `hotaisle` backend in your `dstack` project.
CLI
Reading configurations from stdin
`dstack apply` can now read configurations from stdin using the `-y -f -` flags. This allows configuration files to be parameterized in arbitrary ways:
> cat .dstack/volume.dstack.yml
type: volume
name: my-vol
backend: aws
region: us-east-1
size: $VOL_SIZE
> export VOL_SIZE=50
> envsubst '$VOL_SIZE' < .dstack/volume.dstack.yml | dstack apply -y -f -
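The same parameterization can be done from a script instead of `envsubst`. The sketch below uses `string.Template` from the Python standard library to fill in `$VOL_SIZE` and then pipes the result to `dstack apply -y -f -`, mirroring the shell example. `render_config` and `apply_config` are hypothetical helper names, and `apply_config` assumes the `dstack` CLI is on your `PATH`.

```python
import subprocess
from string import Template


def render_config(template_text: str, variables: dict) -> str:
    # safe_substitute leaves unknown $NAMES untouched, similar to passing
    # an explicit variable list to envsubst.
    return Template(template_text).safe_substitute(variables)


def apply_config(rendered: str) -> None:
    # Pipes the rendered YAML to `dstack apply -y -f -`, as in the shell example.
    subprocess.run(
        ["dstack", "apply", "-y", "-f", "-"],
        input=rendered,
        text=True,
        check=True,
    )


template = (
    "type: volume\n"
    "name: my-vol\n"
    "backend: aws\n"
    "region: us-east-1\n"
    "size: $VOL_SIZE\n"
)
print(render_config(template, {"VOL_SIZE": "50"}))
```

Calling `apply_config(render_config(template, {"VOL_SIZE": "50"}))` would then submit the rendered volume configuration.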
Debug logs
The `dstack` CLI now saves debug logs to the `~/.dstack/logs/cli/` directory. These logs can be useful for troubleshooting failed commands or submitting bug reports.
UI
Secrets
The project settings page now has a section to manage secrets.

Logs improvements
The UI can now optionally display timestamps in front of each message in run logs. This can be a lifesaver when debugging runs that write log messages without built-in timestamps.

Additionally, if the `dstack` server is configured to use external log storage, such as AWS CloudWatch or GCP Logging, a button will appear in the UI to view the logs in that storage system.
What's changed
- [Feature]: Add UI for managing Secrets #2882 by @olgenn in dstackai/dstack#2911
- [Blog]: Benchmarking AMD GPUs: bare-metal, VMs by @peterschmidt85 in dstackai/dstack#2924
- [Feature]: Implement reading apply configuration from stdin by @r4victor in dstackai/dstack#2938
- Fix precommit by @olgenn in dstackai/dstack#2936
- Fix gateway docs URL by @jspablo in dstackai/dstack#2941
- [Feature]: Service probes by @jvstme in dstackai/dstack#2927
- Return logs `external_url` for AWS and GCP by @r4victor in dstackai/dstack#2944
- [Feature]: Default CLI log level is DEBUG; WARNING and above go to STDOUT, DEBUG logs to a file by @peterschmidt85 in dstackai/dstack#2940
- [Feature]: Support for Tenstorrent Galaxy by @peterschmidt85 in dstackai/dstack#2943
- Disallow duplicate project members by @r4victor in dstackai/dstack#2945
- [Feature]: If GCP logging or AWS Cloudwatch logging is configured, show link in the UI to the log stream by @olgenn in dstackai/dstack#2948
- Specify `sentry-sdk[fastapi]>=2.27.0` to fix missing `SamplingContext` by @r4victor in dstackai/dstack#2950
- [Feature]: Showing timestamp for logs by @olgenn in dstackai/dstack#2937
- [Landing]: Highlight dstack Sky + CTA improvements by @peterschmidt85 in dstackai/dstack#2947
- Fix Lambda backend instance unreachable after dstack server restart by @Bihan in dstackai/dstack#2946
- Fix configuring CLI logging on Python 3.9/3.10 by @jvstme in dstackai/dstack#2953
- [Feature]: Add NVIDIA GPU passive health checks by @un-def in dstackai/dstack#2952
- Fix `_check_instance` log spam by @un-def in dstackai/dstack#2956
- Add more probe request configuration options by @jvstme in dstackai/dstack#2955
- [Feature]: Add Hot Aisle backend by @Bihan in dstackai/dstack#2935
- [Internal]: Fix release workflow by @jvstme in dstackai/dstack#2959
New Contributors
- @jspablo made their first contribution in dstackai/dstack#2941
Full Changelog: dstackai/dstack@0.19.21...0.19.22
0.19.17-v1
Single Sign-On via Google
`dstack` Enterprise now supports Single Sign-On via Google. When Google integration is configured, the `dstack` login page will display the Sign in with Google button. See the Google integration guide for more information.
Secrets
`dstack` now supports secrets, which allow centralized management of sensitive values such as API keys and credentials. They are project-scoped, managed by project admins, and can be referenced in run configurations to pass sensitive values to runs in a secure manner. Example:
$ dstack secret set my_secret some_secret_value
OK
type: task
nodes: 1
name: test-secrets
env:
- MY_SECRET=${{ secrets.my_secret }}
commands:
- echo $MY_SECRET
$ dstack apply -f .dstack/confs/task.dstack.yaml
Submit the run test-secrets? [y/n]: y
NAME          BACKEND          RESOURCES                 PRICE   STATUS   SUBMITTED
test-secrets  aws (eu-west-1)  cpu=2 mem=8GB disk=100GB  $0.107  running  10:48
test-secrets provisioning completed (running)
some_secret_value
Exited (0)
For more details on secrets, check out the docs.
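To make the `${{ secrets.my_secret }}` placeholder concrete, here is a small sketch of how such a reference could be resolved against a dictionary of secret values. This is an illustration of the substitution idea only, not dstack's implementation; `resolve_secrets` is a hypothetical helper.

```python
import re

# Matches placeholders of the form ${{ secrets.<name> }} as used in the example above.
_SECRET_RE = re.compile(r"\$\{\{\s*secrets\.(\w+)\s*\}\}")


def resolve_secrets(value: str, secrets: dict) -> str:
    """Replace every ${{ secrets.NAME }} placeholder with its value."""

    def _replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in secrets:
            raise KeyError(f"secret {name!r} is not set")
        return secrets[name]

    return _SECRET_RE.sub(_replace, value)


print(resolve_secrets(
    "MY_SECRET=${{ secrets.my_secret }}",
    {"my_secret": "some_secret_value"},
))
```

Failing loudly on an unknown secret name, as the sketch does, avoids silently passing an empty value to a run.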
Files
By default, `dstack` automatically mounts the repo directory where you ran `dstack init` to any run configuration.
However, in some cases, you may not want to mount the entire directory (e.g., if it’s too large), or you might want to mount files outside of it. In such cases, you can use the `files` property.
type: task
name: trl-sft
files:
- .:examples # Maps the directory containing `.dstack.yml` to `/workflow/examples`
- ~/.ssh/id_rsa # Maps `~/.ssh/id_rsa` to `/root/.ssh/id_rsa`
python: 3.12
env:
- HF_TOKEN
- HF_HUB_ENABLE_HF_TRANSFER=1
- MODEL=Qwen/Qwen2.5-0.5B
- DATASET=stanfordnlp/imdb
commands:
  - uv pip install trl
  - |
    trl sft \
      --model_name_or_path $MODEL --dataset_name $DATASET \
      --num_processes $DSTACK_GPUS_PER_NODE
resources:
  gpu: H100:1
Warning
If you have existing fleets, it's recommended to re-create them after upgrading to version 0.19.17. Otherwise, there is a risk that these instances won't be able to execute jobs if a run uses `files`.
Services
Rolling deployment
Rolling deployments introduced in 0.19.15 are now supported when deploying new commits or branches from a Git repo, or when changes are made to the repo contents or files listed in the `files` section.
Additionally, `dstack apply` now displays a full list of detected changes:
$ dstack apply -f my-service.dstack.yml
Active run my-service already exists. Detected changes that can be updated in-place:
- Repo state (branch, commit, or other)
- File archives
- Configuration properties:
  - env
  - files
Update the run? [y/n]:
Even when a rolling deployment isn't possible, the list of changes is still shown — making it easier to identify which changes are preventing the deployment from proceeding in-place.
What's changed
- [Bug]: Docker In Docker does not work with AMD by @peterschmidt85 in dstackai/dstack#2849
- [Feature] Add `files` property to run configurations by @un-def in dstackai/dstack#2848
- [Feature] Implement project secrets by @r4victor in dstackai/dstack#2854
- [Internal] Support fleet configurations for the local backend by @jvstme in dstackai/dstack#2856
- [Services] Rolling deployments for repo updates by @jvstme in dstackai/dstack#2853
- [Internal] Fix package dependency direction by @jvstme in dstackai/dstack#2859
- [Internal] Rolling deployments for `files` by @jvstme in dstackai/dstack#2862
- [Internal] Support the local backend with the in-server proxy by @jvstme in dstackai/dstack#2858
- [Docs] Added `Files` documentation by @peterschmidt85 in dstackai/dstack#2866
- [Bug] Fix `~` expansion in `files` by @un-def in dstackai/dstack#2865
- [Feature] Allow in-place update for more run properties by @jvstme in dstackai/dstack#2867
Full changelog: dstackai/dstack@0.19.16...0.19.17
0.19.21-v1
Runs
Scheduled runs
Runs get a new `schedule` property that allows starting runs periodically by specifying a cron expression:
type: task
nodes: 1
schedule:
  cron: "*/15 * * * *"
commands:
- ...
`dstack` will start a scheduled run at `cron` times unless the run is already running. It can then be stopped manually to prevent it from starting again. Learn more about scheduled runs in the docs.
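To see what the `*/15` in the example expression means, the sketch below expands the minute field of a cron expression into the minutes of the hour it matches. It is a deliberately simplified parser for the minute field only and is not how dstack evaluates schedules; `expand_minute_field` is a hypothetical helper.

```python
def expand_minute_field(field: str) -> list:
    """Expand a cron minute field into the sorted minutes (0-59) it matches.

    Supports "*", "*/N", "A-B", "A-B/N", single values, and comma lists.
    """
    minutes = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = 0, 59
        elif "-" in base:
            lo, hi = base.split("-")
            start, end = int(lo), int(hi)
        else:
            start = end = int(base)
        minutes.update(range(start, end + 1, step))
    return sorted(minutes)


print(expand_minute_field("*/15"))  # [0, 15, 30, 45]
```

With the remaining four fields set to `*`, the example schedule therefore fires at minutes 0, 15, 30, and 45 of every hour.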
CLI
Startup time
The CLI now starts up to 4 times faster thanks to optimized Python imports.
Server
Optimized DB queries
We optimized the DB queries issued by the `dstack` server. This improves API response times and decreases the load on the DB, which was previously noticeable on small Postgres instances.
What's Changed
- Support scheduled runs by @r4victor in dstackai/dstack#2914
- Autoset UTC timezone for datetimes loaded from the db by @r4victor in dstackai/dstack#2922
- Refactor backends module to avoid importing deps on models import by @r4victor in dstackai/dstack#2923
- Optimize db queries by @r4victor in dstackai/dstack#2928
- Optimize db queries (part 2) by @r4victor in dstackai/dstack#2929
- [UI] Add justfile to build frontend by @peterschmidt85 in dstackai/dstack#2897
- Fix project loading in _check_instance() by @r4victor in dstackai/dstack#2931
- Set up background tasks Sentry tracing by @r4victor in dstackai/dstack#2932
Full Changelog: dstackai/dstack@0.19.20...0.19.21
0.19.20-v1
User interface
Logs
This is a hotfix release addressing three major issues related to the UI:
- The UI didn’t display newer AWS CloudWatch logs if there was a long gap between old and new logs.
- Logs received before 0.19.19 appeared as base64-encoded in the UI. The UI now includes a button to decode them automatically.
- Logs were loaded from start to end, which made viewing very slow for long runs.
Note
The dstack logs CLI command may still be affected by the issues above. However, it’s less critical and will be addressed separately.
What's changed
- [chore]: Drop duplicate utility `split_chunks` by @jvstme in dstackai/dstack#2912
- [backends/CloudRift] Fixed issue with terminating inactive instance by @6erun in dstackai/dstack#2918
- Expose GPU metrics collected by runner as Prometheus metrics by @un-def in dstackai/dstack#2916
- [UI] Query logs using descending by @peterschmidt85 in dstackai/dstack#2915
- [UI] Fix logs loading #2892 by @olgenn in dstackai/dstack#2920
Full changelog: dstackai/dstack@0.19.19...0.19.20
0.19.19-v1
Fleets
SSH fleets in-place updates
You can now add and remove instances in SSH fleets without recreating the entire fleet.
type: fleet
name: ssh-fleet
ssh_config:
  user: dstack
  identity_file: ~/.ssh/dstack
  hosts:
    - 10.0.0.1
    - 10.0.0.2
$ dstack apply -f fleet.dstack.yml
...
Fleet ssh-fleet does not exist yet.
Create the fleet? [y/n]: y
...
FLEET INSTANCE BACKEND RESOURCES PRICE STATUS CREATED
ssh-fleet 0 ssh (remote) cpu=4 mem=4GB disk=30GB $0 idle 09:08
1 ssh (remote) cpu=2 mem=4GB disk=30GB $0 idle 09:08
Then, if you update the `hosts` configuration property to
hosts:
#- 10.0.0.1 # removed
- 10.0.0.2
- 10.0.0.3 # added
and apply the same configuration again, the fleet will be updated in-place, meaning that you don't need to stop runs on the fleet instances if they are not affected by the changes (in this example, it's okay if instance `1` is currently busy; you can still apply the configuration).
$ dstack apply -f fleet.dstack.yml
...
Found fleet ssh-fleet. Configuration changes detected.
Update the fleet in-place? [y/n]: y
...
FLEET INSTANCE BACKEND RESOURCES PRICE STATUS CREATED
ssh-fleet 1 ssh (remote) cpu=2 mem=4GB disk=30GB $0 idle 09:08
2 ssh (remote) cpu=8 mem=4GB disk=30GB $0 idle 09:12
Note
In-place updates only allow adding and/or removing instances. The root configuration and the configurations of unchanged hosts must stay the same; otherwise, a full fleet re-creation is triggered, as before. This restriction may be lifted in the future.
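The add/remove semantics can be pictured as a simple comparison of the old and new `hosts` lists. The sketch below computes which hosts would be added and which removed for the example above. It only illustrates the idea; `diff_hosts` is a hypothetical helper and does not reflect how dstack computes fleet changes internally.

```python
def diff_hosts(current: list, desired: list) -> tuple:
    """Return (added, removed) hosts, preserving the order of the inputs."""
    current_set, desired_set = set(current), set(desired)
    added = [h for h in desired if h not in current_set]
    removed = [h for h in current if h not in desired_set]
    return added, removed


added, removed = diff_hosts(
    ["10.0.0.1", "10.0.0.2"],
    ["10.0.0.2", "10.0.0.3"],
)
print("added:", added)      # added: ['10.0.0.3']
print("removed:", removed)  # removed: ['10.0.0.1']
```

Host `10.0.0.2` appears in neither list, which corresponds to the instance that keeps running untouched during the update.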
Volumes
Automatic cleanup of unused volumes
The volume configuration gets a new `auto_cleanup_duration` property:
type: volume
name: my-volume
backend: aws
region: eu-west-1
availability_zone: eu-west-1a
auto_cleanup_duration: 1h
The volume will be automatically deleted once it has been unused for the specified duration.
Logs
Browsable, queryable, and searchable logs
`dstack` now stores run logs in plaintext; previously, they were base64-encoded. This allows you to use the configured log storage, be it AWS CloudWatch or GCP Logging, to browse and query `dstack` run logs.
Note
Logs generated before this release will be shown as base64-encoded in the UI and CLI after the update.
Server
Faster API response times
The `dstack` server API has been optimized to serialize JSON responses faster. The API endpoints are up to 2x faster than before.
Benchmarks
Benchmarking AMD GPUs: bare-metal, containers, partitions
Our new benchmark explores two important areas for optimizing AI workloads on AMD GPUs: First, do containers introduce a performance penalty for network-intensive tasks compared to a bare-metal setup? Second, how does partitioning a powerful GPU like the MI300X affect its real-world performance for different types of AI workloads?
What's Changed
- [Internal] Some runner tests fail on macOS by @peterschmidt85 in dstackai/dstack#2879
- Introduce job_submissions_limit for /api/runs/list by @r4victor in dstackai/dstack#2883
- Speed up json serialization with orjson and custom FastAPI responses by @r4victor in dstackai/dstack#2880
- [Docs]: Service rolling deployments by @jvstme in dstackai/dstack#2870
- Do not lose `provisioning` gateways on restart by @jvstme in dstackai/dstack#2887
- Add/remove SSH instances via in-place update by @un-def in dstackai/dstack#2884
- [Docs]: Add example of setting a PostgreSQL URL by @jvstme in dstackai/dstack#2888
- [Blog] Added new changelog by @peterschmidt85 in dstackai/dstack#2891
- Fix job_submissions_limit backward compatibility by @r4victor in dstackai/dstack#2894
- Fix run and job status_message calculation by @r4victor in dstackai/dstack#2889
- Fix 500 errors when requesting file logs by @r4victor in dstackai/dstack#2896
- Rolling deployments for `port` by @jvstme in dstackai/dstack#2893
- [Feature] Strip ANSI codes from run logs and store them as plain text instead of bytes by @peterschmidt85 in dstackai/dstack#2876
- [Feature]: Add ability to disable background processing and only run Web UI and API server #2901 by @james-boydell in dstackai/dstack#2902
- [shim] Don't check image downloaded size by @un-def in dstackai/dstack#2903
- Fix rolling deployment migration locking by @r4victor in dstackai/dstack#2904
- feat: add volume idle duration cleanup feature (#2497) by @haydnli-shopify in dstackai/dstack#2842
- [Blog] Benchmarking AMD GPUs: bare-metal, containers, partitions by @peterschmidt85 in dstackai/dstack#2905
- Fix /users/list by @r4victor in dstackai/dstack#2908
- Return logs in base64 for backward compatibility by @r4victor in dstackai/dstack#2910
Full Changelog: dstackai/dstack@0.19.18...0.19.19
0.19.18-v1
Server
Optimized resources processing
This release includes major improvements that allow the `dstack` server to process more resources more quickly. It also allows scaling the processing rate of a single server replica to take advantage of big Postgres instances by setting the `DSTACK_SERVER_BACKGROUND_PROCESSING_FACTOR` environment variable.
The result is:
- Faster processing rates: provisioning 100 runs on SQLite with default settings went from ~5m to ~2m.
- Better scaling: provisioning an additional 100 runs is even quicker thanks to a warm cache. Before, it was slower than the first 100 runs.
- Ability to process more runs per server replica: provisioning 300 runs on Postgres with `DSTACK_SERVER_BACKGROUND_PROCESSING_FACTOR=4` takes ~4m.
For more details on scaling background processing rates, see the Server deployment guide.
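The timings above are easier to compare when converted to throughput. The short calculation below does that with the approximate figures quoted in the list. The numbers come from different setups (SQLite with default settings versus Postgres with a factor of 4), so treat the result as a rough orientation rather than a benchmark; `runs_per_minute` is a hypothetical helper.

```python
def runs_per_minute(runs: int, minutes: float) -> float:
    """Average provisioning throughput for a batch of runs."""
    return runs / minutes


print(runs_per_minute(100, 5))  # 20.0  (SQLite, before this release)
print(runs_per_minute(100, 2))  # 50.0  (SQLite, after this release)
print(runs_per_minute(300, 4))  # 75.0  (Postgres, processing factor 4)
```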
Backends
Private GCP gateways
It's now possible to create GCP gateways without public IPs:
type: gateway
name: example
domain: gateway.example.com
backend: gcp
region: europe-west9
public_ip: false
certificate: null
Note that configuring HTTPS certificates for private GCP gateways is not yet supported, so you need to specify `certificate: null`.
What's Changed
- Ignore SSH keys when calculating fleet conf diff by @un-def in dstackai/dstack#2869
- [Blog] Refactoring by @peterschmidt85 in dstackai/dstack#2873
- Implemented fronted precommit linting by @olgenn in dstackai/dstack#2868
- Support processing more resources per replica by @r4victor in dstackai/dstack#2871
- Use uvloop by default by @r4victor in dstackai/dstack#2874
- Add server profiling by @r4victor in dstackai/dstack#2875
- Fix NVIDIA container toolkit bug in all backends by @jvstme in dstackai/dstack#2877
- Private GCP gateways by @jvstme in dstackai/dstack#2881
- Switch to `e2-medium` for GCP gateways by @jvstme in dstackai/dstack#2886
Full Changelog: dstackai/dstack@0.19.17...0.19.18
0.19.16-v1
Docker
Docker in Docker
Using Docker in a run configuration is now much easier. Just set `docker` to `true`:
type: task
name: docker-nvidia-smi
docker: true
commands:
- docker run --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi
resources:
  gpu: 1
This works with all run configuration types and supports both AMD and NVIDIA GPUs. It’s especially useful if you want to use the docker CLI in your commands—for example, to build Docker images.
The `docker` property is supported on all backends except vastai, runpod, and kubernetes, and is fully supported on SSH fleets as well.
Backends
CloudRift
The CloudRift team has added support for their GPU cloud, which can now be used with `dstack`.
To configure it, use a CloudRift API key in the backend configuration:
projects:
  - name: main
    backends:
      - type: cloudrift
        creds:
          type: api_key
          api_key: rift_2prgY1d0laOrf2BblTwx2B2d1zcf1zIp4tZYpj5j88qmNgz38pxNlpX3vAo
CloudRift offers competitive on-demand GPU pricing, with more GPUs and regions coming soon.
dstack apply -f examples/.dstack.yml -b cloudrift
# BACKEND RESOURCES INSTANCE TYPE PRICE
1 cloudrift (us-east-nc-nr-1) cpu=16 mem=100GB disk=1000GB RTX5090:32GB:1 rtx59-16c-nr.1 $0.65
If you encounter any issues with this backend, please report them.
Server
Public projects
You can now create public projects that any user on the server can join or leave without approval. Previously, all projects were private, and adding new members required manual action by an admin or manager—a step that’s redundant in high-trust environments.
Admins can change a project’s visibility at any time in the project settings.
Metrics
The server exports new Prometheus metrics:
- `dstack_submit_to_provision_duration_seconds`: Time from run submission to first job provisioning
- `dstack_pending_runs_total`: Total number of pending runs
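If you want to read one of these metrics from a script rather than through a Prometheus server, the standard text exposition format is simple enough to parse by hand. The sketch below extracts a sample value by metric name. The sample payload is hypothetical (real output will contain more metrics and possibly labels), and the parser assumes that samples carry no trailing timestamp. `read_metric` is a hypothetical helper.

```python
def read_metric(exposition_text: str, name: str) -> float:
    """Return the value of the first sample whose metric name is `name`."""
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first "{" (labels) or space (value).
        sample_name = line.split("{", 1)[0].split(" ", 1)[0]
        if sample_name == name:
            # Assumes no trailing timestamp, so the value is the last token.
            return float(line.rsplit(" ", 1)[1])
    raise KeyError(name)


# Hypothetical scrape payload for illustration only.
sample = """\
# HELP dstack_pending_runs_total Total number of pending runs
dstack_pending_runs_total 3
"""
print(read_metric(sample, "dstack_pending_runs_total"))  # 3.0
```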
What's changed
- [Feature]: Property filter on Fleets, Models, Volumes pages by @olgenn in dstackai/dstack#2824
- [Bug]: Run/job status in UI/CLI is shown as `provisioning` instead of `pulling` by @peterschmidt85 in dstackai/dstack#2834
- [chore]: Fix annotation in `update_service_desired_replica_count` by @jvstme in dstackai/dstack#2840
- Add CloudRift backend by @6erun in dstackai/dstack#2771
- Fix Postgres deadlocks by @r4victor in dstackai/dstack#2843
- [UX] Simplify the use of Docker inside containers #2468 by @peterschmidt85 in dstackai/dstack#2828
- [Docs] Update docs and examples to reflect the `docker` property by @peterschmidt85 in dstackai/dstack#2831
- Add support for Tenstorrent n300 GPUs by @peterschmidt85 in dstackai/dstack#2827
- [Feature]: Property filter on Instances page by @olgenn in dstackai/dstack#2826
- [UI] Allow to hide the Tour panel by @olgenn in dstackai/dstack#2816
- Pr3 add join leave UI buttons by @haydnli-shopify in dstackai/dstack#2795
- Health metrics (Part 2) by @Nadine-H in dstackai/dstack#2796
- [Bug]: Use a unique token for log pagination instead of a timestamp by @peterschmidt85 in dstackai/dstack#2845
- Fix update project required permissions by @r4victor in dstackai/dstack#2846
New contributors
- @6erun made their first contribution in dstackai/dstack#2771
Full changelog: dstackai/dstack@0.19.15...0.19.16
0.19.15-v1
Services
Rolling deployments
This update introduces rolling deployments, which help avoid downtime when deploying new versions of your services.
When you apply an updated service configuration, `dstack` will gradually replace old service replicas with new ones. You can track the progress in the `dstack apply` output: the `deployment` number will be lower for old replicas and higher for new ones.
> dstack apply -f my-service.dstack.yml
Active run my-service already exists. Detected configuration changes that can be updated in-place: ['image', 'env', 'commands']
Update the run? [y/n]: y
⠋ Launching my-service...
NAME                            BACKEND          RESOURCES                        PRICE    STATUS       SUBMITTED
my-service deployment=1                                                                    running      11 mins ago
  replica=0 job=0 deployment=0  aws (us-west-2)  cpu=2 mem=1GB disk=100GB (spot)  $0.0026  terminating  11 mins ago
  replica=1 job=0 deployment=1  aws (us-west-2)  cpu=2 mem=1GB disk=100GB (spot)  $0.0026  running      1 min ago
Currently, the following service configuration properties can be updated using rolling deployments: `resources`, `volumes`, `image`, `user`, `privileged`, `entrypoint`, `python`, `nvcc`, `single_branch`, `env`, `shell`, and `commands`.
Future releases will allow updating more properties and deploying new git repo commits.
Clusters
Updated default Docker images
If you don't specify a custom `image` in the run configuration, `dstack` uses its default images. These images have been improved for cluster environments and now include `mpirun` and NCCL tests. Additionally, if you are running on AWS EFA-capable instances, `dstack` will now automatically select an image with the appropriate EFA drivers. See our new AWS EFA guide for more details.
Server
Health metrics
The `dstack` server now exports some operational Prometheus metrics that let you monitor its health. If you are running your own production-grade `dstack` server installation, refer to the metrics docs for details.
What's changed
- Set logsWaitDuration to 5m by @r4victor in dstackai/dstack#2794
- Add health metrics (Part 1) by @Nadine-H in dstackai/dstack#2760
- Add public projects by @haydnli-shopify in dstackai/dstack#2759
- Fix is_public allowing null by @r4victor in dstackai/dstack#2798
- Retry on `VOLUME_ERROR` and `INSTANCE_UNREACHABLE` by @jvstme in dstackai/dstack#2805
- Rework default Docker images by @peterschmidt85 in dstackai/dstack#2799
- Fix volume error status message by @jvstme in dstackai/dstack#2806
- [Docs] Added EFA example by @peterschmidt85 in dstackai/dstack#2820
- [Bug]: Empty spaces on User Details page by @olgenn in dstackai/dstack#2815
- Rolling deployment for services by @jvstme in dstackai/dstack#2821
- Fix building `dstack` package by @jvstme in dstackai/dstack#2823
New Contributors
- @haydnli-shopify made their first contribution in dstackai/dstack#2759
Full Changelog: dstackai/dstack@0.19.13...0.19.15