Closed
Description
Steps to reproduce
- Create an instance using the `lambda` backend, wait until it becomes `idle` or `busy`.
- Restart `dstack server`.
Actual behaviour
The instance becomes unreachable and never recovers. If it was running a job, the job is terminated. `dstack-shim` no longer runs on the instance.
The first shim health check attempt fails with this error:
DEBUG dstack._internal.server.background.tasks.process_instances:747 Check instance cloud-0 status. shim health: Can't request shim:
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
All other health checks fail with this error:
DEBUG dstack._internal.server.background.tasks.process_instances:747 Check instance cloud-0 status. shim health: Can't request shim:
('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
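When debugging this, it can help to probe the shim endpoint by hand and see which of the two failure modes above is occurring. The following is a minimal sketch using only the Python standard library; it is not dstack's own health check code. The `classify` and `probe` helpers, the `/healthcheck` path, and the host and port arguments are all hypothetical and need to be adjusted to wherever the shim is actually reachable.

```python
import http.client


def classify(exc: BaseException) -> str:
    """Map a connection error to a short label (labels are illustrative)."""
    # RemoteDisconnected subclasses ConnectionResetError, so test it first.
    if isinstance(exc, http.client.RemoteDisconnected):
        return "closed-without-response"
    if isinstance(exc, ConnectionResetError):
        return "reset-by-peer"
    if isinstance(exc, OSError):
        # Covers refused connections, timeouts, unreachable hosts, etc.
        return "unreachable"
    return "other"


def probe(host: str, port: int, path: str = "/healthcheck", timeout: float = 2.0) -> str:
    """Send one GET request and report the HTTP status or the failure label.

    The default path is an assumption, not a documented shim endpoint.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return f"http-{conn.getresponse().status}"
    except (OSError, http.client.HTTPException) as exc:
        return classify(exc)
    finally:
        conn.close()
```

Calling `probe` repeatedly after the server restart would show whether the endpoint moves from `closed-without-response` to `reset-by-peer`, which would match the sequence in the logs.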
Expected behaviour
The instance remains `idle` or `busy`.
dstack version
master
Server logs
Additional information
No response