Fix: Lambda backend instance unreachable after dstack server restart #2946
This PR fixes issue #2669, where the shim launched via SSH gets terminated when the dstack server restarts.
Issue Cause:
dstack's local daemon thread opens an SSH connection to the VM. Unlike the cloud-init setup, the shim installation command runs on the VM through this SSH connection. Although the shim runs on the VM, it is still a child of the SSH session. When the dstack server restarts, the daemon thread dies and the SSH connection closes. Closing the SSH session ends the remote shell session, and every process started by that session, including the shim, is terminated.
Fix:
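The shim `launch_command` is now daemonized:
`daemonized_command = f"{launch_command.rstrip('&')} >/tmp/dstack-shim.log 2>&1 & disown"`
This strips any trailing `&`, redirects stdout and stderr to `/tmp/dstack-shim.log`, runs the shim in the background, and calls `disown` so the shell removes it from its job table and does not send it SIGHUP when the session ends. The shim therefore keeps running after the SSH connection closes.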
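A minimal sketch of how the daemonized command string is built, for illustration only. `build_daemonized_command` is a hypothetical helper, not a function from the dstack codebase, and `./shim-binary` is a placeholder command. One small assumption differs from the PR's expression: surrounding whitespace is also stripped, so a command ending in `"& "` is handled the same as one ending in `"&"`.

```python
def build_daemonized_command(launch_command: str, log_path: str = "/tmp/dstack-shim.log") -> str:
    """Hypothetical helper mirroring the PR's daemonizing f-string.

    Assumption: whitespace around a trailing '&' is stripped too
    (the PR itself uses launch_command.rstrip('&')).
    """
    # Remove a trailing '&' (and surrounding spaces) so we never emit '& &'.
    base = launch_command.rstrip().rstrip("&").rstrip()
    # Send stdout/stderr to a log file, background the process, then
    # disown it so the shell does not SIGHUP it when the SSH session ends.
    return f"{base} >{log_path} 2>&1 & disown"


if __name__ == "__main__":
    # './shim-binary' is a placeholder, not the real shim invocation.
    print(build_daemonized_command("./shim-binary --flag &"))
```

Note that `disown` is a bash builtin (dash, for example, does not provide it), so this approach assumes the remote command is executed by bash or another shell that supports it.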