
Fix: Lambda backend instance unreachable after dstack server restart #2946


Conversation

Bihan
Collaborator

@Bihan Bihan commented Aug 5, 2025

This PR fixes issue #2669, where the shim launched via SSH gets terminated when the dstack server restarts.

Issue Cause:
dstack's local daemon thread creates an SSH connection to the VM. Unlike the cloud-init setup, the shim installation command runs on the VM over this SSH connection. Even though the shim runs on the VM, it is still a child of the SSH session. When the dstack server restarts, the daemon thread dies and the SSH connection closes. When the SSH session closes, the remote shell session ends, and any processes started by that session (including the shim) are terminated.

Fix:
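The shim launch_command is now daemonized before it is sent over SSH:
daemonized_command = f"{launch_command.rstrip('&')} >/tmp/dstack-shim.log 2>&1 & disown"
The command's output is redirected to /tmp/dstack-shim.log, it runs in the background, and disown removes it from the shell's job table, so the shim keeps running after the SSH session ends.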
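As a rough illustration of the string transformation described above, here is a minimal sketch. The helper name `daemonize` and the example command `./shim &` are hypothetical and are not taken from the dstack codebase; only the f-string itself comes from this PR.

```python
def daemonize(launch_command: str, log_path: str = "/tmp/dstack-shim.log") -> str:
    """Wrap a shell command so it can outlive the SSH session that starts it.

    Hypothetical helper: the name and the log_path parameter are
    assumptions for illustration; the format string mirrors the PR.
    """
    # Drop trailing '&' characters so the redirections are placed
    # before the (single) backgrounding operator added below.
    stripped = launch_command.rstrip("&")
    # Redirect stdout and stderr to a log file, run in the background,
    # and disown the job so the shell does not signal it on exit.
    return f"{stripped} >{log_path} 2>&1 & disown"


# './shim &' is a made-up command used only to show the result.
example = daemonize("./shim &")
print(example)
```

Note that `rstrip('&')` only strips `&` characters at the very end of the string, so any whitespace before a trailing `&` is left in place; that is harmless for the shell.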

@Bihan Bihan requested review from peterschmidt85 and jvstme August 5, 2025 12:15
@peterschmidt85 peterschmidt85 removed their request for review August 5, 2025 12:37
@Bihan Bihan merged commit e7616a3 into dstackai:master Aug 6, 2025
26 checks passed