Skip to content

Add WSLAContainer flag to auto delete container after exit#14196

Open
kvega005 wants to merge 6 commits intomicrosoft:feature/wsl-for-appsfrom
kvega005:user/kevinve/auto-delete-containers
Open

Add WSLAContainer flag to auto delete container after exit#14196
kvega005 wants to merge 6 commits intomicrosoft:feature/wsl-for-appsfrom
kvega005:user/kevinve/auto-delete-containers

Conversation

@kvega005
Copy link

Summary of the Pull Request

Add support for Docker-style auto-remove (Rm) flag for containers

  • Persist and restore the WSLAContainerFlagsRm flag in container metadata.
  • Automatically delete containers with the Rm flag after they stop, using a detached thread.
  • Add tests to verify auto-remove behavior and flag persistence across sessions.

PR Checklist

  • Closes: Link to issue #xxx
  • Communication: I've discussed this with core contributors already. If work hasn't been agreed, this work might be rejected
  • Tests: Added/updated if needed and all pass
  • Localization: All end user facing strings can be localized
  • Dev docs: Added/updated if needed
  • Documentation updated: If checked, please file a pull request on our docs repo and link it here: #xxx

Detailed Description of the Pull Request / Additional comments

Validation Steps Performed

@kvega005 kvega005 marked this pull request as ready for review February 11, 2026 19:00
@kvega005 kvega005 requested a review from a team as a code owner February 11, 2026 19:00
Copilot AI review requested due to automatic review settings February 11, 2026 19:00
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Docker-style auto-remove behavior for WSLA containers by persisting an Rm-style flag in container metadata and triggering container deletion automatically after the container stops.

Changes:

  • Persist/restore WSLAContainerFlags in WSLAContainerMetadataV1 so flags survive service/session restarts.
  • Implement auto-delete on container stop events when WSLAContainerFlagsRm is set.
  • Add a new Windows test covering auto-remove behavior and flag persistence across sessions.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
test/windows/WSLATests.cpp Adds ContainerAutoRemove test validating async deletion and persistence across sessions.
src/windows/wslaservice/exe/WSLAContainerMetadata.h Extends persisted container metadata to include Flags.
src/windows/wslaservice/exe/WSLAContainer.cpp Persists flags into metadata on create/open and triggers auto-delete on stop events.

Comment on lines +3553 to +3562
// A container with the Rm flag is deleted asynchronously after it's stopped.
// This retry loop verifies the container is deleted within one second.
wsl::shared::retry::RetryWithTimeout<void>(
[&]() {
wil::com_ptr<IWSLAContainer> container;
auto hr = m_defaultSession->OpenContainer(containerName, &container);
THROW_HR_IF(E_FAIL, hr != HRESULT_FROM_WIN32(ERROR_NOT_FOUND));
},
std::chrono::milliseconds{100},
std::chrono::seconds{1});
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verifyContainerDeleted only waits up to 1 second for async removal. Container deletion can legitimately take longer on busy/slow test machines (Docker latency, service scheduling), making this test flaky. Consider using a longer timeout consistent with other container/process waits in this file (e.g., 30s) while keeping the poll interval small.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree I'd recommend increasing the timeout to a minute so it doesn't risk failing in CI (which can pretty slow sometimes)

}

// Unmount volumes.
UnmountVolumes(m_mountedVolumes, *m_parentVM);
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WSLAContainerImpl::Delete() already calls UnmountVolumes(m_mountedVolumes, *m_parentVM). Adding an unconditional UnmountVolumes(...) in the destructor will attempt to unmount the same paths again after a successful delete, which can generate noisy LOG_IF_FAILED telemetry (and may fail if the mount is already gone). Consider making unmount idempotent by clearing m_mountedVolumes after a successful unmount, or only unmounting in one place (Delete vs destructor) guarded by state.

Suggested change
UnmountVolumes(m_mountedVolumes, *m_parentVM);
UnmountVolumes(m_mountedVolumes, *m_parentVM);
m_mountedVolumes.clear();

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with this. Do we need to explicitly unmount in the destructor ? We should only need to do it when it's deleted, which Delete() already does

Copy link
Author

@kvega005 kvega005 Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not. this is a remnant of when I was trying to use the docker auto-remove flag. In that case the container could be deleted without going through Delete(), but I ultimately decided against that approach for a few different reasons. I just forgot to remove this.

Comment on lines 502 to 508
// If the Rm flag is set, delete the container.
// This needs to be done in a detached thread because it will trigger deletion of this object.
if (WI_IsFlagSet(m_containerFlags, WSLAContainerFlagsRm))
{
auto comWrapper = m_comWrapper;
std::thread([comWrapper]() { LOG_IF_FAILED(comWrapper->Delete()); }).detach();
}
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-remove spawns a new detached std::thread per container stop. This can become an unbounded thread creation path under load and there's no shutdown coordination/join, which is risky for a service. Prefer queueing the delete onto an existing worker mechanism (e.g., Windows threadpool work item / a shared service work queue) instead of creating detached threads.

Copilot uses AI. Check for mistakes.
if (WI_IsFlagSet(m_containerFlags, WSLAContainerFlagsRm))
{
auto comWrapper = m_comWrapper;
std::thread([comWrapper]() { LOG_IF_FAILED(comWrapper->Delete()); }).detach();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely tricky because we're indeed inside the object that we want to delete.

Here's an idea that might help simplify this:

Instead of registering the event callback in WSLAContainerImpl, we register the callback at the WSLAContainer level, and then we have something like this:

void WSLAContainerImpl::OnEvent(ContainerEvent event, std::optional<int> exitCode)
{
        if (event == ContainerEvent::Stop)
        {
              auto [lock, impl] = LockImpl();

             impl->OnStopped();

             if (WI_IsFlagSet(impl->Flags(), WSLAContainerFlagsRm))
             {
                  impl->Delete();
                  m_onDeleted(impl);
             }
        }
}

We'd have to add a OnStopped() & Flags() methods, but that should allow us to completely avoid the "I need to delete myself" problem

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to do a bit of refactoring to avoid lock ordering issues and potential deadlocks. The OnDeleted callback now returns ownership of the WSLAContainerImpl. This needs to be destroyed outside of the WSLAContainer lock.

TraceLoggingValue(impl->ID().c_str(), "Id"),
TraceLoggingValue((int)event, "Event"));

if (event == ContainerEvent::Stop)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd recommend doing an moving the logging and this check in the beginning of this method so we can skip acquiring the lock if this isn't a stop event

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this last night. Right now we need to acquire the lock for logging as well. But we can store the name and ID in the com wrapper, so we don't need to call out to the implementation for logging.

}

// Unmount volumes.
UnmountVolumes(m_mountedVolumes, *m_parentVM);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with this. Do we need to explicitly unmount in the destructor ? We should only need to do it when it's deleted, which Delete() already does

Comment on lines +3553 to +3562
// A container with the Rm flag is deleted asynchronously after it's stopped.
// This retry loop verifies the container is deleted within one second.
wsl::shared::retry::RetryWithTimeout<void>(
[&]() {
wil::com_ptr<IWSLAContainer> container;
auto hr = m_defaultSession->OpenContainer(containerName, &container);
THROW_HR_IF(E_FAIL, hr != HRESULT_FROM_WIN32(ERROR_NOT_FOUND));
},
std::chrono::milliseconds{100},
std::chrono::seconds{1});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree I'd recommend increasing the timeout to a minute so it doesn't risk failing in CI (which can pretty slow sometimes)

auto process = container.GetInitProcess();

VERIFY_ARE_EQUAL(container.State(), WslaContainerStateRunning);
VERIFY_SUCCEEDED(container.Get().Stop(WSLASignalSIGKILL, 0));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It would be nice to have special logic in WSLAContainer() to synchronously perform the container deleting if the rm flag is set, that way the caller is guaranteed that the container is deleted when Stop() returns successfully (feel free to do this in a future change though)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did think about that earlier and ran into the problem of impl deleting itself, but now that we are implementing some of this in the com wrapper, I can revisit.

ResetTestSession();
}

auto container = OpenContainer(m_defaultSession.get(), "test-auto-remove");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to add the container flags to ContainerInfo for the flags to be correctly re-populated when we recover containers from storage

Copy link
Author

@kvega005 kvega005 Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We store the process and container flags in the container metadata. I figured in the future we might have more info in these flags that would be important to recover but are not necessarily important to docker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants