Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup for shared container scratch #1414

Merged
merged 3 commits into from
May 27, 2022

Conversation

ambarve
Copy link
Contributor

@ambarve ambarve commented May 26, 2022

Normally, container scratch is a separate VHD that is mounted at /run/gcs/c/<CID> The
mount vhd call to gcs creates this directory and mounts the disk at that path. The modify
combined layers call creates the upper & work directories at /run/gcs/c/<CID>/upper
& /run/gcs/c/<CID>/work and also creates the rootfs directory at
/run/gcs/c/<CID>/rootfs. Finally, overlayfs is mounted at /run/gcs/c/<CID>/rootfs for
the container to run. The remove combined layers call will remove the
/run/gcs/c/<CID>/rootfs directory. The unmount VHD will remove the /run/gcs/c/<CID>
directory. The upper & work directories continue to live on the container sandbox VHD
until that snapshot is removed.

However, when the container scratch VHD is shared with the UVM (the UVM scratch is mounted
at /run/gcs/c/) the container scratch vhd mount doesn't do anything since the VHD
is already mounted. It just increases the ref count of this VHD by 1. Hcsshim sets the
container scratch as /run/gcs/c/<UVMID>/container_<CID> and makes the combine layers
call. The modify combine layers call creates the upper & work directories at
/run/gcs/c/<UMVID>/container_<CID>/uppper, /run/gcs/c/<UMVID>/container_<CID>/work and
the rootfs is created at /run/gcs/c/<CID>/rootfs. (Note that all of these are mkdirAll
calls so they create any non-existing parent directories too). The remove combined layers
call removes the /run/gcs/c/<CID>/rootfs directory but the unmount VHD call doesn't
actually do anything, it just reduces the ref count by 1. So, in this case, the
/run/gcs/c/<UVMID>/container_<CID> directory never gets cleaned up and continues to
occupy space on the UVM scratch VHD.

GCS currently doesn't know anything about the container scratch directory path, so it
never cleans it up. Even when DeleteContainerState request is made. This is okay when
the scratch is not shared since the container scratch VHD is separate and it will be
cleaned up when containerd removes the container snapshot. However, in the shared scratch
case these scratch directories are leaked.

To correctly handle this case, we need to cleanup the container scratch directory during
the DeleteTask call. This commit updates the container creation request doc to also
include the container scratch directory path so that gcs has this path when the
DeleteContainerState request is made to the GCS. Also, previously we called
DeleteContainerState during ReleaseResources call. However, ReleaseResources is
called during KillTask/ShutdownTask request and shouldn't delete any resources that
belong to the container so that call is removed.

Signed-off-by: Amit Barve ambarve@microsoft.com

Normally, container scratch is a separate VHD that is mounted at `/run/gcs/c/<CID>` The
mount vhd call to gcs creates this directory and mounts the disk at that path.  The modify
combined layers call creates the `upper` & `work` directories at `/run/gcs/c/<CID>/upper`
& `/run/gcs/c/<CID>/work` and also creates the rootfs directory at
`/run/gcs/c/<CID>/rootfs`. Finally, overlayfs is mounted at `/run/gcs/c/<CID>/rootfs` for
the container to run. The remove combined layers call will remove the
`/run/gcs/c/<CID>/rootfs` directory.  The unmount VHD will remove the `/run/gcs/c/<CID>`
directory. The `upper` & `work` directories continue to live on the container sandbox VHD
until that snapshot is removed.

However, when the container scratch VHD is shared with the UVM (the UVM scratch is mounted
at /run/gcs/c/<UVMID>) the container scratch vhd mount doesn't do anything since the VHD
is already mounted. It just increases the ref count of this VHD by 1. Hcsshim sets the
container scratch as `/run/gcs/c/<UVMID>/container_<CID>` and makes the combine layers
call.  The modify combine layers call creates the `upper` & `work` directories at
`/run/gcs/c/<UMVID>/container_<CID>/uppper`, `/run/gcs/c/<UMVID>/container_<CID>/work` and
the rootfs is created at `/run/gcs/c/<CID>/rootfs`. (Note that all of these are mkdirAll
calls so they create any non-existing parent directories too).  The remove combined layers
call removes the `/run/gcs/c/<CID>/rootfs` directory but the unmount VHD call doesn't
actually do anything, it just reduces the ref count by 1.  So, in this case, the
`/run/gcs/c/<UVMID>/container_<CID>` directory never gets cleaned up and continues to
occupy space on the UVM scratch VHD.

GCS currently doesn't know anything about the container scratch directory path, so it
never cleans it up. Even when `DeleteContainerState` request is made. This is okay when
the scratch is not shared since the container scratch VHD is separate and it will be
cleaned up when containerd removes the container snapshot. However, in the shared scratch
case these scratch directories are leaked.

To correctly handle this case, we need to cleanup the container scratch directory during
the `DeleteTask` call. This commit updates the container creation request doc to also
include the container scratch directory path so that gcs has this path when the
`DeleteContainerState` request is made to the GCS. Also, previously we called
`DeleteContainerState` during `ReleaseResources` call. However, `ReleaseResources` is
called during `KillTask`/`ShutdownTask` request and shouldn't delete any resources that
belong to the container so that call is removed.

Signed-off-by: Amit Barve <ambarve@microsoft.com>
@ambarve ambarve requested a review from a team as a code owner May 26, 2022 19:17
Signed-off-by: Amit Barve <ambarve@microsoft.com>
internal/layers/layers.go Outdated Show resolved Hide resolved
@dcantah
Copy link
Contributor

dcantah commented May 26, 2022

Could we have a test for this? e.g. create a pod that shares scratch and:

  1. Create cntr
  2. Stop cntr
  3. Exec in VM and check if the upper dir is there.

Might be kinda ugly because the upper dir path is kind of an implementation detail. Curious if we could find a good test here

@ambarve
Copy link
Contributor Author

ambarve commented May 26, 2022

@dcantah Sharing scratch is only controlled by providing that option in the containerd config. So currently our cri tests can't test this scenario but I will try to think of a way to control containerd config in our tests.

check for nil UVM handle, undo export for GetSCratchVHDPath etc.

Signed-off-by: Amit Barve <ambarve@microsoft.com>
Copy link
Contributor

@dcantah dcantah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ambarve ambarve merged commit a750a7f into microsoft:master May 27, 2022
@ambarve ambarve deleted the scratch_dir_cleanup branch August 9, 2022 04:05
princepereira pushed a commit to princepereira/hcsshim that referenced this pull request Aug 29, 2024
* Cleanup for shared container scratch

Normally, container scratch is a separate VHD that is mounted at `/run/gcs/c/<CID>` The
mount vhd call to gcs creates this directory and mounts the disk at that path.  The modify
combined layers call creates the `upper` & `work` directories at `/run/gcs/c/<CID>/upper`
& `/run/gcs/c/<CID>/work` and also creates the rootfs directory at
`/run/gcs/c/<CID>/rootfs`. Finally, overlayfs is mounted at `/run/gcs/c/<CID>/rootfs` for
the container to run. The remove combined layers call will remove the
`/run/gcs/c/<CID>/rootfs` directory.  The unmount VHD will remove the `/run/gcs/c/<CID>`
directory. The `upper` & `work` directories continue to live on the container sandbox VHD
until that snapshot is removed.

However, when the container scratch VHD is shared with the UVM (the UVM scratch is mounted
at /run/gcs/c/<UVMID>) the container scratch vhd mount doesn't do anything since the VHD
is already mounted. It just increases the ref count of this VHD by 1. Hcsshim sets the
container scratch as `/run/gcs/c/<UVMID>/container_<CID>` and makes the combine layers
call.  The modify combine layers call creates the `upper` & `work` directories at
`/run/gcs/c/<UMVID>/container_<CID>/uppper`, `/run/gcs/c/<UMVID>/container_<CID>/work` and
the rootfs is created at `/run/gcs/c/<CID>/rootfs`. (Note that all of these are mkdirAll
calls so they create any non-existing parent directories too).  The remove combined layers
call removes the `/run/gcs/c/<CID>/rootfs` directory but the unmount VHD call doesn't
actually do anything, it just reduces the ref count by 1.  So, in this case, the
`/run/gcs/c/<UVMID>/container_<CID>` directory never gets cleaned up and continues to
occupy space on the UVM scratch VHD.

GCS currently doesn't know anything about the container scratch directory path, so it
never cleans it up. Even when `DeleteContainerState` request is made. This is okay when
the scratch is not shared since the container scratch VHD is separate and it will be
cleaned up when containerd removes the container snapshot. However, in the shared scratch
case these scratch directories are leaked.

To correctly handle this case, we need to cleanup the container scratch directory during
the `DeleteTask` call. This commit updates the container creation request doc to also
include the container scratch directory path so that gcs has this path when the
`DeleteContainerState` request is made to the GCS. Also, previously we called
`DeleteContainerState` during `ReleaseResources` call. However, `ReleaseResources` is
called during `KillTask`/`ShutdownTask` request and shouldn't delete any resources that
belong to the container so that call is removed.


Signed-off-by: Amit Barve <ambarve@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants