-
Notifications
You must be signed in to change notification settings - Fork 257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup for shared container scratch #1414
Conversation
Normally, container scratch is a separate VHD that is mounted at `/run/gcs/c/<CID>` The mount vhd call to gcs creates this directory and mounts the disk at that path. The modify combined layers call creates the `upper` & `work` directories at `/run/gcs/c/<CID>/upper` & `/run/gcs/c/<CID>/work` and also creates the rootfs directory at `/run/gcs/c/<CID>/rootfs`. Finally, overlayfs is mounted at `/run/gcs/c/<CID>/rootfs` for the container to run. The remove combined layers call will remove the `/run/gcs/c/<CID>/rootfs` directory. The unmount VHD will remove the `/run/gcs/c/<CID>` directory. The `upper` & `work` directories continue to live on the container sandbox VHD until that snapshot is removed. However, when the container scratch VHD is shared with the UVM (the UVM scratch is mounted at /run/gcs/c/<UVMID>) the container scratch vhd mount doesn't do anything since the VHD is already mounted. It just increases the ref count of this VHD by 1. Hcsshim sets the container scratch as `/run/gcs/c/<UVMID>/container_<CID>` and makes the combine layers call. The modify combine layers call creates the `upper` & `work` directories at `/run/gcs/c/<UMVID>/container_<CID>/uppper`, `/run/gcs/c/<UMVID>/container_<CID>/work` and the rootfs is created at `/run/gcs/c/<CID>/rootfs`. (Note that all of these are mkdirAll calls so they create any non-existing parent directories too). The remove combined layers call removes the `/run/gcs/c/<CID>/rootfs` directory but the unmount VHD call doesn't actually do anything, it just reduces the ref count by 1. So, in this case, the `/run/gcs/c/<UVMID>/container_<CID>` directory never gets cleaned up and continues to occupy space on the UVM scratch VHD. GCS currently doesn't know anything about the container scratch directory path, so it never cleans it up. Even when `DeleteContainerState` request is made. This is okay when the scratch is not shared since the container scratch VHD is separate and it will be cleaned up when containerd removes the container snapshot. However, in the shared scratch case these scratch directories are leaked. To correctly handle this case, we need to cleanup the container scratch directory during the `DeleteTask` call. This commit updates the container creation request doc to also include the container scratch directory path so that gcs has this path when the `DeleteContainerState` request is made to the GCS. Also, previously we called `DeleteContainerState` during `ReleaseResources` call. However, `ReleaseResources` is called during `KillTask`/`ShutdownTask` request and shouldn't delete any resources that belong to the container so that call is removed. Signed-off-by: Amit Barve <ambarve@microsoft.com>
Signed-off-by: Amit Barve <ambarve@microsoft.com>
Could we have a test for this? e.g. create a pod that shares scratch and:
Might be kinda ugly because the upper dir path is kind of an implementation detail. Curious if we could find a good test here |
e87fdb0
to
0473166
Compare
@dcantah Sharing scratch is only controlled by providing that option in the containerd config. So currently our cri tests can't test this scenario but I will try to think of a way to control containerd config in our tests. |
0473166
to
9e589e2
Compare
check for nil UVM handle, undo export for GetSCratchVHDPath etc. Signed-off-by: Amit Barve <ambarve@microsoft.com>
9e589e2
to
a533f65
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Cleanup for shared container scratch Normally, container scratch is a separate VHD that is mounted at `/run/gcs/c/<CID>` The mount vhd call to gcs creates this directory and mounts the disk at that path. The modify combined layers call creates the `upper` & `work` directories at `/run/gcs/c/<CID>/upper` & `/run/gcs/c/<CID>/work` and also creates the rootfs directory at `/run/gcs/c/<CID>/rootfs`. Finally, overlayfs is mounted at `/run/gcs/c/<CID>/rootfs` for the container to run. The remove combined layers call will remove the `/run/gcs/c/<CID>/rootfs` directory. The unmount VHD will remove the `/run/gcs/c/<CID>` directory. The `upper` & `work` directories continue to live on the container sandbox VHD until that snapshot is removed. However, when the container scratch VHD is shared with the UVM (the UVM scratch is mounted at /run/gcs/c/<UVMID>) the container scratch vhd mount doesn't do anything since the VHD is already mounted. It just increases the ref count of this VHD by 1. Hcsshim sets the container scratch as `/run/gcs/c/<UVMID>/container_<CID>` and makes the combine layers call. The modify combine layers call creates the `upper` & `work` directories at `/run/gcs/c/<UMVID>/container_<CID>/uppper`, `/run/gcs/c/<UMVID>/container_<CID>/work` and the rootfs is created at `/run/gcs/c/<CID>/rootfs`. (Note that all of these are mkdirAll calls so they create any non-existing parent directories too). The remove combined layers call removes the `/run/gcs/c/<CID>/rootfs` directory but the unmount VHD call doesn't actually do anything, it just reduces the ref count by 1. So, in this case, the `/run/gcs/c/<UVMID>/container_<CID>` directory never gets cleaned up and continues to occupy space on the UVM scratch VHD. GCS currently doesn't know anything about the container scratch directory path, so it never cleans it up. Even when `DeleteContainerState` request is made. This is okay when the scratch is not shared since the container scratch VHD is separate and it will be cleaned up when containerd removes the container snapshot. However, in the shared scratch case these scratch directories are leaked. To correctly handle this case, we need to cleanup the container scratch directory during the `DeleteTask` call. This commit updates the container creation request doc to also include the container scratch directory path so that gcs has this path when the `DeleteContainerState` request is made to the GCS. Also, previously we called `DeleteContainerState` during `ReleaseResources` call. However, `ReleaseResources` is called during `KillTask`/`ShutdownTask` request and shouldn't delete any resources that belong to the container so that call is removed. Signed-off-by: Amit Barve <ambarve@microsoft.com>
Normally, container scratch is a separate VHD that is mounted at
/run/gcs/c/<CID>
Themount vhd call to gcs creates this directory and mounts the disk at that path. The modify
combined layers call creates the
upper
&work
directories at/run/gcs/c/<CID>/upper
&
/run/gcs/c/<CID>/work
and also creates the rootfs directory at/run/gcs/c/<CID>/rootfs
. Finally, overlayfs is mounted at/run/gcs/c/<CID>/rootfs
forthe container to run. The remove combined layers call will remove the
/run/gcs/c/<CID>/rootfs
directory. The unmount VHD will remove the/run/gcs/c/<CID>
directory. The
upper
&work
directories continue to live on the container sandbox VHDuntil that snapshot is removed.
However, when the container scratch VHD is shared with the UVM (the UVM scratch is mounted
at /run/gcs/c/) the container scratch vhd mount doesn't do anything since the VHD
is already mounted. It just increases the ref count of this VHD by 1. Hcsshim sets the
container scratch as
/run/gcs/c/<UVMID>/container_<CID>
and makes the combine layerscall. The modify combine layers call creates the
upper
&work
directories at/run/gcs/c/<UMVID>/container_<CID>/uppper
,/run/gcs/c/<UMVID>/container_<CID>/work
andthe rootfs is created at
/run/gcs/c/<CID>/rootfs
. (Note that all of these are mkdirAllcalls so they create any non-existing parent directories too). The remove combined layers
call removes the
/run/gcs/c/<CID>/rootfs
directory but the unmount VHD call doesn'tactually do anything, it just reduces the ref count by 1. So, in this case, the
/run/gcs/c/<UVMID>/container_<CID>
directory never gets cleaned up and continues tooccupy space on the UVM scratch VHD.
GCS currently doesn't know anything about the container scratch directory path, so it
never cleans it up. Even when
DeleteContainerState
request is made. This is okay whenthe scratch is not shared since the container scratch VHD is separate and it will be
cleaned up when containerd removes the container snapshot. However, in the shared scratch
case these scratch directories are leaked.
To correctly handle this case, we need to cleanup the container scratch directory during
the
DeleteTask
call. This commit updates the container creation request doc to alsoinclude the container scratch directory path so that gcs has this path when the
DeleteContainerState
request is made to the GCS. Also, previously we calledDeleteContainerState
duringReleaseResources
call. However,ReleaseResources
iscalled during
KillTask
/ShutdownTask
request and shouldn't delete any resources thatbelong to the container so that call is removed.
Signed-off-by: Amit Barve ambarve@microsoft.com