-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Commit writable WCOW layers as read-only parent layers #4415
Conversation
Build succeeded.
|
58544b5
to
df2ffc6
Compare
Build succeeded.
|
df2ffc6
to
3a6a98c
Compare
Build succeeded.
|
Build failures due to some GCR weirdness? I don't see how different runtime choice would affect the results of pulling images, and the Windows tests (which are the only ones that should be affected by my changes) passed. |
@kevpar - PTAL |
3a6a98c
to
d8f7d00
Compare
Build succeeded.
|
Reviewing this, and trying to work out how best to deal with #2366, I've realised that the
The main concern I have is how to tell a random directory with a That said, I would be more-comfortable if we could actually tell the two apart by something other than a commonly-named subdirectory, but I can't see a definitive flow of information from I'm not sure if I'm going about this the right way, or if I'm overlooking something else important. If Edit: Per the note in #4419, it's possible we can leverage hcsshim to create sandbox.vhdx for the two cases (1, 4) where we don't already have one due to "no parents". That would simplify a lot of things here. This would turn case 1 into case and 2 ( On the plus side, this latter approach removes the 'bind-mount' concept, and hence makes This could be done additionally or subsequently to this PR, and then #4419 rebased on top, simplified by removing the bind-mount support. Edit again: Initial testing suggests the approach for creating parentless sandboxes ( |
Thinking about readonlyness too, perhaps Views should actually be read-write sandbox layers with tiny sandboxes. That would remove another source of bind-mounts. This is starting to feel like another PR on top of this one. Edit: That's the implementation approach I took in #4419, as there seems to be no other way to implement read-only mounts for the local system. |
@ambarve - PTAL |
d8f7d00
to
6af0a03
Compare
Build succeeded.
|
6af0a03
to
e7077f5
Compare
Build succeeded.
|
e7077f5
to
d3349c6
Compare
Build succeeded.
|
d3349c6
to
01c86bb
Compare
Build succeeded.
|
01c86bb
to
6a43f43
Compare
tl;dr: It worked as a container root last time I tested it, and not much has changed since, AFAIK. I was testing this by using BuildKit with the containerd backend to build multi-layer images. I haven't touched the BuildKit side in a while (not much point until the containerd changes land and I have have BuildKit's CI preventing further breakages in the containerd backend), but moby/buildkit#616 (comment) (July 2020) shows a couple of Dockerfiles that required this change, and included multiple Since then, testing, i.e. via #4419, has been host-mounting the layers, rather than container-mounting. #4419 enables the Snapshot test suite on Windows which exercises this change pretty thoroughly for host-mounts. The behaviour here (exporting out a tar-layer and then importing back over the top) is also how Docker commits writeable layers, so I'd be surprised if it didn't work. Unlike #4399 and #4419 (the PRs on either side of this) I'm not aware of any existing containerd tests that actually exercise this code-path and mounts so-created layers as a container's root (because that's not really containerd's job...), at least in the list of "disabled on Windows" tests. I'd love to know if I've overlooked such a test, because getting it enabled would ensure that we don't suffer bit-rot over time. |
40d2479
to
871c80f
Compare
Build succeeded.
|
That's cool. I was wondering if it would cause any problems but it doesn't so that's nice. I think the main changes in this PR are only in @jterry75 This looks good to me. Do you have any concerns? If not, can you merge this? |
871c80f
to
39e4bb4
Compare
Build succeeded.
|
39e4bb4
to
859fd48
Compare
Rebased now that #4399 has been merged. |
c04182b
to
e8a0505
Compare
Build succeeded.
|
e8a0505
to
664d3a7
Compare
Because there are no tests (AFAIK) that exercise this code-path outside the snapshotter test suite, I've rebased #4419 onto it, because that PR does enable the snapshotter test suite, to test the slight refactoring I just pushed based on review feedback. Edit: The test suite and integration run passed, but it's taken long enough that I suspect the second integration suite run will hit the 30-minute timeout and fail. But I'm satisfied I didn't break anything obvious when I moved the 'restream' out into a function. |
Build succeeded.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
with one comment~
A Scratch layer only contains a sandbox.vhdx, but to be used as a parent layer, it must also contain the files on-disk. Hence, we Export the layer from the sandbox.vhdx and Import it back into itself, so that both data formats are present. Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
664d3a7
to
8a4cbab
Compare
Build succeeded.
|
This builds on #4399 (itself on top of #4395) to implement
func (s *snapshotter) Commit
for Windows layers.Contributes towards #1920, with the goal of resolving moby/buildkit#616
It's a bit odd, but it seems that the only way to get this to work is to restream the read-write layer back on top of itself. I suspect this means that it is consuming extra space, but that appears to be necessary. Restreaming into a new snapshot means you cannot get a diff against the parent layer, which we often need to do.
The main weirdness here is that it seems that a windows-layer has two completely distinct filesystems, and you cannot diff all the combinations of them that you'd like, so we have to keep the writable-layer file around even after converting into the read-only-but-parent-usable format.
Note that support for parentless Active layers is not part of this PR, that's currently mixed into #4419 as it's both a separate discussion, and depends on a proposed hcsshim feature to implement. This only covers committing layers that you could already create, as Active layers with a parent, generally a chain starting with an image from mcr.microsoft.com.
Everything beyond this point is details and a record of experiments for how I got to this particular approach
A Scratch layer, i.e. the result of
(s *snapshotter) Prepare
, contains only sandbox.vhdx. It can only be used asmount.Mount.Source
, and theupper
forComparer.Compare
.A read-only layer has a Files/ directory, a Hives/ directory, and perhaps a UtilityVM/ directory and some other boot-related stuff. This is what you get after calling
Applier.Apply
. This sort of layer can appear inSnapshot.ParentIDs
, and the parents list thatComparer.Compare
for Windows sneaks through to the Windows archiver, introduced in #4399 based on the same sneaking done inApplier.Apply
.There's no problem with having both sandbox.vhdx and Files/ et. al. in the same directory. containerd already does this when creating snapshots out of the content store, because weApply
the pulled layer diff to aPrepare
d snapshot. However, it's important to note that the sandbox.vhdx in such a layer is an empty diff against the parent.However, snapshots created from the content store cannot produce diffs against their parents, because the diffing done by hcsshim appears to only look at the sandbox.vhdx of the upper. They will instead produce a valid diff with only one small metadata change to the 'Documents and Settings' symlink, and a set of (probably NULL) registry hive updates.On that note, I have observed (usingsnapshots diff
) that diffing two snapshots for pulled layers will actually produce practically-empty diffs at the moment, because the diff works with the sandbox.vhdx in the upper, not the extracted files, and when a layer is populated byApply
, the sandbox.vhdx is not updated.snapshots diff
also changes the modification data of the upper's sandbox.vhdx, which is odd.The upshot for me is that a read-only directory created withApplier.Apply
should not have a sandbox.vhdx, but that is hard becauseSnapshotter.Prepare
does not currently have a way to know if the snapshot is being prepared for use read-write, or to be Apply'd to and then Commit'd.The top commit also removes the sandbox.vhdx from Snapshots populated by Apply rather than by mounting and editing them, so that attempts to diff such snapshots fail (esoterically, for now) rather than returning bad data, which may leak into exported images if you do things wrongly enough. It's a separate commit, because this affects the on-disk result of pulling images, while the other changes only affect the on-disk result of committing read-write layers.All the crossed-out text above was fixed by #4643, which recognises snapshots which will be
Apply
'd to, and doens't create the sandbox.vhdx in the first place.I feel like there's some kind of misbehaviour or bug in hcsshim, that it cannot diff two directories in read-only format. It's possible I'm using it wrong though.
I was also mostly able to reproduce all of the above using the
wclayer
utility from hcsshim (master), plus a little utility to callSetVolumeMount
andDeleteVolumeMount
so I could see what's actually going on. This also helped me understand that the "mount-in-place" workaround I used in moby/buildkit for lack of #2366 is 100% wrong, and I've just been lucky that I haven't corrupted data using it yet. -_-I did some comparisons with Docker Engine's behaviour, and when idle, none of the directories in %PROGRAMDATA%\Docker\windowsfilter have sandbox.vhdx. During a container build, the current layer has a sandbox.vhdx, as you'd expect. However, I also noticed that its read-only directories are not hard-links, but only contain modified files, so it's possible that something different is happening there. I noted that they pass a "windowsFilter" flag through hcsshim's
DriverInfo
that current hcsshim builds don't appear to honour in many code-paths, and none of the containerd code appears to use.I'm not sure if this is a V1/V2 difference, or indicative of a larger issue.