docker: tarfile: improve auto-decompression handling #427
Conversation
This now includes fixes for
docker/tarfile/src.go (outdated)
if err != nil {
	return &Source{
		tarPath: path,
	}, nil
(Just a quick question for now:) Is there a user who benefits from adding decompression support for streams? The only user of NewSourceFromStream in c/image processes the output of docker save, which is not compressed.

If there is no user, I’d rather leave the decompression exclusive to NewSourceFromFile, or rewrite the code in another way to preserve the property that NewSourceFromFile on an uncompressed file doesn’t make an unnecessary (and pretty costly) copy.
If you want, I can rework this so that it just uses the original file if it is uncompressed (I actually missed that the code did this when I deleted this hunk -- oops!).

But I don't see the negative of adding it to the stream implementation. The uncompressed path for streams is basically identical to the compressed path (minus the overhead of reading 5 bytes rather than using bufio for the entire read -- which isn't a significant overhead). So adding it to the stream implementation just means that containers/image users can stream compressed archives directly to c/image rather than making two copies.
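For illustration, a hedged, stdlib-only sketch of what the stream path amounts to (autoDecompressStream and the three-byte peek are invented for the example; this is not the actual containers/image code): peek at the magic bytes through a bufio.Reader so nothing is consumed, then either wrap the stream in a decompressor or hand it through unchanged.

```go
package sketch

import (
	"bufio"
	"bytes"
	"compress/bzip2"
	"compress/gzip"
	"io"
)

// autoDecompressStream peeks at the leading magic bytes without consuming
// them, then returns either a decompressing reader or the original stream.
func autoDecompressStream(r io.Reader) (io.Reader, error) {
	br := bufio.NewReader(r)
	magic, err := br.Peek(3)
	if err != nil && err != io.EOF {
		return nil, err
	}
	switch {
	case bytes.HasPrefix(magic, []byte{0x1f, 0x8b}):
		return gzip.NewReader(br) // gzip
	case bytes.HasPrefix(magic, []byte("BZh")):
		return bzip2.NewReader(br), nil // bzip2
	default:
		return br, nil // unknown or uncompressed: stream through as-is
	}
}
```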
Fixed
> So adding it to the stream implementation just means that containers/image users can stream compressed archives directly to c/image rather than making two copies.

This does not actually change that, because docker-archive: always calls NewSourceFromFile (docker/tarball is an internal helper, not a transport with its own ImageReference that could be used via copy.Image). Which is why I was asking about a user who benefits.

Sure, this does not hurt, and the consistency in the API is nice :)
I was thinking more about users who might want to use docker/tarball directly, but there are probably no such users. Anyway, I've fixed up the code so that it no longer makes a copy if the archive is already uncompressed.
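For illustration, roughly what "no copy when the file is already uncompressed" can look like. This is a hedged sketch with invented names (newSourceFromFile, the stand-in Source fields), not the PR's actual implementation, and it only handles gzip to stay short.

```go
package sketch

import (
	"compress/gzip"
	"io"
	"io/ioutil"
	"os"
)

// Source stands in for the real tarfile.Source; only the fields used here.
type Source struct {
	tarPath              string
	removeTarPathOnClose bool
}

// newSourceFromFile uses the original file directly when it is not
// gzip-compressed, and only makes a decompressed temporary copy otherwise.
func newSourceFromFile(path string) (*Source, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	// Try to open the file as gzip; if that fails, assume it is a plain tar
	// and hand the original path over without copying anything.
	gz, err := gzip.NewReader(f)
	if err != nil {
		return &Source{tarPath: path}, nil
	}
	defer gz.Close()

	// Compressed: decompress into a temporary file the Source will own.
	tmp, err := ioutil.TempFile("", "docker-tar")
	if err != nil {
		return nil, err
	}
	defer tmp.Close()
	if _, err := io.Copy(tmp, gz); err != nil {
		os.Remove(tmp.Name())
		return nil, err
	}
	return &Source{tarPath: tmp.Name(), removeTarPathOnClose: true}, nil
}
```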
Fixed up the real test failure. Now
@cyphar https://github.com/projectatomic/skopeo#contributing describes how to fix the breaking skopeo tests and how to test them.
ACK overall.
The need to decompress layers only to count the size is awkward, but not obviously avoidable—building a manifest with the compressed digests, and having to compute them, is not better.
Closing the gzip stream is the only outstanding comment. (It would be unreasonable to block this PR on using DetectCompression when the design of that is incorrect, but I think it’s fair to block this on designing AutoDecompress correctly. Still, maybe we can get away with ignoring the issue anyway. @runcom ?)
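To make the awkward step concrete, a minimal sketch (assuming gzip; the real code would use whichever decompressor was detected) that streams the layer through the decompressor into io.Discard purely to count bytes, so the decompressed data is never kept around:

```go
package sketch

import (
	"compress/gzip"
	"io"
)

// uncompressedSize streams a gzip-compressed layer through the decompressor
// and into io.Discard, returning how many decompressed bytes went by.
func uncompressedSize(layer io.Reader) (int64, error) {
	gz, err := gzip.NewReader(layer)
	if err != nil {
		return -1, err
	}
	n, err := io.Copy(io.Discard, gz)
	if err != nil {
		gz.Close()
		return -1, err
	}
	// Close explicitly so a truncated or corrupt trailer is reported as an error.
	if err := gz.Close(); err != nil {
		return -1, err
	}
	return n, nil
}
```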
@@ -15,13 +15,12 @@ import (

 func TestDetectCompression(t *testing.T) {
 	cases := []struct {
-		filename      string
-		unimplemented bool
+		filename string
(Non-blocking: This could become a []string.)
It could, but I preferred keeping c.filename as it's more descriptive (and makes the diff smaller). But I could switch it if you prefer.
No, this is fine as well.
	return &Source{
		tarPath: path,
	}, nil
}
defer reader.Close()
So, it turns out that gzip.NewReader returns a ReadCloser, and the caller is expected to really Close the stream; the compression.DetectCompression implementation missed that.

Luckily(?) the gzip.Reader.Close() is (currently?) trivial: it only returns an error value which Read() would, AFAICT, have returned anyway as long as the consumer reads until EOF.

But that complicates the AutoDecompress design; should it wrap uncompressed inputs in NopCloser? Should it return a separate close callback instead of modifying the io.Reader stream (like dirImageMockWithRef does)?

Or do we just pretend that gzip.Reader.Close does not exist? :)
We could change the API to return an io.ReadCloser. Since every io.ReadCloser is also an io.Reader, there shouldn't be a problem with any users of the interface (aside from the downside that they probably won't notice they should call Close now).
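A hedged sketch of that ReadCloser-returning shape (the exact signature and the extra bool are guesses for illustration, not the final API): compressed input gets the gzip reader, whose Close the caller must run; uncompressed input is wrapped in io.NopCloser so deferring Close is uniformly safe.

```go
package sketch

import (
	"bufio"
	"compress/gzip"
	"io"
)

// autoDecompress returns a stream of uncompressed data plus a flag saying
// whether any decompression was applied. Callers always defer Close.
func autoDecompress(r io.Reader) (io.ReadCloser, bool, error) {
	br := bufio.NewReader(r)
	magic, err := br.Peek(2)
	if err != nil && err != io.EOF {
		return nil, false, err
	}
	if len(magic) == 2 && magic[0] == 0x1f && magic[1] == 0x8b {
		gz, err := gzip.NewReader(br)
		if err != nil {
			return nil, false, err
		}
		return gz, true, nil // gzip: Close() surfaces any trailing errors
	}
	return io.NopCloser(br), false, nil // not compressed: Close() is a no-op
}
```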
… and that
reader = …
defer reader.Close()
reader, … = AutoDecompress(…)
defer reader.Close() // Again!
is unintuitive. But then none of the alternatives I can think of are any more elegant.
Maybe, but if you compare it to gzip.NewReader you have to do a similar thing.
reader := ...
defer reader.Close()
reader2, ... := gzip.NewReader(reader)
defer reader2.Close()
We could work around it by doing some very dodgy .(io.Closer) handling in order to still allow users to pass io.Readers that don't have a hidden Close() method. But that's probably what you were thinking when you said that you can't think of any more elegant workarounds. 😉
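For concreteness, the "dodgy type assertion" workaround would look roughly like this (closeIfCloser is an invented helper, not code from this PR):

```go
package sketch

import "io"

// closeIfCloser closes r only if its concrete type also implements io.Closer;
// plain io.Readers are silently accepted.
func closeIfCloser(r io.Reader) error {
	if c, ok := r.(io.Closer); ok {
		return c.Close()
	}
	return nil
}
```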
The unintuitive part, to me, is that with raw gzip, there is reader.Close() and reader2.Close(); with AutoDecompress, there would be two reader.Close()s, which looks pretty similar to a copy&paste bug.

But I still can’t think of a much better API.
reader = …
defer reader.Close()
reader, close, … = AutoDecompress(…)
defer close()
does not look noticeably better.
For the moment I'm just going to ignore the close method for gzip if that's okay with you. Otherwise we'd have to NopCloser most of the other functions, and DetectCompression's signature would look wrong.
I’m not really happy to ignore resource cleanup and error detection, no. It’s a time bomb, and doing the right thing seems plausible.

I can take on the work of updating the existing compression package and its users, if you don’t want to deal with the rest of c/image as part of this PR (apart from updating it to use the new API).
Alright, in that case I'll work on doing that tomorrow.
I'll work on fixing this up today.
docker/tarfile/src.go (outdated)
if err != nil {
	return nil, errors.Wrapf(err, "auto-decompress %s to find size", h.Name)
}
rawSize := h.Size
(uncompressedSize perhaps? raw suggests to me the original, i.e. "not decompressed".)
@umohnani8 Oh, so we still haven't sorted that problem out (it's been a problem for more than a year now -- and there were PRs posted that were supposed to resolve this issue). That's a shame.
I can’t remember PRs for that, am I misremembering?

The coupling with

Now that the churn has settled a bit (but we do still break the API), and more importantly there are a few other prominent users of c/image, maybe we could reconsider that tradeoff… though I’m not exactly sure in which direction. Actually document and commit to a subset of the API to be stable? That would probably still be a very small subset.
@mtrmac This all happened more than a year ago, it took me a while to find the PRs. Here are the ones I could find:

The idea was to see if we could run the integration tests as part of

The main problem with
(At that point in the past,

c/image is now breaking the API a bit less, so maybe the tradeoff is no longer that compelling. Also, with umoci, Nix, libpod, buildah, and maybe other users, it is unclear whether treating

Ideally, of course, c/image would never break the API, but with things like #431 incoming, I don’t think we are ready to pay the price of that commitment yet -- or I haven’t heard that the demand is strong enough so far.)

Anyway, in this particular case, I’m fine with skipping the “do not merge” temp-PR; we can with reasonable confidence expect that this will just work, so that PR can be prepared after merging this one. For me, this PR blocks only on closing the gzip stream.
(One more aspect of this is that c/image does not have any real integration tests; those are all in skopeo.)
@cyphar Needs a rebase, if you are still interested in pursuing this PR.
Yes, I will revive this PR as it has broken
For quite a while we were blocked on supporting xz decompression because it effectively required shelling out to "unxz" (which is just bad in general). However, there is now a library -- github.com/ulikunitz/xz -- which has implemented LZMA decompression in pure Go. It isn't as featureful as liblzma (and only supports 1.0.4 of the specification) but it is an improvement over not supporting xz at all. And since we aren't using its writer implementation, we don't have to worry about sub-par compression. Signed-off-by: Aleksa Sarai <asarai@suse.de>
Most users of DetectCompression will use the decompressorFunc immediately afterwards (the only exception is the copy package). Wrap up this boilerplate in AutoDecompress, so it can be used elsewhere. Signed-off-by: Aleksa Sarai <asarai@suse.de>
This matches how "docker load" deals with compressed images, as well as being a general quality-of-life improvement over the previous error messages we'd give. This also necessarily removes the previous special-cased gzip handling, and adds support for auto-decompression for streams as well. Signed-off-by: Aleksa Sarai <asarai@suse.de>
Tools like umoci will always compress layers, and in order to work around some *lovely* issues with DiffIDs, tarfile.GetBlob would always decompress them. However the BlobInfo returned from tarfile.GetBlob would incorrectly give the size of the *compressed* layer because the size caching code didn't actually check the layer size, resulting in "skopeo copy" failing whenever sourcing umoci images. As an aside, for some reason the oci: transport doesn't report errors when the size is wrong... Signed-off-by: Aleksa Sarai <asarai@suse.de>
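The first of the commits above brings in github.com/ulikunitz/xz; as a rough, hedged illustration (placeholder filename, not code from this PR), pure-Go xz decompression looks like this. Note that, unlike gzip, the xz reader has nothing to Close.

```go
package main

import (
	"io"
	"os"

	"github.com/ulikunitz/xz"
)

func main() {
	// Placeholder input file; any xz-compressed stream works the same way.
	f, err := os.Open("layer.tar.xz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Pure-Go LZMA/xz decompression, no shelling out to "unxz".
	r, err := xz.NewReader(f)
	if err != nil {
		panic(err)
	}
	if _, err := io.Copy(os.Stdout, r); err != nil {
		panic(err)
	}
}
```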
Rebased and containers/skopeo#516 is the test PR.

@cyphar Tests are failing?

@rhatdan It's because I've added a new vendor which means we need a

@cyphar this is blocking us from fixing a couple of issues in podman.

This was finished in #481. Thanks again!