docker: tarfile: improve auto-decompression handling #427

Closed · wants to merge 4 commits
62 changes: 42 additions & 20 deletions docker/tarfile/src.go
@@ -3,7 +3,6 @@ package tarfile
import (
"archive/tar"
"bytes"
"compress/gzip"
"context"
"encoding/json"
"io"
@@ -43,28 +42,31 @@ type layerInfo struct {
// the source of an image.
// To do for both the NewSourceFromFile and NewSourceFromStream functions

// NewSourceFromFile returns a tarfile.Source for the specified path
// NewSourceFromFile supports both compressed and uncompressed input
// NewSourceFromFile returns a tarfile.Source for the specified path.
func NewSourceFromFile(path string) (*Source, error) {
file, err := os.Open(path)
if err != nil {
return nil, errors.Wrapf(err, "error opening file %q", path)
}
defer file.Close()

reader, err := gzip.NewReader(file)
// If the file is not compressed, we can just return the file itself
// as a source. Otherwise we pass the stream to NewSourceFromStream.
stream, isCompressed, err := compression.AutoDecompress(file)
if err != nil {
return nil, errors.Wrapf(err, "detecting compression for file %q", path)
}
if !isCompressed {
return &Source{
tarPath: path,
}, nil
}
defer reader.Close()
Collaborator:

So, it turns out that gzip.NewReader returns a ReadCloser, and the caller is expected to really Close the stream; the compression.DetectCompression implementation missed that.

Luckily(?) the gzip.Reader.Close() is (currently?) trivial; it only returns an error value which Read() would, AFAICT, have returned anyway as long as the consumer is reading until EOF.

But that complicates the AutoDecompress design; should it wrap uncompressed inputs in NopCloser? Should it return a separate close callback instead of modifying the io.Reader stream (like dirImageMockWithRef does)?

Or do we just pretend that gzip.Reader.Close does not exist? :)

Contributor Author:

We could change the API to return an io.ReadCloser. Since an io.ReadCloser is also an io.Reader, there shouldn't be a problem with any users of the interface (aside from the downside they probably won't notice they should call Close now).

Collaborator:

… and that

	reader = …
	defer reader.Close()
	reader, … = AutoDecompress(…)
	defer reader.Close() // Again!

is unintuitive. But then none of the alternatives I can think of are any more elegant.

Contributor Author:

Maybe, but if you compare it to gzip.NewReader you have to do a similar thing.

reader := ...
defer reader.Close()
reader2, ... := gzip.NewReader(reader)
defer reader2.Close()

We could work around it by doing some very dodgy .(io.Closer) handling in order to still allow users to pass io.Readers that don't have a hidden Close() method. But that's probably what you were thinking when you said that you can't think of any more elegant workarounds. 😉
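
For illustration, that workaround might look like this hypothetical fragment (a sketch, not code from this PR):

	// Close the decompressed stream only if the decompressor actually
	// handed back something with a hidden Close method.
	if closer, ok := reader.(io.Closer); ok {
		defer closer.Close()
	}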

Collaborator:

The unintuitive part, to me, is that with raw gzip, there is reader.Close() and reader2.Close(); with AutoDecompress, there would be two reader.Close()s, which looks pretty similar to a copy&paste bug.

But I still can’t think of a much better API.

	reader = …
	defer reader.Close()
	reader, close, … = AutoDecompress(…)
	defer close()

does not look noticeably better.

Contributor Author:

For the moment I'm just going to ignore the close method for gzip if that's okay with you. Otherwise we'd have to NopCloser most of the other functions, and DetectCompression's signature would look wrong.

Collaborator:

I’m not really happy to ignore resource cleanup and error detection, no. It’s a time bomb, and doing the right thing seems plausible.

I can take on the work of updating the existing compression package and its users, if you don’t want to deal with the rest of c/image as part of this PR (apart from updating it to use the new API).

Contributor Author:

Alright, in that case I'll work on doing that tomorrow.

Contributor Author:

I'll work on fixing this up today.
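
A minimal sketch of the ReadCloser-returning AutoDecompress discussed in this thread; hypothetical, not part of this PR's diff, and assuming DetectCompression keeps its current signature plus an added io/ioutil import:

	// Hypothetical follow-up shape: return an io.ReadCloser so that
	// decompressors like gzip.Reader get their Close() honored.
	func AutoDecompress(stream io.Reader) (io.ReadCloser, bool, error) {
		decompressFunc, stream, err := DetectCompression(stream)
		if err != nil {
			return nil, false, errors.Wrap(err, "detect compression")
		}
		if decompressFunc == nil {
			// Uncompressed input: wrap it so both branches return a ReadCloser.
			return ioutil.NopCloser(stream), false, nil
		}
		decompressed, err := decompressFunc(stream)
		if err != nil {
			return nil, false, errors.Wrap(err, "auto decompression")
		}
		if rc, ok := decompressed.(io.ReadCloser); ok {
			// e.g. gzip.NewReader returns a ReadCloser whose Close we keep.
			return rc, true, nil
		}
		return ioutil.NopCloser(decompressed), true, nil
	}

Callers would then write defer reader.Close() once per returned stream, which is the double-Close pattern debated above.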


return NewSourceFromStream(reader)
return NewSourceFromStream(stream)
}

// NewSourceFromStream returns a tarfile.Source for the specified inputStream, which must be uncompressed.
// The caller can close the inputStream immediately after NewSourceFromFile returns.
// NewSourceFromStream returns a tarfile.Source for the specified inputStream,
// which can be either compressed or uncompressed. The caller can close the
// inputStream immediately after NewSourceFromStream returns.
func NewSourceFromStream(inputStream io.Reader) (*Source, error) {
// FIXME: use SystemContext here.
// Save inputStream to a temporary file
@@ -81,7 +83,18 @@ func NewSourceFromStream(inputStream io.Reader) (*Source, error) {
}
}()

// TODO: This can take quite some time, and should ideally be cancellable using a context.Context.
// In order to be compatible with docker-load, we need to support
// auto-decompression (it's also a nice quality-of-life thing to avoid
// giving users really confusing "invalid tar header" errors).
inputStream, _, err = compression.AutoDecompress(inputStream)
if err != nil {
return nil, errors.Wrap(err, "auto-decompress docker-archive source")
}

// Copy the plain archive to the temporary file.
//
// TODO: This can take quite some time, and should ideally be cancellable
// using a context.Context.
if _, err := io.Copy(tarCopyFile, inputStream); err != nil {
return nil, errors.Wrapf(err, "error copying contents to temporary file %q", tarCopyFile.Name())
}
@@ -292,7 +305,23 @@ func (s *Source) prepareLayerData(tarManifest *ManifestItem, parsedConfig *manif
return nil, err
}
if li, ok := unknownLayerSizes[h.Name]; ok {
li.size = h.Size
// Since GetBlob will decompress layers that are compressed we need
// to do the decompression here as well, otherwise we will
// incorrectly report the size. This is pretty critical, since tools
// like umoci always compress layer blobs. We only fall back to the
// slower method (reading the whole stream) if the layer is actually
// compressed.
stream, isCompressed, err := compression.AutoDecompress(t)
if err != nil {
return nil, errors.Wrapf(err, "auto-decompress %s to find size", h.Name)
}
uncompressedSize := h.Size
if isCompressed {
uncompressedSize, err = io.Copy(ioutil.Discard, stream)
if err != nil {
return nil, errors.Wrapf(err, "reading %s to find size", h.Name)
}
}
li.size = uncompressedSize
delete(unknownLayerSizes, h.Name)
}
}
@@ -386,16 +415,9 @@ func (s *Source) GetBlob(ctx context.Context, info types.BlobInfo) (io.ReadClose
// be verifying a "digest" which is not the actual layer's digest (but
// is instead the DiffID).

decompressFunc, reader, err := compression.DetectCompression(stream)
reader, _, err := compression.AutoDecompress(stream)
if err != nil {
return nil, 0, errors.Wrapf(err, "Detecting compression in blob %s", info.Digest)
}

if decompressFunc != nil {
reader, err = decompressFunc(reader)
if err != nil {
return nil, 0, errors.Wrapf(err, "Decompressing blob %s stream", info.Digest)
}
return nil, 0, errors.Wrapf(err, "auto-decompress blob %s", info.Digest)
}

newStream := readCloseWrapper{
20 changes: 19 additions & 1 deletion pkg/compression/compression.go
@@ -9,6 +9,7 @@ import (
"github.com/pkg/errors"

"github.com/sirupsen/logrus"
"github.com/ulikunitz/xz"
)

// DecompressorFunc returns the decompressed stream, given a compressed stream.
@@ -26,7 +27,7 @@ func Bzip2Decompressor(r io.Reader) (io.Reader, error) {

// XzDecompressor is a DecompressorFunc for the xz compression algorithm.
func XzDecompressor(r io.Reader) (io.Reader, error) {
return nil, errors.New("Decompressing xz streams is not supported")
return xz.NewReader(r)
}

// compressionAlgos is an internal implementation detail of DetectCompression
@@ -65,3 +66,20 @@ func DetectCompression(input io.Reader) (DecompressorFunc, io.Reader, error) {

return decompressor, io.MultiReader(bytes.NewReader(buffer[:n]), input), nil
}

// AutoDecompress takes a stream and returns an uncompressed version of the
// same stream. It's a simple wrapper around DetectCompression that removes the
// need to touch DecompressorFuncs.
func AutoDecompress(stream io.Reader) (io.Reader, bool, error) {
decompressFunc, stream, err := DetectCompression(stream)
if err != nil {
return nil, false, errors.Wrapf(err, "detect compression")
}
if decompressFunc != nil {
stream, err = decompressFunc(stream)
if err != nil {
return nil, false, errors.Wrapf(err, "auto decompression")
}
}
return stream, decompressFunc != nil, nil
}
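
A hypothetical caller-side sketch of AutoDecompress as defined above, assuming it runs inside a function returning error with fmt, io, io/ioutil, and os imported:

	file, err := os.Open("archive.tar.gz") // hypothetical input path
	if err != nil {
		return err
	}
	defer file.Close()

	stream, isCompressed, err := compression.AutoDecompress(file)
	if err != nil {
		return err
	}
	// stream yields plain tar bytes either way; isCompressed reports
	// whether a decompressor was applied.
	fmt.Printf("input was compressed: %v\n", isCompressed)
	if _, err := io.Copy(ioutil.Discard, stream); err != nil {
		return err
	}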
15 changes: 5 additions & 10 deletions pkg/compression/compression_test.go
@@ -15,13 +15,12 @@ import (

func TestDetectCompression(t *testing.T) {
cases := []struct {
filename string
unimplemented bool
filename string
Collaborator:

(Non-blocking: This could become a []string.)

Contributor Author:

It could, but I preferred the c.filename as it's more descriptive (and makes the diff smaller). But I could switch it if you prefer.

Collaborator:

No, this is fine as well.

}{
{"fixtures/Hello.uncompressed", false},
{"fixtures/Hello.gz", false},
{"fixtures/Hello.bz2", false},
{"fixtures/Hello.xz", true},
{"fixtures/Hello.uncompressed"},
{"fixtures/Hello.gz"},
{"fixtures/Hello.bz2"},
{"fixtures/Hello.xz"},
}

// The original stream is preserved.
@@ -54,10 +53,6 @@
switch {
case decompressor == nil:
uncompressedStream = updatedStream
case c.unimplemented:
_, err := decompressor(updatedStream)
assert.Error(t, err)
continue
default:
s, err := decompressor(updatedStream)
require.NoError(t, err)
1 change: 1 addition & 0 deletions vendor.conf
@@ -40,3 +40,4 @@ github.com/ostreedev/ostree-go aeb02c6b6aa2889db3ef62f7855650755befd460
github.com/gogo/protobuf fcdc5011193ff531a548e9b0301828d5a5b97fd8
github.com/pquerna/ffjson master
github.com/syndtr/gocapability master
github.com/ulikunitz/xz v0.5.4