-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Description
Background and motivation
To unblock #1544, in the issue @ericstj suggested adding this method overload, I'll quote his good reasoning/background here:
I've noticed the same. In general ZipArchiveEntry.Open is very non-intuitive in its behavior.
For read only / write only you get a wrapped DeflateStream which doesn't tell you the length of the stream nor permit you to seek it. For read/write (update) ZipArchiveEntry will read and decompress the entire entry into memory (in fact, into a memory stream backed by a single contiguous managed array) so that you have a seek-able representation of the file. Once opened for update the file is then written back to the archive when the archive itself is closed.
I agree with @.qmfrederik here that we need a better API. Rather than rely solely on the archive's open mode we should allow for the individual call's to Open to specify what kind of stream they want. We can then check that against how the archive was opened in case it is incompatible and throw. Consider the addition:
API Proposal
namespace System.IO.Compression;
public partial class ZipArchiveEntry
{
public Stream Open(FileAccess access);
}For an archive opened with ZipArchiveMode.Update we could allow FileAccess.Read, FileAccess.Write, or FileAccess.ReadWrite, where only the latter would do the MemoryStream expansion. Read and write would be have as they did today. In addition to solving the OP issue, this would address the case where someone is opening an archive for Update and simply adding a single file: we shouldn't need to copy that uncompressed data into memory just to write it to the archive.
API Usage (updated by @iremyux)
var archive = new ZipArchive(fileStream, ZipArchiveMode.Update);
var entry = archive.GetEntry("largefile.txt");
var stream = entry.Open(); // ALWAYS loads entire file into memory as MemoryStream
// Just reading - efficient streaming
var readStream = entry.Open(FileAccess.Read); // Returns DeflateStream - no memory overhead
var firstBytes = new byte[1024];
readStream.Read(firstBytes, 0, 1024); // Only decompresses what you needAlternative Designs (updated by @iremyux)
-
I believe the original issue had discussion about "fixing" the parameterless overload to change behavior? Didn't read too much into that as it is the "expensive" alternative. This alternative would unblock the indirect OOB consumers though.
-
One alternative considered was overloading Open with a ZipArchiveMode parameter, e.g. Open(ZipArchiveMode mode). Our team discussed this option, but concluded that it would conflate archive-level semantics with entry-level behavior. In contrast, FileAccess is a more natural fit.
Risks
None?