
Add support for adding new entries to large zip archives #49149

@madelson

Description


Background and Motivation

We use System.IO.Compression.ZipArchive to manage the creation of large .zip files in a streaming fashion. When creating new archives, this works quite well (with ZipArchiveMode.Create, it writes through to the underlying stream so long as you only write to one entry at a time).

However, when we want to append to an existing archive, we have to use ZipArchiveMode.Update. According to the doc comments, with this mode the contents of the entire archive must be held in memory! This caused our system to crash due to array length restrictions when working with a particularly large file.

The zip format is designed to support efficient appending of files, so I believe it should be possible for .NET's implementation to support this use-case.

Proposed API

This could be addressed using a new ZipArchiveMode enum value, perhaps named ZipArchiveMode.Append to match FileMode.Append. This would be similar to Create but would allow for an existing file to be used.
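As a sketch only, the new member might sit alongside the existing values like this (the existing values and their numeric order are from the current enum; Append and its value are the proposal, not an existing member):

```csharp
namespace System.IO.Compression
{
    public enum ZipArchiveMode
    {
        Read = 0,
        Create = 1,
        Update = 2,
        // Proposed: open an existing archive and write new entries
        // through to the underlying stream, without buffering the
        // existing contents in memory.
        Append = 3,
    }
}
```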

Usage Examples

using var fileStream = File.Open("existing.zip", FileMode.Open, FileAccess.ReadWrite);
using var zip = new ZipArchive(fileStream, ZipArchiveMode.Append);
var newEntry = zip.CreateEntry("new");
using var writer = new StreamWriter(newEntry.Open());
// write lots of content!

Alternative Designs

Another approach would be to change the behavior of Update so that it only brings content into memory as needed (e.g. when you change the contents of an existing entry, or keep multiple entries open for writing at the same time).

This second approach would have the benefit of improving the performance of all existing programs that use Update mode to append to existing zips, which seems likely to be a common use-case for Update.

Similarly, it seems that this could also enable Update to share the streaming benefits of Read mode in many cases.

The downside would be that code could silently go from performant to non-performant if the usage limitations were violated, although that is already the case with Read when the underlying stream is not seekable and Create when writing to multiple entries at once.

Another potential downside is that today, presumably, Update mode performs all writes at the end of the operation, potentially allowing other readers to use the zip file until then. This change would alter that behavior.

Risks

With the approach of optimizing Update for specific scenarios, the implementation might have to switch from a write-through approach to an in-memory approach partway through an operation. This could add overhead for callers who actually do leverage the ability to modify existing entries.


Labels

- api-suggestion: Early API idea and discussion, it is NOT ready for implementation
- area-System.IO.Compression
- in-pr: There is an active PR which will close this issue when it is merged
- needs-further-triage: Issue has been initially triaged, but needs deeper consideration or reconsideration
