
Conversation


@dellis1972 dellis1972 commented Jan 17, 2022

Fixes #102

The initial report showed the following error

Unhandled exception. System.IO.IOException: An attempt was made to move the position before the beginning of the stream.
   at System.IO.MemoryStream.Seek(Int64 offset, SeekOrigin loc)
   at Xamarin.Tools.Zip.ZipArchive.stream_callback(IntPtr state, IntPtr data, UInt64 len, SourceCommand cmd)

The really odd thing was that this only happened if the example files
were added to the archive in a certain order. This pointed to some kind
of memory corruption or stream offset issue.

Digging into the documentation for zip_source_function we can
begin to see what the problem is. Our current implementation of this
user callback treats the SourceCommands ZIP_SOURCE_BEGIN_WRITE,
ZIP_SOURCE_COMMIT_WRITE, ZIP_SOURCE_TELL_WRITE and ZIP_SOURCE_SEEK_WRITE
in the same way as their non-WRITE counterparts: when we get
a ZIP_SOURCE_SEEK_WRITE we seek in the current archive Stream to
the appropriate location, and similarly ZIP_SOURCE_BEGIN_WRITE seeks to
the start of the current archive Stream. This implementation is incorrect.

The documentation for ZIP_SOURCE_BEGIN_WRITE is as follows

Prepare the source for writing. Use this to create any temporary file(s).

This suggests that a new temporary file is expected to be created. Also,
if we look at the descriptions of ZIP_SOURCE_TELL and ZIP_SOURCE_TELL_WRITE,
we can see that there are supposed to be two separate offsets: one for
reading and one for writing.

@dellis1972 dellis1972 requested a review from grendello January 17, 2022 11:39
@dellis1972

@grendello This is the problem related to #102. I managed to narrow it down to a unit test, but I still can't see what the problem is, even with the corrected API.

@grendello

@dellis1972 it's just occurred to me - do the test files shrink or grow when added to the archive? If the latter, then perhaps this is some sort of problem here?

@dellis1972

> @dellis1972 it's just occurred to me - do the test files shrink or grow when added to the archive? If the latter, then perhaps this is some sort of problem here?

They are either smaller or the same size:

 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
     417  Defl:X      261  37% 01-14-2022 10:53 d8073656  info.json
      18  Stored       18   0% 01-14-2022 10:53 8f7bab37  characters_players.json
     494  Defl:X      270  45% 01-14-2022 10:53 2730810a  object_spawn.json
--------          -------  ---                            -------
     929              549  41%                            3 files

@dellis1972

@grendello Interesting... As per the original report, changing the code from

using (var zip = ZipArchive.Create (stream)) {
	zip.AddFile (filePath, "info.json");
	zip.AddFile (Path.Combine (fileRoot, "characters_players.json"), "characters_players.json");
	zip.AddFile (Path.Combine (fileRoot, "object_spawn.json"), "object_spawn.json");
}

to

using (var zip = ZipArchive.Create (stream)) {
	zip.AddFile (filePath, "info.json");
	zip.AddFile (Path.Combine (fileRoot, "object_spawn.json"), "object_spawn.json");
	zip.AddFile (Path.Combine (fileRoot, "characters_players.json"), "characters_players.json");
}

fixes the issue. So by moving the "stored" file around, we can seemingly fix the problem.

@grendello

Hmm, this starts to look like an issue with libzip - the weird error code, the rearranging of files fixing it.

@dellis1972

> Hmm, this starts to look like an issue with libzip - the weird error code, the rearranging of files fixing it.

The error seems to come from here: https://github.com/nih-at/libzip/blob/26ba5523db09213f532821875542dba7afa04b65/lib/zip_extra_field.c#L351, something to do with extra fields.

@dellis1972

Specifically, this line https://github.com/nih-at/libzip/blob/26ba5523db09213f532821875542dba7afa04b65/lib/zip_extra_field.c#L396 is the one which returns an error.

@grendello

@dellis1972 Try modifying ZipArchive.AddFile to skip adding any extra fields? I wonder if the extended time stamp field we add there messes something up

@dellis1972

> @dellis1972 Try modifying ZipArchive.AddFile to skip adding any extra fields? I wonder if the extended time stamp field we add there messes something up

I tried that, it didn't change anything. But it did still end up in the _zip_read_local_ef function, so I'm wondering if on *nix we have extra fields which are added by default.

@grendello

> @dellis1972 Try modifying ZipArchive.AddFile to skip adding any extra fields? I wonder if the extended time stamp field we add there messes something up
>
> I tried that, it didn't change anything. But it did still end up in the _zip_read_local_ef function, so I'm wondering if on *nix we have extra fields which are added by default.

Yes, it appears that _zip_dirent_write (called by zip_close) adds extra fields if name encoding is UTF-8 and there's no file comment, see here.

@dellis1972

@grendello the interesting thing is I don't think the error is in the creation of the zip file, because I can repro the bug by opening a zip file created with just zip at the command line (as outlined in the issue report).

So I guess the problem is with the loading of the zip. Looking at my debug output, the values at https://github.com/nih-at/libzip/blob/26ba5523db09213f532821875542dba7afa04b65/lib/zip_extra_field.c#L381 look very weird.

For a working file, fname_len and ef_len tend to be values like 17 and 0 respectively. But in a failing one, they are 29546 and 28271, so they are way, way bigger, and those values are being used to seek into the zip, it seems.

@grendello

@dellis1972 Looking at _zip_buffer_get_16, the only two reasons it would return values like this are a) reading uninitialized memory, or b) garbage contents in the zip file. If a) were the case, the values would differ between reads of the same ZIP file; b) may actually be a variation of the case where garbage is saved into the file and then read back. b) could be confirmed by looking at a hex dump of the ZIP at that offset.

@dellis1972

So we are reading data from what looks like the wrong location.

[Screenshot 2022-01-18: hex dump of one of the example files]

This is one of the example files. In this case libzip is trying to read the extra field data from the location where the stored file data is. In both of the cases I have here, it is the stored file which has the problem.

So the code at [1] is calculating the extra field offset as e->orig->offset + 26, and in this case that offset is right in the middle of the stored data.

[1] https://github.com/nih-at/libzip/blob/26ba5523db09213f532821875542dba7afa04b65/lib/zip_extra_field.c#L367
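
For context, in the standard ZIP local file header that offset + 26 lands on the name-length / extra-field-length pair, which is why a bad base offset turns into garbage values like the 29546/28271 seen above:

    offset  0  local file header signature (4 bytes, "PK\x03\x04")
    offset  4  version needed to extract   (2 bytes)
    offset  6  general purpose bit flag    (2 bytes)
    offset  8  compression method          (2 bytes)
    offset 10  last mod time / date        (4 bytes)
    offset 14  CRC-32                      (4 bytes)
    offset 18  compressed size             (4 bytes)
    offset 22  uncompressed size           (4 bytes)
    offset 26  file name length            (2 bytes)   <- fname_len
    offset 28  extra field length          (2 bytes)   <- ef_len
    offset 30  file name, extra field, then the file data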

@dellis1972

I just tried zip -d test.zip info.json on the sample zip file and it worked without an error, so this is not a problem with the underlying zip file.

@dellis1972

I'm wondering if this stored entry even has extra fields.... that might explain why this isn't working.

@grendello

> I'm wondering if this stored entry even has extra fields.... that might explain why this isn't working.

So the issue might still be a problem with libzip - why does it think there's an extra field?

@dellis1972

> I'm wondering if this stored entry even has extra fields.... that might explain why this isn't working.
>
> So the issue might still be a problem with libzip - why does it think there's an extra field?

So I just used the ziptool example which ships with libzip to open the faulty archive and then delete the first entry.

ziptool object.zip delete 0

This seems to work as expected. So now it looks like it's not libzip...

I'll try and write a C version of our test and see if I can get it to break.

@dellis1972

dellis1972 commented Jan 18, 2022

So I put this together and it seems to work. The info.json is removed from the archive with no errors.

#include <stdio.h>
#include <zip.h>

int
main(int argc, char *argv[]) {
    const char *archive;
    zip_t *za;
    zip_error_t error;
    int zip_err;

    if (argc < 2) {
        fprintf(stderr, "usage: %s archive\n", argv[0]);
        return 1;
    }
    archive = argv[1];

    if ((za = zip_open (archive, 0, &zip_err)) == NULL) {
        zip_error_init_with_code (&error, zip_err);
        fprintf(stderr, "can't open zip archive '%s': %s\n", archive, zip_error_strerror(&error));
        zip_error_fini(&error);
        return 1;
    }

    /* zip_name_locate returns -1 if the entry is not found */
    zip_int64_t index = zip_name_locate (za, "info.json", ZIP_FL_NOCASE);

    /* remove the entry */
    if (index >= 0) {
        printf ("deleting info.json at %lld\n", (long long) index);
        if (zip_delete (za, (zip_uint64_t) index) < 0) {
            fprintf(stderr, "did not delete entry '%lld': %s\n", (long long) index, zip_strerror(za));
        }
    }

    /* close archive, which rewrites the file without the deleted entry */
    if (zip_close(za) < 0) {
        fprintf(stderr, "can't close zip archive '%s': %s\n", archive, zip_strerror(za));
        return 1;
    }

    return 0;
}

@dellis1972

So my thinking is this... I wonder if our code is not seeking from the right locations, so we end up at the wrong place in the memory stream.

There are a number of places where libzip will seek from the beginning of the file, and some where it's from the current location. I'll start looking to make sure we don't change the Position of the stream.
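
For what it's worth, the whence-to-`SeekOrigin` translation itself is simple; the question is which stream it gets applied to. A minimal sketch, assuming illustrative names (`Whence`, `SeekStream`) rather than the library's actual interop types:

```csharp
using System;
using System.IO;

// Illustrative only: libzip passes SEEK_SET / SEEK_CUR / SEEK_END (0/1/2 on
// common platforms) in its seek arguments; the managed side has to map them
// onto SeekOrigin and apply them to the correct stream (read vs. write).
enum Whence { Set = 0, Current = 1, End = 2 }

static class SeekHelper
{
    public static void SeekStream (Stream stream, long offset, Whence whence)
    {
        SeekOrigin origin = whence switch {
            Whence.Set     => SeekOrigin.Begin,
            Whence.Current => SeekOrigin.Current,
            Whence.End     => SeekOrigin.End,
            _              => throw new ArgumentOutOfRangeException (nameof (whence)),
        };
        stream.Seek (offset, origin);
    }
}
```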

@grendello

> So my thinking is this... I wonder if our code is not seeking from the right locations, so we end up at the wrong place in the memory stream.
>
> There are a number of places where libzip will seek from the beginning of the file, and some where it's from the current location. I'll start looking to make sure we don't change the Position of the stream.

It appears to be very likely, indeed

@dellis1972 dellis1972 changed the title from "Initial code changes for Issue #102" to "Fix corrupt data when Deleting entries from zip files." Jan 20, 2022
@dellis1972 dellis1972 marked this pull request as ready for review January 20, 2022 10:44
@dellis1972 dellis1972 requested a review from grendello January 20, 2022 10:44
Fixes dotnet#102

The initial report showed the following error

```
Unhandled exception. System.IO.IOException: An attempt was made to move the position before the beginning of the stream.
   at System.IO.MemoryStream.Seek(Int64 offset, SeekOrigin loc)
   at Xamarin.Tools.Zip.ZipArchive.stream_callback(IntPtr state, IntPtr data, UInt64 len, SourceCommand cmd)
```

The really odd thing was that this only happened if the example files
were added to the archive in a certain order. This pointed to some kind
of memory corruption or stream offset issue.

Digging into the documentation for [zip_source_function](https://libzip.org/documentation/zip_source_function.html#DESCRIPTION) we can
begin to see what the problem is. Our current implementation of this
user callback treats the SourceCommands `ZIP_SOURCE_BEGIN_WRITE`,
`ZIP_SOURCE_COMMIT_WRITE`, `ZIP_SOURCE_TELL_WRITE` and `ZIP_SOURCE_SEEK_WRITE`
in the same way as their non-`WRITE` counterparts: when we get
a `ZIP_SOURCE_SEEK_WRITE` we seek in the current archive `Stream` to
the appropriate location, and similarly `ZIP_SOURCE_BEGIN_WRITE` seeks to
the start of the current archive `Stream`. This implementation is incorrect.

The documentation for `ZIP_SOURCE_BEGIN_WRITE` is as follows

```
Prepare the source for writing. Use this to create any temporary file(s).
```

This suggests that a new temporary file is expected to be created. Also,
if we look at the following descriptions:

    ZIP_SOURCE_TELL
        Return the current read offset in the source, like ftell(3).
    ZIP_SOURCE_TELL_WRITE
        Return the current write offset in the source, like ftell(3).

we can see that there are supposed to be two different offsets: one for
reading and one for writing. This leads us to the reason why this problem
was occurring. Because both the `READ` and `WRITE` commands were operating
on the exact same `Stream`, we were getting into a position where the
original data was being overwritten while we were deleting an entry.

What we should have been doing was creating a temp `Stream` when we got
the `ZIP_SOURCE_BEGIN_WRITE` SourceCommand, then writing all the required
changes to this temp `Stream`, and only overwriting the original `Stream`
when we get the `ZIP_SOURCE_COMMIT_WRITE` SourceCommand. This ensures
that the original archive data remains intact until all the processing
is done.

So rather than passing in a `GCHandle` to the `Stream` directly, a new
class has been introduced. The `CallbackContext` class is used to pass
data between the managed and unmanaged code. It contains properties for
the `Source` stream as well as the `Destination` stream. The `Source`
will always be used for reading; the `Destination` (if present) will be
used for writing.

Now on `ZIP_SOURCE_BEGIN_WRITE` we create a new temp file stream which
we will use to build the updated zip file. This new stream is stored
in the `CallbackContext.Destination` property so that all the other
`WRITE`-based commands will work on it. When we get the
`ZIP_SOURCE_COMMIT_WRITE` SourceCommand, we copy the data back to the
`CallbackContext.Source` stream and finally dispose of the
`CallbackContext.Destination`.
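
To make the shape of that concrete, here is a minimal sketch of how the write-side commands can be handled with such a context. It is simplified and illustrative only: the real `stream_callback` also services the read commands and marshals data across the native boundary, and the memory-stream fallback mirrors the one discussed in the review comment below.

```csharp
using System;
using System.IO;

// Illustrative sketch only: property and method names are simplified.
class CallbackContext
{
    public Stream Source { get; set; }              // original archive stream, used for reads
    public Stream Destination { get; set; }         // temporary stream, used while writing
    public string DestinationFileName { get; set; } // path of the temp file, if one was created
}

static class WriteHandlers
{
    // ZIP_SOURCE_BEGIN_WRITE: create the temporary output instead of touching Source.
    public static void BeginWrite (CallbackContext context)
    {
        try {
            string tempFile = Path.GetTempFileName ();
            context.Destination = File.Open (tempFile, FileMode.OpenOrCreate, FileAccess.ReadWrite);
            context.DestinationFileName = tempFile;
        } catch (IOException) {
            // No usable temp file: keep the pending writes in memory instead.
            context.Destination = new MemoryStream ();
        }
    }

    // ZIP_SOURCE_TELL_WRITE / ZIP_SOURCE_SEEK_WRITE operate on Destination only.
    public static long TellWrite (CallbackContext context) => context.Destination.Position;

    public static void SeekWrite (CallbackContext context, long offset, SeekOrigin origin) =>
        context.Destination.Seek (offset, origin);

    // ZIP_SOURCE_COMMIT_WRITE: only now replace the original stream's contents.
    public static void CommitWrite (CallbackContext context)
    {
        context.Source.Position = 0;
        context.Source.SetLength (0);
        context.Destination.Position = 0;
        context.Destination.CopyTo (context.Source);
        context.Source.Flush ();

        context.Destination.Dispose ();
        context.Destination = null;
        if (!string.IsNullOrEmpty (context.DestinationFileName) && File.Exists (context.DestinationFileName))
            File.Delete (context.DestinationFileName);
    }
}
```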
@dellis1972 dellis1972 requested a review from grendello January 21, 2022 10:01
context.Destination = File.Open (tempFile, FileMode.OpenOrCreate, FileAccess.ReadWrite);
context.DestinationFileName = tempFile;
} catch (IOException) {
// ok use a memory stream as a backup
@dellis1972

@grendello do you think this is a good idea? Or should we just throw an exception and fail?

@grendello
I think it's fine; we should do our best to make the code work.

@grendello grendello merged commit dbe3236 into dotnet:main Jan 21, 2022
@dellis1972 dellis1972 deleted the Issue102 branch January 21, 2022 16:46
Successfully merging this pull request may close these issues.

Breakage when files are added to archive in the "wrong" order
