-
Couldn't load subscription status.
- Fork 300
Allow changed reads #286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow changed reads #286
Conversation
e734bd5 to
c0e85bc
Compare
|
Thank you. The functionality sounds good, but the implementation will need some refinement. We'll discuss it internally and get back to you, probably some time next week. |
Cool, good to know, thanks. About implementation - surely it will need refinement. Just for the record - I tried extending
Alright, I'll fix the errors from CI in the meantime. |
|
The feature you add to sources has broader application than reading changed data, so a more general name would better describe it. We've come up with Also, by moving support for it into the infrastructure, we can minimise diffs for each source: index f6db2a71..d59ab828 100644
--- a/lib/zip_source_layered.c
+++ b/lib/zip_source_layered.c
@@ -62,6 +62,9 @@ zip_source_layered_create(zip_source_t *src, zip_source_layered_callback cb, voi
if (zs->supports < 0) {
zs->supports = ZIP_SOURCE_SUPPORTS_READABLE;
}
+ else if ((zip_source_supports(src) & ZIP_SOURCE_MAKE_COMMAND_BITMASK(ZIP_SOURCE_SUPPORTS_REOPEN)) == 0) {
+ zs->supports &= ~ZIP_SOURCE_MAKE_COMMAND_BITMASK(ZIP_SOURCE_SUPPORTS_REOPEN);
+ }
return zs;
}
diff --git a/lib/zip_source_supports.c b/lib/zip_source_supports.c
index 66a5303c..8ccc41bc 100644
--- a/lib/zip_source_supports.c
+++ b/lib/zip_source_supports.c
@@ -63,3 +63,8 @@ zip_source_make_command_bitmap(zip_source_cmd_t cmd0, ...) {
return bitmap;
}
+
+
+bool zip_supports_multiple_open(zip_source_t *src) {
+ return (zip_source_supports(src) & ZIP_SOURCE_MAKE_COMMAND_BITMASK(ZIP_SOURCE_SUPPORTS_REOPEN)) != 0;
+}Also, we don't need to make another call that only ever returns 1. If a source says it supports reopen, then no further checks are needed. With that, all that's needed in each source is to add Please do not duplicate the main part of Also, please don't add a special case to On a stylistic note, please remove spaces before opening parentheses in function calls ( |
This is certainly a better name. Will use it, thanks.
This is a nice simplification, thanks. The only thing I thought is that the source could decide during
Will try it out.
Oops, sorry about that. My personal style got in. Will fix it. |
We need a promise from the source that it will allow a second open, otherwise |
d7f0cdd to
0409796
Compare
I also added the case
In the end I extended the
I replaced that with ziptool, but needed to add the special "{add,replace}_reopenable" command to ziptool-regress, as I haven't added support for reopening to any of the builtin sources yet. I'm not sure how to do it yet (either add TODO:
|
That's not a safe assumption, see my review comment.
Yes, that approach works fine.
Just advertise support for reopening to sources that can handle it. That should be most of the built in sources.
That's a more complicated test than necessary. For now, just add a file and read it with zip_tool cat. |
Ok, will do - at first I didn't want to change the built in sources (I dunno, someone relies on getting
Yeah, that's already done I'd say. With this TODO item, I wanted to test the code path where we do a partial read of the changed data and this seems to be the only way. |
Just realized that I could do it with just one archive. |
To test partial reads, we should add a command to |
0409796 to
c97e6d1
Compare
|
Updated, still need to write some docs.
So my idea was to have more or less something like this: But that will open testbuffer.zip again, where the modification from
|
Or by |
c97e6d1 to
eda1834
Compare
|
Implemented |
a978b74 to
773e4aa
Compare
|
For now I can say that I have no idea why the tests fail on appveyor. At first I thought it's because of enabling reopen in I also did a small stab at documentation, but I'll admit that the mdoc format is pretty much alien to me. |
773e4aa to
7cf79d8
Compare
|
Updated, the issue with windows failures and hopefully with the fuzzer is resolved. |
|
Sorry, I've been busy lately, but I'll have a look at it next week. Thank you for your continued work on this issue. |
lib/zip_source_zip_new.c
Outdated
| src = srcza->entry[srcidx].source; | ||
| zip_source_keep(src); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used the branch a bit more and there was one issue - having two zip_file_t to the same reopened source messes up reads - reading from one zip_file_t affects reading from the another. It normally works for unchanged sources, because window source does seeks before calling read on the underlying source. I think here we will need to wrap the reopened source into window source here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably address this in zip_source_read: keep track of how many opens we have, and use seek if there are more than one. And disallow a second open if the source doesn't support seeking.
This is an existing problem, so we can address this after the merge.
9072958 to
4e78011
Compare
lib/zip_fclose.c
Outdated
|
|
||
| if (zf->src) | ||
| if (zf->src) { | ||
| zip_source_close(zf->src); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zip_source_free does a close, so this call is unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I can revert this change. I'll only point you to the commit message in 4a97e13. Now this probably won't be a problem any more, because we always wrap the bottom-most source in some other sources, which happen to have ref count of 1, so they'll be closed on free.
I'd still say that zip_source_free is a misnomer (should be rather something like zip_source_unref), because it's freeing things only if ref count drops to 0. So if refcount is still higher, no free will happen and no closing either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, we need to keep track which references called open the source. We will split the source struct in two parts, the actual source and a reference. But we can do this after the merge.
| .Dv ZIP_SOURCE_SUPPORTS_REOPEN | ||
| will allow calling | ||
| .Fn zip_fopen | ||
| without the ZIP_FL_UNCHANGED flag to read the modified data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the zip_source level, it means that you can read the data multiple times. If you want to document what it means for zip_fopen, document it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't understand this. You could read the data multiple times without this PR - it was possible because window source takes care of repositioning the underlying source by seeking it to the desired offset before doing a read. The catch was that it was only possible for unmodified data, and since unmodified data is backed by a source that implements seeking, reading data multiple times always worked.
Currently none of the sources provided by libzip supports reopening, but it is possible to write our own that does by using zip_source_function.
The only source that is not supporting reopening is the file-backed source. It works on Linux, but on some Windows platforms there are failures I can't debug.
We won't get the ZIP_ER_CHANGED errors, because built in sources support reopening.
When cat in ziptool got implemented in terms of sources instead of files, reading encrypted data broke. Not using the default password stored in the archive was the culprit.
We will implement a test for intertwined reads from multiple opened files backed by the same source in terms of those functions.
4e78011 to
b13daae
Compare
|
Updated. The only review issue left to address is the one about documentation, but I had some doubts about that as commented above. Other unresolved issues are solved, but had a comment or question about it anyway. Implementing |
lib/zip_source_zip_new.c
Outdated
| src = srcza->entry[srcidx].source; | ||
| zip_source_keep(src); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably address this in zip_source_read: keep track of how many opens we have, and use seek if there are more than one. And disallow a second open if the source doesn't support seeking.
This is an existing problem, so we can address this after the merge.
lib/zip_fclose.c
Outdated
|
|
||
| if (zf->src) | ||
| if (zf->src) { | ||
| zip_source_close(zf->src); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, we need to keep track which references called open the source. We will split the source struct in two parts, the actual source and a reference. But we can do this after the merge.
| password = NULL; | ||
| } | ||
|
|
||
| return _zip_source_zip_new(srcza, srcidx, flags, start, (zip_uint64_t)len, password, error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zip_set_default_password already turns an empty password into NULL, so the check here is redundant. Just pass srcza->default_password to _zip_source_zip_new.
|
Sorry for letting this sit for so long. We finally merged it. You did excellent work, thank you. |
…10.0 In libzip versions prior to 1.10.0, you couldn't do a read for a changed file: this would result in an error. This is tested in our .phpt test. Starting from version 1.10.0 [1] this is no longer an error. Therefore, we amend the test to check the behaviour depending on which version is used. [1] nih-at/libzip#286
Hi,
I'd like to show you some PoC code that allows reading contents of the changed files in the archive (that is - doing
zip_fopenwithout theZIP_FL_UNCHANGEDflag). This is implemented as an opt-in feature ofzip_source- it needs to say that it supports theZIP_SOURCE_ALLOW_CHANGED_READcommand, and that command should return 1 if it's actually allowed. In such case,zip_freadwill use the source to read data from it, instead of returningZIP_ER_CHANGED. The feature should be backwards compatible, as no builtin non-layered sources support this command, and the builtin layered sources (encryption, window and crc) are deferring to the underlying source.My usecase was basically to have the zip archive to serve as a backend (next to the "directory and files" backend) for a stream source, which can provide input or output streams. This change would allow me to open an output stream, write some data to it, and then close it; later I could open the same stream, but as an input and read from it.
Would such a feature be considered for inclusion to the project?