Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

re-opening ZipStore with a new mode #2448

Open
d-v-b opened this issue Oct 30, 2024 · 5 comments
Open

re-opening ZipStore with a new mode #2448

d-v-b opened this issue Oct 30, 2024 · 5 comments

Comments

@d-v-b
Copy link
Contributor

d-v-b commented Oct 30, 2024

In main it's not possible to make a read-only copy of a ZipStore instance. Is there any particular reason why this is not allowed, or is it something we could add? Naively, it seems like a read-only copy of any store class should always be safe to generate.

@TomAugspurger
Copy link
Contributor

IIRC, ZipStore has some instance level state that makes it hard to re-open. The file needs to be closed and the ZIP footer written before anyone else can open the file (going off memory, so I might have some details wrong).

@d-v-b
Copy link
Contributor Author

d-v-b commented Oct 30, 2024

that means that ZipStore always errors if it reaches this conditional in make_store_path:

elif isinstance(store_like, Store):
if mode is not None and mode != store_like.mode.str:
store_like = store_like.with_mode(mode)
await store_like._ensure_open()

If we want to keep these store semantics (I'm not sure that we do...) we should add a zipstore-specific exception here that instructs users how to open their zipstored data with a different mode. Otherwise, invoking something like open(store=store, path="doesnotexist", mode="r", zarr_format=zarr_format) raises NotImplementedError with no guidance.

@d-v-b
Copy link
Contributor Author

d-v-b commented Oct 30, 2024

would it be terrible if ZipStore.with_mode(self, mode) called self.close(), effectively invalidating the original store instance?

@TomAugspurger
Copy link
Contributor

we should add a zipstore-specific exception here that instructs users how to open their zipstored data with a different mode

I wonder how you'd do that... Looking more closely, I see my earlier comment about ZipStore sharing some state wasn't 100% right. Really all we share is a Path, we don't try to share a particular instance of zipfile.ZipFile or any locks. We just need to ensure that there's a valid Zip file at that path before we try to open it, which leads to...

would it be terrible if ZipStore.with_mode(self, mode) called self.close(), effectively invalidating the original store instance?

Is it possible to have both, by having our write / read calls automatically call open if needed? If so, then maybe with_mode could call close (which writes the footer, making the ZipFile readable), without breaking the original store since any subsequent writes would automatically re-open the ZipFile.

I'm not sure what the downsides of this kind of just-in-time reopening would be.

@d-v-b
Copy link
Contributor Author

d-v-b commented Oct 30, 2024

upon calling _sync_open, ZipStore instances acquire a lock and a file handle:

self._lock = threading.RLock()
self._zf = zipfile.ZipFile(
self.path,
mode=self._zmode,
compression=self.compression,
allowZip64=self.allowZip64,
)
, so I think it's about as stateful as you can get.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants