Skip to content

Implement Store.move #3021

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

brokkoli71
Copy link
Member

@brokkoli71 brokkoli71 commented Apr 25, 2025

This PR implements a move method for stores. This only affects stores that contain a path. afaics thats:

  • LocalStore
  • ZipStore
  • (maybe FsspecStore)

Am I missing one?

Maybe, this PR could also implement moving the content from one store to a new one of a different type e.g.

store = LocalStore(root="path")
store.move(dest=MemoryStore())
Feedback/thoughts welcome

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Apr 25, 2025
@brokkoli71 brokkoli71 marked this pull request as draft April 25, 2025 14:44
@paraseba
Copy link
Contributor

I don't think this can be meaningfully implemented by any other Stores. Maybe it shouldn't be a method in the abstract class, but purely a concrete implementation for the Stores that can do it efficiently.

In a real world scenario, moving a cloud Store could take days, cost thousands of dollars and fail. People wouldn't want to run this kind of operation in this way.

@brokkoli71
Copy link
Member Author

good point! @paraseba
what do you think of store.move(dest=another_store) ?
for example from memory to local stores?

@d-v-b
Copy link
Contributor

d-v-b commented Apr 29, 2025

To deal with @paraseba's concerns, I feel like an batch IO operation that takes more than 30s should probably have some checkpointing / retry logic, and maybe it shouldn't even execute immediately (so that a user can inspect the transfer plan, run it later in the program, etc).

But we could ship a feature without that logic. Then we are basically forcing large-scale users to invent their own solutions to the inevitable case when the last file (out of 100000) fails to copy.

The upside is that the people copying small, local stores have something they can use sooner rather than later. I'm not sure which side of the tradeoff I prefer right now.

@paraseba
Copy link
Contributor

@brokkoli71

what do you think of store.move(dest=another_store) ?

That would be an extremely useful thing, I have wanted that to exist many times, but only if I can use it confidently across different stores. That would mean it needs checkpointing, retrying, restarting, progress feedback, etc. So it goes beyond what we should implement in a store. In particular, being part of the Store instance doesn't benefit it, because it won't have internal knowledge of the dest store anyway.

Regarding the name, I think move probably should be reserved for renaming members of the store, for example, moving an array from group A to group B, or changing the name of a group. That is an operation that some stores can run very efficiently, and that is also very useful to users. Anybody with a few real-world datasets wanted at least once to rename something.

I don't think we should implement move(dest: Store) at this point. I think that's a higher level tool. As @d-v-b , I'm also not sure what to say about the "internal" move you originally proposed. I'm pretty sure we shouldn't implement it for the cloud stores, because it would be misleading to users. And for memory stores it doesn't make sense. So, maybe it's better implemented initially as a concrete method on file system stores and then we see if people ask for it in other types of stores?

@brokkoli71
Copy link
Member Author

@paraseba

So, maybe it's better implemented initially as a concrete method on file system stores and then we see if people ask for it in other types of stores?

agreed!

@brokkoli71 brokkoli71 marked this pull request as ready for review May 2, 2025 16:14
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label May 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[v3] Implement store methods for erase and rename operations
3 participants