Multi-threading meta-data lookups #575

Closed
nbren12 opened this issue Jul 8, 2020 · 12 comments


nbren12 commented Jul 8, 2020

Coming from the xarray world, one of the main difficulties of working with zarr in the cloud is the slow read time for metadata. consolidate_metadata is a good solution, but it takes a long time to run. It would be straightforward to add some threading here:

for key in store if is_zarr_key(key)

It might also be nice to be able to use this threaded metadata look-up without mutating the underlying store, since mutating the store is problematic for clients without write privileges.
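
For example, something along these lines (a rough sketch only; the helper name and thread count are arbitrary, and is_zarr_key is the predicate consolidate_metadata already uses):

from concurrent.futures import ThreadPoolExecutor

def is_zarr_key(key):
    # The zarr v2 metadata document names.
    return key.endswith((".zarray", ".zgroup", ".zattrs"))

def fetch_metadata_threaded(store, max_workers=32):
    # Read all metadata documents concurrently instead of one at a time.
    keys = [key for key in store if is_zarr_key(key)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = pool.map(store.__getitem__, keys)
    return dict(zip(keys, values))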


nbren12 commented Jul 8, 2020

cc @rabernat


nbren12 commented Jul 8, 2020

It seems this problem could be solved independently of the multiget discussion happening over in #536.


nbren12 commented Jul 8, 2020

Actually, the bigger problem is generating the list of all the keys, which can be slow when there are many keys. Is there a faster way to generate the list of just the metadata keys? This is pretty easy to do for a file system or using fsspec primitives.

@alimanfoo (Member)

Hi @nbren12, this is one of the issues we're trying to address in the v3 spec, where you can list all keys with prefix "meta/" to get just the metadata keys. Many stores have a fast native implementation of listing keys with a given prefix, so this should generally perform well.

For v2 implementations this is harder to achieve. The only option I'm aware of is to use separate metadata and chunk stores, i.e. you could do something like this with the current library:

import fsspec
import zarr

meta_store = fsspec.get_mapper('gs://my-bucket/meta/')
chunk_store = fsspec.get_mapper('gs://my-bucket/data/')
root = zarr.open(store=meta_store, chunk_store=chunk_store)


nbren12 commented Jul 9, 2020

Thanks for the reply. It's good to know you are working on a solution for v3. For a flat zarr group, it is very fast to look up all the metadata files, provided you can use a filesystem abstraction, e.g.:

import fsspec

fs = fsspec.filesystem('gs')
vars = fs.ls('gs://my-bucket/meta')

potential_meta_files = []
for var in vars:
    for metafile in ['.zattrs', ...]:  # plus the other metadata file names
        potential_meta_files.append(f"{var}/{metafile}")

Using this trick + multi-threading, I was able to consolidate the metadata of a zarr with millions of chunks in about 1 second.

> where you can list all keys with prefix "meta/" to get just the metadata keys.

Separating the metadata and data in this way does seem like a simple approach. However, the ZarrStore abstraction seems to be in a somewhat awkward middle ground between a Mapping and a tree-like filesystem. On the one hand, they are a simple place to put data; on the other, the store is expected to do fast lookups of keys in a variety of different storage systems.

Perhaps it would make sense to formalize the concept of an "Index" (which could have a tree-like structure) which would manage keys and provide fast look-ups for groups of keys (e.g. metadata). This would also decouple the naming of the chunks in object storage from how they are interpreted by zarr, which can have performance implications. (I remember reading someplace that naming objects with random uids improves performance in object storage).

Just some food for thought.

@alimanfoo (Member)

> Using this trick + multi-threading, I was able to consolidate the metadata of a zarr with millions of chunks in about 1 second.

Thanks, very good to know this works!

> where you can list all keys with prefix "meta/" to get just the metadata keys.

> Separating the metadata and data in this way does seem like a simple approach. However, the ZarrStore abstraction seems to be in a somewhat awkward middle ground between a Mapping and a tree-like filesystem. On the one hand, they are a simple place to put data; on the other, the store is expected to do fast lookups of keys in a variety of different storage systems.

FWIW this is the same middle ground that object stores like GCS, S3, etc. occupy. IIUC they are effectively a flat mapping of keys (object names) to values (objects), but they do provide efficient functions for querying keys "hierarchically" based on a prefix and a delimiter. It seems reasonable to assume that a range of different store types could offer this same functionality.

I.e., stores don't need to support the full file-system abstraction, they just need the ability to support the mapping abstraction plus querying keys based on prefix.
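
To make that concrete, a store would only need something like this minimal interface (an illustrative sketch, not an actual zarr API; the names are made up):

from typing import Iterator, Protocol

class PrefixQueryableStore(Protocol):
    # A plain key-value mapping...
    def __getitem__(self, key: str) -> bytes: ...
    def __setitem__(self, key: str, value: bytes) -> None: ...
    # ...plus the ability to list keys under a given prefix.
    def list_prefix(self, prefix: str) -> Iterator[str]: ...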

> Perhaps it would make sense to formalize the concept of an "Index" (which could have a tree-like structure) which would manage keys and provide fast look-ups for groups of keys (e.g. metadata). This would also decouple the naming of the chunks in object storage from how they are interpreted by zarr, which can have performance implications. (I remember reading someplace that naming objects with random uids improves performance in object storage).

FWIW I think there are two separate questions here. One is how to get fast listing of metadata keys. The other is how to get best I/O performance for reading and writing chunks.

WRT the first question, store types like cloud object stores, local file systems and key-value databases should all natively support efficient key lookups based on a prefix query, so I don't think decoupling of keys is necessary. I.e., those stores already implement their own indexing internally.

WRT the second question, I have also heard that I/O on cloud object stores is distributed based on the object name, and so objects likely to be accessed concurrently shouldn't have similar names, but I haven't seen any benchmarking showing that this makes a difference in practice. If that does turn out to be true, I think protocol extensions could be designed that decouple zarr keys from storage locations. E.g., zarr-developers/zarr-specs#82.

> Just some food for thought.

Much appreciated 😄

@martindurant (Member)

Note that fsspec-based stores are starting to support concurrent fetches of multiple keys (i.e., .cat(list_of_files) on the filesystem, or .getitems(list_of_keys) on the mapper). When the backend doesn't support it, these fall back to a sequential approach.
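
For example (a sketch; the bucket path and key names are illustrative):

import fsspec

# One call fetches all requested keys; getitems runs the requests
# concurrently when the backend filesystem is async-capable, and
# falls back to a sequential loop otherwise.
mapper = fsspec.get_mapper("gs://my-bucket/my.zarr")
keys = [".zgroup", "foo/.zarray", "foo/.zattrs"]
values = mapper.getitems(keys)  # dict mapping key -> bytes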


nbren12 commented Oct 29, 2020

The main bottleneck here was listing the keys (which is quite costly for millions of chunks). If the fsspec-based stores supported a listdir method, then we could probably implement the traversal approach above to dramatically speed up metadata lookups.

@martindurant (Member)

The FSStore in zarr v2 does support listdir, but using consolidated stores (all the metadata in a single file) would still be better, avoiding any listing at all.
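
For example, with the current v2 API (a sketch; the URL is illustrative):

import zarr

# One-off: write all group/array metadata into a single .zmetadata key.
store = zarr.storage.FSStore("gs://my-bucket/my.zarr")
zarr.consolidate_metadata(store)

# Readers then open against the consolidated metadata, with no listing
# and only a single small metadata read up front.
root = zarr.open_consolidated(store)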

A clarification on the behaviour of object stores:

  • yes, listing by prefix should be efficient (i.e., in the v3 model), and this would be equivalent to fsspec's find method (or ls, if you only want one level deep - this is the one exposed by FSStore)
  • it's true that storage location in S3 (and probably the others too) is dependent on the key prefix, but I wouldn't worry about throughput limits, since the metadata files are all small - and the requests-per-second limits are very high. It's more of an issue for the larger data files, which in the hierarchical approach do always get the same prefix.
  • the multi-threading approach via the requests library doesn't work well for reading keys, because it is blocking while reading a response body; if this were not so, I would (gratefully) not have had to implement async in fsspec. Async does work great for small files over SSL, though, since the vast majority of the time for a single request is waiting for the transfer to begin - you can easily launch >>1000 requests at once.


nbren12 commented Oct 29, 2020

Thanks. I fully agree that consolidated metadata is the way to go. I'm just proposing a faster way to consolidate metadata on existing stores.

Currently, the main performance problem is the for key in store part of the code linked above. Listing all the keys of the store is much slower in this case than actually reading the few XXX/.zarray keys. Therefore, we could speed up zarr.consolidate_metadata by using listdir to quickly find all the present keys that match {variable_name}/.zarray and {variable_name}/.zattrs.

> The FSStore in zarr v2 does support listdir

This is good.

@martindurant (Member)

> Listing all the keys of the store is much slower in this case

OK, got it. Yes, you could do listdir to get the directory layout (without listing the bottom-level directories, which may contain many files), collect all the .z* files, and then fetch them all in a single call to cat.
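
Something along these lines, for a flat group (a sketch using fsspec directly; the bucket path and the on_error handling are assumptions):

import fsspec

fs = fsspec.filesystem("gs")
root = "gs://my-bucket/my.zarr"

# One listing of the top level only; bottom-level chunk directories are
# never listed.
candidates = [f"{root}/.zgroup", f"{root}/.zattrs"]
for var in fs.ls(root):
    for metafile in (".zarray", ".zattrs", ".zgroup"):
        candidates.append(f"{var}/{metafile}")

# A single concurrent fetch; on_error="omit" drops candidates that
# don't actually exist instead of raising.
metadata = fs.cat(candidates, on_error="omit")  # dict: path -> bytes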


jhamman commented Dec 7, 2023

Closing this now that we've implemented asynchronous metadata reading in the v3 dev branch.

jhamman closed this as completed Dec 7, 2023