-
Notifications
You must be signed in to change notification settings - Fork 39
Add safetensors reader #555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nenb this looks great.
Thank you for using the new ManifestStore
abstraction too (despite us not having documented or released it yet 😅) - that will save us time refactoring later.
My comments are just nits, seems like we're waiting on zarr dtype extensions to support everything we want here?
except ImportError: | ||
raise ImportError("The ml_dtypes package is required to read safetensors files. Please install it with pip install virtualizarr[safetensors].") | ||
|
||
dtype_map = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just make this a global variable at the top of the file for visibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to leave it as is for now until the Zarr dtype PR is released, and then will chat with you about how best to implement this function.
e472167
to
b08fc84
Compare
b08fc84
to
6d37b17
Compare
for more information, see https://pre-commit.ci
zarr-python
.Summary
This PR is the first step in #367. It adds a reader for the official Safetensors specification (see specs). Note that HuggingFace currently implements a SafeTensor reader that goes considerably beyond the official specification. Support for this implementation (which in practice the entire ML/AI community relies on) will be added in the next step for #367.
Notes for reviewers
This PR was developed in association with an AI coding assistant. Make of that what you will!
I have added a notebook to show how everything works. This can be removed in the final version. I have a question at the end of the notebook related to partial get support, which I would appreciate a core dev looking into.
Checklist
docs/releases.rst
api.rst