
Safetensors support. #381

Merged · 6 commits · Mar 8, 2023

Conversation

Narsil
Contributor

@Narsil Narsil commented Jan 20, 2023

I'm using this for experiments, so feel free to close or ignore.
The main goal is to be able to use weights hosted on https://hf.co,
which doesn't include the NPZ format.

All models supporting safetensors: https://huggingface.co/models?library=safetensors&sort=downloads

@Narsil Narsil marked this pull request as draft January 20, 2023 17:03
@coreylowman
Owner

Great idea! This'll be great for getting pretrained weights, since I had no idea how to load them from pickle files.

@Narsil Narsil changed the title [WIP] Safetensors support. Safetensors support. Jan 20, 2023
@Narsil Narsil marked this pull request as ready for review January 20, 2023 19:15
@Narsil
Contributor Author

Narsil commented Jan 20, 2023

Nice, I'll open this PR up then.

Biggest caveats I've seen:

There's a copy on write, because safetensors expects a flat `&[u8]` buffer. The copy seems to happen here: https://github.com/coreylowman/dfdx/pull/381/files#diff-a0868f528cbb396ebf73e5726654153f4ba44c8820003f936d809e8ae45e6cdaR35
Since every tensor has to be gathered before saving, it is quite a strain. There is experimental support for saving iteratively, huggingface/safetensors#134 (i.e. writing the file tensor by tensor), but it has caveats.
If we were able to get `&[u8]` directly from `Tensor`, there wouldn't be a need for the copy (not sure how easy/likely that is).

It's usually not that bad on save, since it's not the biggest drawback, but it's still something to take into account when training large models.
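The copy being described can be sketched in plain std Rust (the helper name is mine, not dfdx's or safetensors'): safetensors serializes from contiguous little-endian byte slices, so owned `f32` data has to be re-laid-out as bytes once per save.

```rust
/// Sketch of the save-side copy discussed above (hypothetical helper,
/// not part of dfdx): safetensors wants a flat `&[u8]` buffer per tensor,
/// so `f32` data gets copied into a fresh byte vector first.
fn f32s_to_bytes(data: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(data.len() * core::mem::size_of::<f32>());
    for v in data {
        // One full pass over the tensor: this is the extra copy.
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

Getting `&[u8]` straight out of `Tensor` would make this whole step disappear, which is the point being made above.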

In general I think we could aim for zero-copy loads by memory-mapping aligned files: https://github.com/coreylowman/dfdx/pull/381/files#diff-a0868f528cbb396ebf73e5726654153f4ba44c8820003f936d809e8ae45e6cdaR124
But it's probably not worth doing at the moment. (That would mean holding a borrowed buffer, which rules out doing gradient work on it; just mentioning it in case.)
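For contrast, the load path: without alignment guarantees, the file's bytes have to be decoded back into owned `f32`s (another copy), whereas an aligned memory-mapped region could be viewed as `&[f32]` directly. A sketch of the copying fallback (illustrative helper, not dfdx or safetensors API):

```rust
/// Decode little-endian bytes back into owned `f32`s — the copy that
/// zero-copy loads via mmap on aligned files would avoid.
/// (Illustrative helper, not dfdx or safetensors API.)
fn bytes_to_f32s(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```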

@coreylowman
Owner

Yeah I think given the tensors may be stored on other devices, getting a &[u8] slice seems unlikely. E.g. for cuda we're going to have to copy back to CPU Vec<E> first anyway.

@Narsil
Contributor Author

Narsil commented Jan 20, 2023

> Yeah I think given the tensors may be stored on other devices, getting a &[u8] slice seems unlikely. E.g. for cuda we're going to have to copy back to CPU Vec<E> first anyway.

Yes, and training is most likely going to occur on GPU most of the time. Still, extra copies usually become a bottleneck really fast, so I'm acutely aware of them.

Loading happens much more often than saving in practice (just as inference >> training overall), so as long as copies there are reduced to a minimum, it's OK.

@Narsil
Contributor Author

Narsil commented Feb 16, 2023

Any new thoughts on this PR, @coreylowman?

Owner

@coreylowman coreylowman left a comment


Looks good to me. After #460 it will be straightforward to impl this for nn layers as well.

Review comments on src/tensor/safetensors.rs (resolved)
Owner

@coreylowman coreylowman left a comment


The TensorCollection api was just merged, which greatly simplified nn::SaveToNpz/LoadFromNpz. Should be really easy to add a safetensors loader/saver for nn layers now as well!

@Narsil
Contributor Author

Narsil commented Feb 24, 2023

Great addition!

I updated everything! We can now simply load/save entire modules. Nice!

Cargo.toml Outdated
Comment on lines 36 to 37
# safetensors = { version = "0.2", default-features = false, optional = true }
safetensors = { git = "https://github.com/huggingface/safetensors", default-features = false, optional = true }
Owner


Was this a holdover from developing? Which should be used?

Contributor Author


I need to release 0.3; it should happen this week. (I will release a Python release candidate first. There has been a relatively big change in how things are saved: everything is backward compatible, but files on disk will be different now, to get alignment done properly.)
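For context on the alignment change: a safetensors file is framed as an 8-byte little-endian header length, a JSON header, then the raw tensor bytes, and padding the header lets the data section start at an aligned offset. A hand-rolled sketch of that framing (illustration only; real files come from the safetensors crate):

```rust
/// Minimal sketch of safetensors-style framing: u64 LE header length,
/// then the JSON header bytes, then the raw tensor data.
/// (Illustrative; not the safetensors crate's API.)
fn frame(header_json: &str, data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(header_json.len() as u64).to_le_bytes());
    out.extend_from_slice(header_json.as_bytes());
    out.extend_from_slice(data);
    out
}

/// Read back the header length from the first 8 bytes.
fn header_len(blob: &[u8]) -> u64 {
    u64::from_le_bytes(blob[..8].try_into().unwrap())
}
```

Padding `header_json` with trailing whitespace moves the start of `data` to `8 + len` without changing the header's meaning, which is how the data section can be aligned for zero-copy reads.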

Owner

@coreylowman coreylowman left a comment


🔥 will be good to merge after version fix in Cargo.toml

@Narsil Narsil force-pushed the safetensors_load_read branch 2 times, most recently from 512433a to 9b408dd Compare March 5, 2023 23:51
@Narsil Narsil requested a review from coreylowman March 6, 2023 09:06
@Narsil
Contributor Author

Narsil commented Mar 6, 2023

@coreylowman I fixed it with the released version.

@coreylowman
Owner

@Narsil looks good. Didn't this use to have an impl for tensor collections?

@Narsil
Contributor Author

Narsil commented Mar 8, 2023

I'm as confused as you; I simply rebased. I'm redoing the work since I can't find it anymore. I'm guessing I screwed up my rebase, but I can't figure out where the original code went.

@Narsil
Contributor Author

Narsil commented Mar 8, 2023

I added the test behind the feature, as I'm seeing a super weird trait issue in the Booleans module which doesn't seem linked to my changes. Any ideas? https://github.com/coreylowman/dfdx/actions/runs/4364586004/jobs/7632084283

Comment on lines +36 to +38
gpt2.weight
.load_safetensor(&tensors, "wte.weight")
.expect("Could not load tensor");
Owner


Awesome example

Owner

@coreylowman coreylowman left a comment


LGTM! Don't know why the boolean test is failing; it's passing locally for me. Will resolve separately. Thanks for the contribution!

@coreylowman coreylowman merged commit 420a8a2 into coreylowman:main Mar 8, 2023
@Narsil Narsil deleted the safetensors_load_read branch March 8, 2023 13:25