CLI to upload arbitrary huge folder #2254
Co-authored-by: Lysandre Debut <hi@lysand.re>
@FurkanGozukara you can use
raise ValueError(
    "For large uploads, `repo_type` is explicitly required. Please set it to `model`, `dataset` or `space`."
)
Feedback while using it @Wauplin: the error says:
ValueError: For large uploads, `repo_type` is explicitly required. Please set it to `model`, `dataset` or `space`.
but the expected argument is `repo-type`, as otherwise you get:
huggingface-cli: error: unrecognized arguments: --repo_type=model
addressed in cdfb27f. Thanks for the feedback!
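The mismatch reported above (an error message naming `repo_type` while the CLI flag is spelled `repo-type`) follows from how `argparse` maps dashed flags to underscored attribute names. A minimal sketch of that behaviour, assuming a stand-in parser; `demo-cli` and `build_parser` are hypothetical and are not the real `huggingface-cli` parser:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in parser, used only to illustrate the naming rule.
    parser = argparse.ArgumentParser(prog="demo-cli")
    parser.add_argument("--repo-type", choices=["model", "dataset", "space"])
    return parser


parser = build_parser()

# The dashed flag is accepted and stored under the underscored attribute name.
args = parser.parse_args(["--repo-type=model"])
print(args.repo_type)

# The underscored spelling is not a known flag. parse_known_args collects it
# instead of exiting, which is what a plain parse_args call would do.
known, unknown = parser.parse_known_args(["--repo_type=model"])
print(known.repo_type, unknown)
```

The Python keyword argument stays `repo_type`, so an error message shared between the Python API and the CLI can easily name the wrong spelling for one of them.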
It says the upload completed (I ran it three times), but there are no files in the repo.
@FurkanGozukara yes, you can do that by passing
Have you tried to reupload the same folder to multiple locations? If yes, only the first upload will be correct. As mentioned in the little help section:
I suspect that your local metadata says the files are already uploaded. You can delete it and rerun the command.
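Deleting the local metadata can be scripted. The sketch below assumes the metadata lives in the `.cache/huggingface/` folder at the root of the uploaded folder, as stated in the PR description; `reset_upload_metadata` is a hypothetical helper, not part of `huggingface_hub`:

```python
import shutil
from pathlib import Path


def reset_upload_metadata(folder: Path) -> bool:
    """Delete the progress-tracking folder under `folder`, if present.

    Returns True if something was deleted, False otherwise.
    """
    # Assumed location, taken from the PR description.
    metadata_dir = folder / ".cache" / "huggingface"
    if not metadata_dir.is_dir():
        return False
    shutil.rmtree(metadata_dir)
    return True
```

After the reset, rerunning the upload command starts without the stale record that marked the files as already uploaded.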
Thanks to everyone involved in this PR! Your feedback has been immensely valuable in shaping this feature. I hope it will now benefit as many users as possible! 🫶 Time to merge!
It appeared later for some reason.
@Wauplin can we set a subfolder path right now? I tried it like this for a subfolder and it failed.
Hi @FurkanGozukara, no this is currently not possible. See known limitations in the PR description:
I saw it, thank you. How do I set the proper structure locally? I tried it as in the screenshot above and it failed :/
I don't understand the error, to be honest. The message says the provided path must be a folder, so I guess it's a problem with the input parameters. Could you try to provide it with a Python script instead of calling it from the CLI? That would help narrow down the problem.
from huggingface_hub import HfApi
api = HfApi()
api.upload_large_folder(repo_id=..., repo_type=..., ...)
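To narrow down a "path must be a folder" error, the path can be checked before the upload starts. A minimal sketch, assuming the arguments are passed as plain strings; `check_folder` and `upload` are hypothetical helpers, and the error text is this sketch's own wording, not the library's:

```python
from pathlib import Path


def check_folder(folder_path: str) -> Path:
    """Resolve `folder_path` and fail early if it is not an existing directory."""
    path = Path(folder_path).expanduser().resolve()
    if not path.is_dir():
        raise ValueError(f"Not an existing directory: {path}")
    return path


def upload(folder_path: str, repo_id: str, repo_type: str) -> None:
    # Imported lazily so that check_folder can be used without huggingface_hub.
    from huggingface_hub import HfApi

    folder = check_folder(folder_path)
    HfApi().upload_large_folder(
        repo_id=repo_id,
        repo_type=repo_type,
        folder_path=folder,
    )
```

Printing the resolved path from `check_folder` also shows whether shell quoting or a relative path changed what the tool actually received.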
This error happens only if the token is not valid. It doesn't seem related to the tool itself.
Thank you so much, this new upload is amazing.
@Wauplin the new upload works amazingly well, but I get this warning / error many times while uploading around 70 GB. Is this expected? The files are uploaded to the repo successfully.
This is not expected, no, but I suppose it has to do with how Jupyter notebooks handle logs. Nothing much to worry about.
@Wauplin I have been rate-limited for the first time. Could this be related to the new method? None of the models uploaded fully, so I'm trusting the resume capability at the moment :D
@Wauplin I waited several hours and restarted the process, and it is definitely hitting the API limit when verifying which files were uploaded correctly. I don't know whether this can be solved or not. I have lots of small files, just letting you know.
Hi @FurkanGozukara, sorry for the inconvenience. How many files are we talking about, and what is the size of each of them? Which file extensions? Also, are the files uploaded as regular or LFS files? This info would help identify use cases that are not handled perfectly.
10581 files. Around 44 files are big, about 6-7 GB each; the rest are images of about 1-2 MB. Let me give you exact numbers via a Python scan, one minute.
@Wauplin
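A scan of this kind can be done with the standard library alone. The sketch below is not the script used in this thread; `scan_folder` and the 1 GiB default threshold for "large" files are assumptions chosen for illustration:

```python
from collections import Counter
from pathlib import Path


def scan_folder(root: str, large_threshold: int = 1024**3) -> dict:
    """Count files under `root` and summarise their sizes and extensions."""
    file_count = 0
    total_bytes = 0
    large_files = 0
    by_extension = Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        size = path.stat().st_size
        file_count += 1
        total_bytes += size
        if size >= large_threshold:
            large_files += 1
        # Files without a suffix are grouped under an empty-string key.
        by_extension[path.suffix.lower()] += 1
    return {
        "file_count": file_count,
        "total_bytes": total_bytes,
        "large_files": large_files,
        "by_extension": dict(by_extension),
    }
```

Reporting the file count, the number of large files and the extension breakdown together answers the maintainer's questions in one pass.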
Thanks for the details! I don't have the bandwidth to check that now, but it can definitely prove useful at some point. Can I ask you to open a new issue dedicated to it, describing the rate limiting you hit with this repo structure? Thanks in advance!
What for?
Upload arbitrarily large folders in a single command line!
How to use it?
Install
EDIT: PR has been merged so installation can be done from the `main` branch.
Upload folder
Every minute a report is printed to the terminal with the current status. In addition to that, progress bars and errors are still displayed.
Run `huggingface-cli large-upload --help` to see all options.
PR documentation:
What does it do?
This CLI is intended to upload arbitrarily large folders in a single command:
A `.cache/huggingface/` folder will be created at the root of your folder to keep track of the progress. Please do not modify these files manually. If you feel this folder got corrupted, please report it here, delete the `.huggingface/` entirely and then restart your command. Some intermediate steps will be lost but the upload process should be able to continue correctly.
Known limitations
`path_in_repo` => always upload files at root of the folder. If you want to upload to a subfolder, you need to set the proper structure locally.
`revision`
These limitations are documented.
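One way to "set the proper structure locally" for the `path_in_repo` limitation is to build a staging folder whose layout mirrors the desired repo layout and upload that folder instead. A minimal sketch under that assumption; `stage_in_subfolder` is a hypothetical helper, and copying is only one option (moving the files avoids doubling disk usage for very large folders):

```python
import shutil
from pathlib import Path


def stage_in_subfolder(source: Path, staging_root: Path, subfolder: str) -> Path:
    """Copy `source` to `staging_root/subfolder` and return `staging_root`.

    Uploading `staging_root` then places the files under `subfolder` in the repo.
    """
    target = staging_root / subfolder
    target.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: entries named ".cache" hold upload metadata, not user data,
    # so they are skipped. copytree fails if `target` already exists.
    shutil.copytree(source, target, ignore=shutil.ignore_patterns(".cache"))
    return staging_root
```

For example, staging `./my_model` with `subfolder="checkpoints/v1"` yields a folder containing `checkpoints/v1/...`, which is then passed as the folder to upload.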