Feat/huggingface backend #1665
base: main
Implement a new backend for downloading files from Hugging Face Hub repositories using the hf:// URL scheme.
Features:
- Support for models, datasets, and spaces repositories
- URL parsing with revision/branch support (e.g., hf://owner/repo@v1.0)
- Authentication via HF_TOKEN environment variable
- Git LFS file support for large model files
- Repository listing for recursive downloads
Signed-off-by: pmady <pmady@users.noreply.github.com>
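The URL shape this commit describes can be illustrated with a small std-only parser. This is a sketch, not the PR's code: it assumes a leading `models/`, `datasets/`, or `spaces/` segment selects the repository type (mirroring Hugging Face's own web URLs), that a bare `owner/repo` means a model, and that a missing `@revision` defaults to `main`. `ParsedHfUrl` and `RepoType` borrow names from a later commit, but their fields and `ParseError` here are hypothetical.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Kind of Hub repository (illustrative mirror of the PR's `RepoType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RepoType {
    Model,
    Dataset,
    Space,
}

/// Parsed form of `hf://[models/|datasets/|spaces/]owner/repo[@revision][/path]`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParsedHfUrl {
    pub repo_type: RepoType,
    pub owner: String,
    pub repo: String,
    pub revision: String,
    pub path: Option<String>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct ParseError(String);

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid hf:// URL: {}", self.0)
    }
}

impl TryFrom<&str> for ParsedHfUrl {
    type Error = ParseError;

    fn try_from(url: &str) -> Result<Self, Self::Error> {
        let rest = url
            .strip_prefix("hf://")
            .ok_or_else(|| ParseError(format!("unsupported scheme in {url}")))?;
        let mut segments: Vec<&str> = rest.split('/').filter(|s| !s.is_empty()).collect();

        // Optional type prefix (assumed format); a bare owner/repo is a model.
        let explicit = match segments.first().copied() {
            Some("models") => Some(RepoType::Model),
            Some("datasets") => Some(RepoType::Dataset),
            Some("spaces") => Some(RepoType::Space),
            _ => None,
        };
        if explicit.is_some() {
            segments.remove(0);
        }
        let repo_type = explicit.unwrap_or(RepoType::Model);

        // owner/repo is always required after the optional prefix.
        if segments.len() < 2 {
            return Err(ParseError(format!("expected owner/repo in {url}")));
        }
        // An optional `@revision` on the repo segment pins a branch, tag, or commit.
        let (repo, revision) = match segments[1].split_once('@') {
            Some((name, rev)) if !name.is_empty() && !rev.is_empty() => (name, rev),
            Some(_) => return Err(ParseError(format!("empty repo or revision in {url}"))),
            None => (segments[1], "main"),
        };
        // Everything after owner/repo is the file path inside the repository.
        let path = (segments.len() > 2).then(|| segments[2..].join("/"));

        Ok(ParsedHfUrl {
            repo_type,
            owner: segments[0].to_string(),
            repo: repo.to_string(),
            revision: revision.to_string(),
            path,
        })
    }
}

fn main() {
    let parsed = ParsedHfUrl::try_from("hf://datasets/owner/repo@v1.0/data/train.csv");
    println!("{parsed:?}");
}
```

Because the type prefix is detected by name, an owner literally called `models`, `datasets`, or `spaces` would be misread in this sketch; a real parser would need a rule for that case.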
Register the hf:// scheme backend in load_builtin_backends() and update tests to include the new backend in expected backends list. Signed-off-by: pmady <pmady@users.noreply.github.com>
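Registering a backend by scheme amounts to a lookup table keyed on the URL scheme. The sketch below illustrates that idea only; the `Backend` trait, `HttpBackend`, `HuggingFaceBackend`, and the `build`/`schemes` methods are simplified stand-ins, not the actual `BackendFactory` API in `dragonfly-client-backend`.

```rust
use std::collections::HashMap;

/// Stand-in for the backend trait; only what the lookup needs.
trait Backend {
    fn scheme(&self) -> &'static str;
}

struct HttpBackend;
struct HuggingFaceBackend;

impl Backend for HttpBackend {
    fn scheme(&self) -> &'static str {
        "http"
    }
}

impl Backend for HuggingFaceBackend {
    fn scheme(&self) -> &'static str {
        "hf"
    }
}

/// Registry that maps a URL scheme to the backend that serves it.
struct BackendFactory {
    backends: HashMap<String, Box<dyn Backend>>,
}

impl BackendFactory {
    fn new() -> Self {
        let mut factory = BackendFactory { backends: HashMap::new() };
        factory.load_builtin_backends();
        factory
    }

    /// Registers every built-in backend under its scheme, now including `hf`.
    fn load_builtin_backends(&mut self) {
        let builtins: Vec<Box<dyn Backend>> = vec![Box::new(HttpBackend), Box::new(HuggingFaceBackend)];
        for backend in builtins {
            self.backends.insert(backend.scheme().to_string(), backend);
        }
    }

    /// Picks the backend for a URL by its scheme, e.g. `hf` for `hf://owner/repo`.
    fn build(&self, url: &str) -> Option<&dyn Backend> {
        let (scheme, _) = url.split_once("://")?;
        self.backends.get(scheme).map(|backend| backend.as_ref())
    }

    /// Sorted scheme list, convenient for asserting the expected backends in tests.
    fn schemes(&self) -> Vec<&str> {
        let mut schemes: Vec<&str> = self.backends.keys().map(String::as_str).collect();
        schemes.sort_unstable();
        schemes
    }
}

fn main() {
    let factory = BackendFactory::new();
    println!("{:?}", factory.schemes());
}
```

Sorting the scheme list keeps an "expected backends" test independent of `HashMap` iteration order.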
Add serde and serde_json workspace dependencies required for parsing Hugging Face API responses. Signed-off-by: pmady <pmady@users.noreply.github.com>
Add --hf-token argument for Hugging Face authentication and include usage examples in the CLI help documentation.
Examples added:
- Download single file: dfget hf://owner/repo/path -O /tmp/file
- Download repository: dfget hf://owner/repo -O /tmp/repo/ -r
- With authentication: dfget hf://... --hf-token=<token>
Signed-off-by: pmady <pmady@users.noreply.github.com>
Codecov Report: ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1665 +/- ##
==========================================
- Coverage 50.83% 50.42% -0.42%
==========================================
Files 83 84 +1
Lines 20029 20642 +613
==========================================
+ Hits 10182 10408 +226
- Misses 9847 10234 +387
Apply cargo fmt to fix formatting in huggingface.rs and lib.rs. Signed-off-by: pmady <pmady@users.noreply.github.com>
Hi maintainers, could you please add the
@pmady Thanks for contributing this PR. Can you add documentation for https://d7y.io/docs/next/operations/integrations/hugging-face/ and https://d7y.io/docs/next/reference/commands/client/dfget/#download-with-different-protocols. d7y.io repo: https://github.com/dragonflyoss/d7y.io
Pull request overview
Adds a new hf:// backend so dfdaemon/dfget can download from Hugging Face Hub repositories (models/datasets/spaces), including repo listing for recursive downloads, and introduces a CLI flag intended for HF authentication.
Changes:
- Add `huggingface` backend implementation and register scheme `hf` in `BackendFactory`.
- Extend `dfget` CLI help/examples and add `--hf-token` argument.
- Add `serde`/`serde_json` deps for HF API response parsing.
Reviewed changes
Copilot reviewed 3 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| dragonfly-client/src/bin/dfget/main.rs | Adds HF usage examples and --hf-token CLI option. |
| dragonfly-client-backend/src/lib.rs | Registers the new hf backend and updates backend factory tests. |
| dragonfly-client-backend/src/huggingface.rs | Implements HF backend: URL parsing, stat/list/get/exists, plus unit tests. |
| dragonfly-client-backend/Cargo.toml | Adds serde dependencies needed by the new backend. |
| Cargo.lock | Locks new transitive deps from serde additions. |
- Update copyright year to 2026
- Remove environment variable fallback for HF_TOKEN, keep only CLI option
- Implement ParsedHfUrl with TryFrom<Url> and TryFrom<&str> traits
- Make ParsedHfUrl and RepoType public structs
- Update tests to use TryFrom pattern
Signed-off-by: pmady <pmady@users.noreply.github.com>
The HF backend was instantiated with HuggingFace::new() at startup, which made the --hf-token CLI flag ineffective: the token was stored on the struct but never received from dfget.
Changes:
- dfget: inject --hf-token as Authorization header into request_header so it flows through gRPC to dfdaemon and into the backend
- HF backend: remove stored token field, read auth from request http_header instead via build_headers() method
- Remove new_with_token() constructor since it is no longer needed
Signed-off-by: pmady <pmady@users.noreply.github.com>
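The header flow in this commit, token in on the dfget side and headers derived per request on the backend side, can be sketched with plain std maps. This is an illustration under assumptions, not the PR's code: `inject_hf_token` is a hypothetical name, headers are modeled as `HashMap<String, String>` rather than the HTTP client's header map, the `Bearer` prefix follows Hugging Face's token scheme, and `DEFAULT_USER_AGENT` is a fixed placeholder where the PR uses `concat!("dragonfly", "/", env!("CARGO_PKG_VERSION"))`. The default-then-override User-Agent behavior comes from a later commit in this PR.

```rust
use std::collections::HashMap;

/// Placeholder; the PR derives this from CARGO_PKG_VERSION at compile time.
const DEFAULT_USER_AGENT: &str = "dragonfly/0.0.0";

/// dfget side (hypothetical helper): carry `--hf-token` as an Authorization
/// header so it travels with the request to dfdaemon.
fn inject_hf_token(request_header: &mut HashMap<String, String>, hf_token: Option<&str>) {
    if let Some(token) = hf_token {
        request_header.insert("Authorization".to_string(), format!("Bearer {token}"));
    }
}

/// Backend side: derive outgoing headers from the request instead of a token
/// stored on the backend struct. Names are lower-cased because HTTP header
/// names are case-insensitive.
fn build_headers(request_header: &HashMap<String, String>) -> HashMap<String, String> {
    let mut headers = HashMap::new();
    headers.insert("user-agent".to_string(), DEFAULT_USER_AGENT.to_string());
    // Request headers win, so a user-supplied User-Agent replaces the default.
    for (name, value) in request_header {
        headers.insert(name.to_ascii_lowercase(), value.clone());
    }
    headers
}

fn main() {
    let mut request_header = HashMap::new();
    inject_hf_token(&mut request_header, Some("hf_example"));
    let headers = build_headers(&request_header);
    println!("{:?}", headers.get("authorization"));
}
```

Keeping the token out of the backend struct means one shared backend instance can serve requests carrying different credentials.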
- Fix URL parsing: remove redundant early-return branch, always require
owner/repo (two segments) after optional type prefix
- Fix list_files to return hf:// URLs instead of https:// so downstream
downloads continue using the HF backend (preserving auth and semantics)
- Use versioned DEFAULT_USER_AGENT matching the HTTP backend pattern
(concat!("dragonfly", "/", env!("CARGO_PKG_VERSION"))) and allow
user-supplied User-Agent to override it
- Fix dataset test to use proper owner/repo URL format
- Add comprehensive test coverage: dataset, space, explicit model type,
invalid scheme, missing repo, build_hf_url, build_headers behavior
Signed-off-by: pmady <pmady@users.noreply.github.com>
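The two URL forms touched by this commit can be contrasted in a short sketch: the HTTPS URL actually fetched, and the hf:// URL handed back from listing. It assumes the Hub's public `resolve` download endpoint (`https://huggingface.co/[datasets/|spaces/]owner/repo/resolve/<revision>/<path>`) and that child hf:// URLs always carry an explicit `@revision`. `Repo` and `list_entries_to_hf_urls` are hypothetical names, `build_hf_url` is named in the commit but the signature here is a guess, and percent-encoding of revisions such as `refs/pr/1` or of unusual file paths is omitted.

```rust
/// Hub repository kinds (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RepoType {
    Model,
    Dataset,
    Space,
}

impl RepoType {
    /// Path prefix the Hub uses in front of owner/repo; models have none.
    fn prefix(self) -> &'static str {
        match self {
            RepoType::Model => "",
            RepoType::Dataset => "datasets/",
            RepoType::Space => "spaces/",
        }
    }
}

/// Hypothetical bundle of the parsed URL parts needed here.
struct Repo<'a> {
    repo_type: RepoType,
    owner: &'a str,
    name: &'a str,
    revision: &'a str,
}

/// HTTPS download URL via the Hub's `resolve` endpoint.
fn build_hf_url(repo: &Repo<'_>, path: &str) -> String {
    format!(
        "https://huggingface.co/{}{}/{}/resolve/{}/{}",
        repo.repo_type.prefix(), repo.owner, repo.name, repo.revision, path
    )
}

/// Turn listed file paths into hf:// URLs (not https://) so each child download
/// goes back through the hf backend with the same revision and auth handling.
fn list_entries_to_hf_urls(repo: &Repo<'_>, file_paths: &[&str]) -> Vec<String> {
    file_paths
        .iter()
        .map(|path| {
            format!(
                "hf://{}{}/{}@{}/{}",
                repo.repo_type.prefix(), repo.owner, repo.name, repo.revision, path
            )
        })
        .collect()
}

fn main() {
    let repo = Repo { repo_type: RepoType::Dataset, owner: "owner", name: "repo", revision: "v1.0" };
    println!("{}", build_hf_url(&repo, "data/train.csv"));
    for url in list_entries_to_hf_urls(&repo, &["a.txt", "dir/b.bin"]) {
        println!("{url}");
    }
}
```

Returning https:// URLs from listing would route each child file through the generic HTTP backend, which never sees the hf-specific header handling; emitting hf:// keeps the recursive download on one code path.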
@gaius-qi I've created a documentation PR at dragonflyoss/d7y.io#386 that adds:
@pmady Thanks, I'll finish the review this week.
What does this PR do?
This PR adds support for downloading files from Hugging Face Hub repositories using the `hf://` URL scheme, addressing issue dragonflyoss/dragonfly#4419.
Features
Usage Examples
Changes
Related Issues
Closes dragonflyoss/dragonfly#4419
Checklist