pmady commented Feb 7, 2026

  • Feature

What does this PR do?

This PR adds support for downloading files from Hugging Face Hub repositories using the `hf://` URL scheme, addressing issue dragonflyoss/dragonfly#4419.

Features

  • New `hf://` URL scheme: Download models, datasets, and spaces from Hugging Face Hub
  • URL format: `hf://[<repo_type>/]<owner>/<repo>[/<path>][@<revision>]`
  • Authentication: Support via `--hf-token` flag or `HF_TOKEN` environment variable
  • Git LFS support: Handles large model files through Hugging Face HTTP API
  • Repository listing: Supports recursive downloads with `-r` flag
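
As a rough illustration of how the parts of an `hf://` URL could map onto the Hub's HTTP API, the sketch below builds the HTTPS `resolve` download URL for a single file. The `RepoType` enum and the `resolve_url` function are hypothetical names chosen for this example and are not taken from the PR. The fallback to the `main` revision and the `datasets/` and `spaces/` path prefixes are assumptions based on the Hub's public URL layout.

```rust
/// Kind of Hugging Face repository (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RepoType {
    Model,
    Dataset,
    Space,
}

/// Builds the HTTPS "resolve" URL for one file in a repository.
/// Model repositories sit at the site root; datasets and spaces use a prefix.
pub fn resolve_url(
    repo_type: RepoType,
    owner: &str,
    repo: &str,
    revision: Option<&str>,
    path: &str,
) -> String {
    let prefix = match repo_type {
        RepoType::Model => "",
        RepoType::Dataset => "datasets/",
        RepoType::Space => "spaces/",
    };
    // Assumption: use the `main` branch when the URL carries no revision.
    let revision = revision.unwrap_or("main");
    format!("https://huggingface.co/{prefix}{owner}/{repo}/resolve/{revision}/{path}")
}
```

Under these assumptions, `resolve_url(RepoType::Model, "deepseek-ai", "DeepSeek-OCR", None, "model.safetensors")` yields a URL under `https://huggingface.co/deepseek-ai/DeepSeek-OCR/resolve/main/`.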

Usage Examples

# Download a single file
dfget hf://deepseek-ai/DeepSeek-OCR/model.safetensors -O /tmp/model.safetensors

# Download entire repository
dfget hf://deepseek-ai/DeepSeek-OCR -O /tmp/DeepSeek-OCR/ -r

# With authentication for private repos
dfget hf://owner/private-repo/model.bin -O /tmp/model.bin --hf-token=<token>

Changes

  1. `dragonfly-client-backend/src/huggingface.rs` (new): Hugging Face backend implementation
  2. `dragonfly-client-backend/src/lib.rs`: Register `hf` backend in BackendFactory
  3. `dragonfly-client-backend/Cargo.toml`: Add serde dependencies
  4. `dragonfly-client/src/bin/dfget/main.rs`: Add `--hf-token` CLI argument and examples

Related Issues

Closes dragonflyoss/dragonfly#4419

Checklist

  • Code follows project style guidelines
  • Tests added/updated (46 tests pass)
  • Documentation updated (CLI help)
  • Commits are signed off

pmady added 4 commits February 7, 2026 16:42
Implement a new backend for downloading files from Hugging Face Hub
repositories using the hf:// URL scheme.

Features:
- Support for models, datasets, and spaces repositories
- URL parsing with revision/branch support (e.g., hf://owner/repo@v1.0)
- Authentication via HF_TOKEN environment variable
- Git LFS file support for large model files
- Repository listing for recursive downloads

Signed-off-by: pmady <pmady@users.noreply.github.com>

Register the hf:// scheme backend in load_builtin_backends() and update
tests to include the new backend in expected backends list.

Signed-off-by: pmady <pmady@users.noreply.github.com>

Add serde and serde_json workspace dependencies required for parsing
Hugging Face API responses.

Signed-off-by: pmady <pmady@users.noreply.github.com>

Add --hf-token argument for Hugging Face authentication and include
usage examples in the CLI help documentation.

Examples added:
- Download single file: dfget hf://owner/repo/path -O /tmp/file
- Download repository: dfget hf://owner/repo -O /tmp/repo/ -r
- With authentication: dfget hf://... --hf-token=<token>

Signed-off-by: pmady <pmady@users.noreply.github.com>

codecov bot commented Feb 7, 2026

Codecov Report

❌ Patch coverage is 49.59016% with 246 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.42%. Comparing base (035ae91) to head (49c7fa0).
⚠️ Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| dragonfly-client-backend/src/huggingface.rs | 49.36% | 240 Missing ⚠️ |
| dragonfly-client/src/bin/dfget/main.rs | 25.00% | 6 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1665      +/-   ##
==========================================
- Coverage   50.83%   50.42%   -0.42%     
==========================================
  Files          83       84       +1     
  Lines       20029    20642     +613     
==========================================
+ Hits        10182    10408     +226     
- Misses       9847    10234     +387     

| Files with missing lines | Coverage Δ |
| --- | --- |
| dragonfly-client-backend/src/lib.rs | 96.09% <100.00%> (+0.04%) ⬆️ |
| dragonfly-client/src/bin/dfget/main.rs | 49.84% <25.00%> (-0.16%) ⬇️ |
| dragonfly-client-backend/src/huggingface.rs | 49.36% <49.36%> (ø) |

... and 5 files with indirect coverage changes

Apply cargo fmt to fix formatting in huggingface.rs and lib.rs.

Signed-off-by: pmady <pmady@users.noreply.github.com>

pmady commented Feb 7, 2026

Hi maintainers, could you please add the enhancement label to this PR? The PR Label check requires one of: bug, enhancement, documentation, or dependencies. Thank you!

gaius-qi added the enhancement label Feb 9, 2026

gaius-qi commented Feb 9, 2026

Copilot AI left a comment

Pull request overview

Adds a new hf:// backend so dfdaemon/dfget can download from Hugging Face Hub repositories (models/datasets/spaces), including repo listing for recursive downloads, and introduces a CLI flag intended for HF authentication.

Changes:

  • Add huggingface backend implementation and register scheme hf in BackendFactory.
  • Extend dfget CLI help/examples and add --hf-token argument.
  • Add serde/serde_json deps for HF API response parsing.

Reviewed changes

Copilot reviewed 3 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| dragonfly-client/src/bin/dfget/main.rs | Adds HF usage examples and --hf-token CLI option. |
| dragonfly-client-backend/src/lib.rs | Registers the new hf backend and updates backend factory tests. |
| dragonfly-client-backend/src/huggingface.rs | Implements HF backend: URL parsing, stat/list/get/exists, plus unit tests. |
| dragonfly-client-backend/Cargo.toml | Adds serde dependencies needed by the new backend. |
| Cargo.lock | Locks new transitive deps from serde additions. |

pmady added 3 commits February 9, 2026 10:24
- Update copyright year to 2026
- Remove environment variable fallback for HF_TOKEN, keep only CLI option
- Implement ParsedHfUrl with TryFrom<Url> and TryFrom<&str> traits
- Make ParsedHfUrl and RepoType public structs
- Update tests to use TryFrom pattern

Signed-off-by: pmady <pmady@users.noreply.github.com>
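
The `TryFrom` approach named in the commit above could look roughly like the following sketch. Only the type names `ParsedHfUrl` and `RepoType` come from the commit message; the field names, the `String` error type, and the `models`/`datasets`/`spaces` prefixes are assumptions made for illustration, and the PR's actual parsing rules may differ.

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RepoType {
    Model,
    Dataset,
    Space,
}

/// Parsed pieces of an hf:// URL (field names are assumptions for this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParsedHfUrl {
    pub repo_type: RepoType,
    pub owner: String,
    pub repo: String,
    pub path: Option<String>,
    pub revision: Option<String>,
}

impl TryFrom<&str> for ParsedHfUrl {
    type Error = String;

    fn try_from(url: &str) -> Result<Self, Self::Error> {
        let rest = url
            .strip_prefix("hf://")
            .ok_or_else(|| format!("not an hf:// URL: {url}"))?;

        // A trailing `@revision` is optional; an empty one is rejected.
        let (rest, revision) = match rest.rsplit_once('@') {
            Some((_, "")) => return Err(format!("empty revision in {url}")),
            Some((head, rev)) => (head, Some(rev.to_string())),
            None => (rest, None),
        };

        let mut segments: Vec<&str> = rest.split('/').filter(|s| !s.is_empty()).collect();

        // Assumed type prefixes; a URL without one is treated as a model repo.
        let first = segments.first().copied();
        let repo_type = match first {
            Some("models") => RepoType::Model,
            Some("datasets") => RepoType::Dataset,
            Some("spaces") => RepoType::Space,
            _ => RepoType::Model,
        };
        if matches!(first, Some("models") | Some("datasets") | Some("spaces")) {
            segments.remove(0);
        }

        // Always require owner/repo (two segments) after the optional prefix.
        if segments.len() < 2 {
            return Err(format!("expected owner/repo in {url}"));
        }
        let path = if segments.len() > 2 {
            Some(segments[2..].join("/"))
        } else {
            None
        };

        Ok(ParsedHfUrl {
            repo_type,
            owner: segments[0].to_string(),
            repo: segments[1].to_string(),
            path,
            revision,
        })
    }
}
```

A caller would then write `ParsedHfUrl::try_from("hf://owner/repo@v1.0")` and handle the `Result`. Note that this sketch cannot distinguish an owner literally named `datasets` from the dataset prefix.
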
The HF backend was instantiated with HuggingFace::new() at startup,
making the --hf-token CLI flag ineffective since the token was stored
on the struct but never received from dfget.

Changes:
- dfget: inject --hf-token as Authorization header into request_header
  so it flows through gRPC to dfdaemon and into the backend
- HF backend: remove stored token field, read auth from request
  http_header instead via build_headers() method
- Remove new_with_token() constructor since it is no longer needed

Signed-off-by: pmady <pmady@users.noreply.github.com>
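
The token flow described in the commit above can be sketched with a plain header map. Both function names below are hypothetical, and `HashMap<String, String>` stands in for whatever header type dfget and the backend actually use. The sketch assumes the standard `Bearer` scheme and that the CLI token overwrites any existing `Authorization` entry.

```rust
use std::collections::HashMap;

/// dfget side (sketch): turn an optional CLI token into a request header
/// so that it travels with the request instead of living on the backend struct.
pub fn inject_hf_token(headers: &mut HashMap<String, String>, token: Option<&str>) {
    if let Some(token) = token {
        // Assumption: the CLI token replaces any Authorization header already set.
        headers.insert("Authorization".to_string(), format!("Bearer {token}"));
    }
}

/// Backend side (sketch): recover the token from the per-request headers.
/// Header names are matched case-insensitively.
pub fn bearer_token(headers: &HashMap<String, String>) -> Option<&str> {
    headers
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case("authorization"))
        .and_then(|(_, value)| value.strip_prefix("Bearer "))
}
```

Because the backend only reads what arrives with each request, a single backend instance created at startup can serve requests carrying different tokens, or none at all.
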
- Fix URL parsing: remove redundant early-return branch, always require
  owner/repo (two segments) after optional type prefix
- Fix list_files to return hf:// URLs instead of https:// so downstream
  downloads continue using the HF backend (preserving auth and semantics)
- Use versioned DEFAULT_USER_AGENT matching the HTTP backend pattern
  (concat!("dragonfly", "/", env!("CARGO_PKG_VERSION"))) and allow
  user-supplied User-Agent to override it
- Fix dataset test to use proper owner/repo URL format
- Add comprehensive test coverage: dataset, space, explicit model type,
  invalid scheme, missing repo, build_hf_url, build_headers behavior

Signed-off-by: pmady <pmady@users.noreply.github.com>
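
To illustrate the `list_files` fix in the commit above, the sketch below rewrites repository-relative paths into `hf://` URLs rather than `https://` ones, so that each follow-up download is routed back to the same backend. The function name `to_hf_urls` and its parameters are hypothetical, and the sketch covers only model repositories without a type prefix.

```rust
/// Rewrites repository-relative file paths into hf:// URLs (sketch).
/// Keeping the hf:// scheme means each listed file is fetched through the
/// Hugging Face backend again, with the same headers and semantics.
pub fn to_hf_urls(
    owner: &str,
    repo: &str,
    revision: Option<&str>,
    paths: &[&str],
) -> Vec<String> {
    paths
        .iter()
        .map(|path| match revision {
            // Carry the revision along so every file comes from the same snapshot.
            Some(rev) => format!("hf://{owner}/{repo}/{path}@{rev}"),
            None => format!("hf://{owner}/{repo}/{path}"),
        })
        .collect()
}
```

For example, listing `config.json` in `owner/repo` at revision `v1.0` would produce `hf://owner/repo/config.json@v1.0` under these assumptions.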

pmady commented Feb 9, 2026

@gaius-qi I've created a documentation PR at dragonflyoss/d7y.io#386 that adds:

  • Hugging Face integration page: New section documenting the native hf:// protocol with URL format, single file download, authentication (--hf-token), recursive repository download, dataset download, and revision-specific download examples.
  • dfget reference page: Added "Download with Hugging Face protocol" section with complete examples.

pmady requested a review from gaius-qi February 9, 2026 18:13

gaius-qi commented

@pmady Thanks, I'll finish the review by this week.

Development

Successfully merging this pull request may close these issues.

Supports directly pulling repositories from Hugging Face
