
Conversation


@jwieleRH jwieleRH commented Dec 8, 2025

Refactor the fetch_metadata method to first attempt to fetch GGUF metadata from a manifest and, if that fails, to fall back to fetching safetensors metadata from the repo files.

To fetch a safetensors model, download the set of files from the repo with https requests. Add an "other_files" category for safetensors files that are neither JSON config files nor .safetensors model files, such as the tokenizer.model file.

Remove code to download models from huggingface.co with the huggingface cli. It didn't work correctly anyway. Stub out the _collect_cli_files and in_existing_cache methods for the huggingface transport.

Enable bats test of downloading safetensors models from huggingface. Remove bats test of downloading models with the huggingface cli.
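The file classification described above can be sketched as follows. The function name and the exact config-file list here are illustrative, not the names used in the patch:

```python
import fnmatch

# Hypothetical config-file set; the patch's actual FILE_NAME_* constants may differ.
CONFIG_FILES = {"config.json", "generation_config.json", "tokenizer_config.json"}

def categorize_repo_files(paths):
    """Split repo file paths into safetensors model files, JSON config files,
    and everything else (the "other_files" category, e.g. tokenizer.model)."""
    safetensors, configs, other = [], [], []
    for path in paths:
        if fnmatch.fnmatch(path, "*.safetensors"):
            safetensors.append(path)
        elif path in CONFIG_FILES or fnmatch.fnmatch(path, "*.json"):
            configs.append(path)
        else:
            other.append(path)
    return sorted(safetensors), sorted(configs), sorted(other)
```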

Fixes: #1493

Summary by Sourcery

Support downloading Hugging Face safetensors and associated files over HTTPS while deprecating CLI-based downloads and extending repository metadata handling.

New Features:

  • Discover safetensors model, index, and auxiliary files from Hugging Face repositories using the Files API and download them via HTTPS.
  • Treat non-model, non-config artifacts (such as tokenizer and related files) as an explicit "other_files" category in snapshot generation.

Bug Fixes:

  • Remove broken support for downloading models via the Hugging Face CLI and stub related cache and collection pathways to avoid incorrect behavior.

Enhancements:

  • Fallback to safetensors-based metadata when a GGUF manifest is unavailable, and infer model type from file extensions when building snapshot entries.
  • Extend HF-style repository metadata and file list generation to track additional safetensor shards and index files, including appropriate checksum handling.

Tests:

  • Adjust system tests to cover safetensors downloads from Hugging Face and drop tests that relied on the Hugging Face CLI.
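The manifest-then-safetensors fallback described above reduces to a short chain. This is a sketch with injected fetchers rather than the patch's actual `_fetch_manifest_metadata()` / `_fetch_safetensors_metadata()` methods:

```python
class MetadataError(Exception):
    """Raised when neither metadata source succeeds (illustrative exception type)."""

def fetch_metadata(fetch_manifest, fetch_safetensors):
    """Chain two boolean-returning fetchers; raise only if both fail.

    Each fetcher is expected to record its metadata as a side effect and
    return True on success, mirroring the boolean-return design above.
    """
    if fetch_manifest():
        return "gguf"
    if fetch_safetensors():
        return "safetensors"
    raise MetadataError("no GGUF manifest and no safetensors files found")
```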


sourcery-ai bot commented Dec 8, 2025

Reviewer's Guide

Refactors the HuggingFace transport to prefer manifest-based GGUF metadata but fall back to HTTPS-based safetensors discovery and download via the HuggingFace Files API, while removing all reliance on the huggingface CLI and extending the HF-style repo base to support safetensors, auxiliary files, and a safetensors index file.

File-Level Changes

Change: Add HTTPS-based discovery of safetensors and auxiliary files via the HuggingFace Files API and integrate this metadata into the repository model.
  • Introduce fetch_repo_files() to call the HuggingFace Files API and return the repo file tree as JSON.
  • Implement HuggingfaceRepository._fetch_safetensors_metadata() to scan repo files for .safetensors, index JSON, and auxiliary files, recording them on the repository instance.
  • Extend HFStyleRepository.get_file_list() to add safetensors shards, other auxiliary files, and a safetensors index file to the download plan, with appropriate snapshot metadata.
Files: ramalama/transports/huggingface.py, ramalama/hf_style_repo_base.py

Change: Refactor metadata fetching to first use the GGUF manifest and then fall back to safetensors metadata, and generalize checksum fetching for arbitrary files.
  • Extract HuggingfaceRepository._fetch_manifest_metadata() to encapsulate manifest retrieval and GGUF field parsing with error handling and a boolean success return.
  • Update HuggingfaceRepository.fetch_metadata() to chain _fetch_manifest_metadata() and _fetch_safetensors_metadata(), raising if neither succeeds.
  • Add _fetch_checksum_for_file() in the HF base class and override it in HuggingfaceRepository to use fetch_checksum_from_api for non-JSON files, wiring this into get_file_list() for safetensors and other files.
Files: ramalama/transports/huggingface.py, ramalama/hf_style_repo_base.py

Change: Enhance HF-style repository model handling so safetensors can be treated as primary model files with correct snapshot typing.
  • Add other_files, additional_safetensor_files, and safetensors_index_file fields to HFStyleRepository instances and initialize them in __init__.
  • Modify HFStyleRepository.model_file() to determine SnapshotFileType based on filename extension, treating .safetensors as SafetensorModel instead of GGUFModel.
  • Ensure safetensors-related files are created with appropriate SnapshotFileType and checksum behavior in get_file_list().
Files: ramalama/hf_style_repo_base.py

Change: Remove all HuggingFace CLI-based download, cache reuse, and file collection logic from the HuggingFace transport.
  • Delete is_hf_cli_available(), SnapshotFileType imports, and create_file_link usage from the Huggingface transport.
  • Remove the in_existing_cache() implementation that searched the local HF CLI cache and linked files, replacing it with a stub that always returns False.
  • Replace get_cli_download_args() with a NotImplementedError and stub out _collect_cli_files() to return no files.
Files: ramalama/transports/huggingface.py

Change: Update system tests to validate HTTPS-based safetensors pulls and drop tests depending on the huggingface CLI.
  • Enable the system test that pulls a safetensors model from HuggingFace using HTTPS by removing the skip-if-no-hf-cli guard.
  • Remove the hf cli cache-based pull test that pre-populated the HF cache and validated reuse through multiple transport syntaxes.
Files: test/system/050-pull.bats

Assessment against linked issues

Issue #1493:
  • Remove or disable the huggingface-cli fallback logic for huggingface/llama-server style URIs so it no longer attempts to download via the CLI or its cache.
  • Ensure that when pulling a llama-server style huggingface URI fails, the operation fails fast instead of falling back to a full model repo download via huggingface-cli.


@gemini-code-assist

Summary of Changes

Hello @jwieleRH, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the reliability and flexibility of model downloads by enabling direct HTTPS fetching for safetensors models from Hugging Face. It introduces a more robust metadata retrieval mechanism that gracefully handles both GGUF and safetensors model types, including support for sharded models and auxiliary files. The changes also remove the dependency on the problematic huggingface-cli for downloads, leading to a more stable and efficient process.

Highlights

  • Safetensors Model Download via HTTPS: The system can now directly download safetensors models from huggingface.co using HTTPS requests, addressing previous issues with the huggingface-cli.
  • Refactored Metadata Fetching: The fetch_metadata method has been refactored to first attempt fetching GGUF metadata from a manifest, and if unsuccessful, it then attempts to fetch safetensors metadata by listing repository files.
  • Support for Sharded Safetensors and Other Files: New categories (other_files and additional_safetensor_files) have been introduced to properly handle non-model safetensor files (e.g., tokenizers) and sharded safetensors, ensuring all necessary components are downloaded.
  • Removal of Hugging Face CLI Dependency: Code related to downloading models using the huggingface-cli has been removed or stubbed out, as it was not functioning correctly, streamlining the download process.
  • Updated System Tests: System tests have been updated to enable testing of safetensors model downloads from Hugging Face and to remove tests that relied on the deprecated huggingface-cli functionality.

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • In fetch_repo_files, the /tree/{revision} API is queried only once with default parameters; if a model repo has many files or nested directories you may miss entries unless you either handle pagination or recurse over directories returned as type == 'directory'.
  • In _fetch_safetensors_metadata, you store oid for each file but then recompute checksums via fetch_checksum_from_api; consider either using oid directly where appropriate or clarifying why both are needed to avoid confusion and redundant API calls.
  • For safetensors_index_file in get_file_list, the hash is computed with generate_sha256(f"{org}/{name}/{file}"), which is not the file’s content hash; if this hash is used for integrity checks it should come from the same checksum mechanism as other files or explicitly disable checksum verification for the index file.
## Individual Comments

### Comment 1
<location> `ramalama/transports/huggingface.py:79-88` </location>
<code_context>
         return json.loads(repo_manifest)


+def fetch_repo_files(repo_name: str, revision: str = "main"):
+    """Fetch the list of files in a HuggingFace repository using the Files API."""
+    token = huggingface_token()
+    api_url = f"https://huggingface.co/api/models/{repo_name}/tree/{revision}"
+    logger.debug(f"Fetching repo files from {api_url}")
+
+    request = urllib.request.Request(
+        url=api_url,
+        headers={
+            'Accept': 'application/json',
+        },
+    )
+    if token is not None:
+        request.add_header('Authorization', f"Bearer {token}")
+
+    with urllib.request.urlopen(request) as response:
+        files_data = response.read().decode('utf-8')
+        return json.loads(files_data)
+
+
</code_context>

<issue_to_address>
**issue:** The HuggingFace Files API call does not handle pagination, which can truncate results for large repos.

This endpoint paginates results, so this implementation will only ever see the first page and may miss `.safetensors` or other files in larger models. Please either support pagination (e.g., following any `cursor`/`next` fields) or clearly document that this helper is only safe for small repos and enforce that assumption in callers.
</issue_to_address>
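A generic way to satisfy this, sketched under stated assumptions: drive the request loop from a "next page" value and stop when none is returned. How the HF `/tree` endpoint actually signals the next page (reportedly a `cursor` query parameter surfaced via a `Link: <…>; rel="next"` response header) should be verified against the Hub API docs; the loop below only assumes a fetcher that returns `(items, next_url_or_None)`:

```python
def fetch_all_pages(fetch_page, first_url):
    """Accumulate items across paginated responses.

    `fetch_page(url)` must return a tuple (items, next_url), where
    next_url is None when there are no further pages. For the HF tree
    endpoint, a real fetch_page would extract the next URL from the
    response's Link header (an assumption to confirm against the docs).
    """
    items, url = [], first_url
    while url is not None:
        page, url = fetch_page(url)
        items.extend(page)
    return items
```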

### Comment 2
<location> `ramalama/transports/huggingface.py:125` </location>
<code_context>
+            # No ggufFile in manifest
+            return False
+
+    def _fetch_safetensors_metadata(self):
+        """Fetch metadata for safetensors models from HuggingFace API."""
+        repo_name = f"{self.organization}/{self.name}"
</code_context>

<issue_to_address>
**issue (complexity):** Consider simplifying `_fetch_safetensors_metadata` and related helpers by using simpler data structures, an ignore-file helper, and a unified checksum hook to make the HuggingFace metadata flow easier to follow.

You can keep the new behavior but trim some of the new structure to make the code easier to follow.

### 1. Avoid dicts for `safetensors_files` / `other_files`

Right now `_fetch_safetensors_metadata` builds lists of dicts with `{'filename': ..., 'oid': ...}`, but only `filename` is actually used later (for sorting and choosing the main file). You can simplify this to lists of strings and keep `oid` out of the hot path until it’s really needed.

```python
def _fetch_safetensors_metadata(self):
    repo_name = f"{self.organization}/{self.name}"
    try:
        files = fetch_repo_files(repo_name)
    except (urllib.error.HTTPError, json.JSONDecodeError, KeyError) as e:
        logger.debug(f'fetch_repo_files failed: {e}')
        return False

    self.safetensors_files: list[str] = []
    self.other_files: list[str] = []
    index_file: str | None = None

    for file_info in files:
        if file_info.get('type') != 'file':
            continue

        path = file_info['path']
        logger.debug(f"Examining file {path}")

        if fnmatch.fnmatch(path, "*.safetensors"):
            self.safetensors_files.append(path)
        elif fnmatch.fnmatch(path, "*.safetensors.index.json"):
            index_file = path
        elif path in {".gitattributes", ".gitignore", "README.md"}:
            continue
        elif path in {
            self.FILE_NAME_CONFIG,
            self.FILE_NAME_GENERATION_CONFIG,
            self.FILE_NAME_TOKENIZER_CONFIG,
        }:
            continue
        elif fnmatch.fnmatch(path, "*.txt"):
            continue
        else:
            self.other_files.append(path)

    if not self.safetensors_files:
        logger.debug("No safetensors files found")
        return False

    # Sort by name for consistent ordering.
    self.safetensors_files.sort()

    self.model_filename = self.safetensors_files[0]
    repo_path = f"{self.organization}/{self.name}"
    self.model_hash = f"sha256:{fetch_checksum_from_api(repo_path, self.model_filename)}"

    self.additional_safetensor_files = self.safetensors_files[1:]
    self.safetensors_index_file = index_file

    return True
```

If at some point you really need `oid`, you can add a separate `self.file_oids: dict[str, str]` mapping keyed by filename instead of pushing `oid` into every list element.

### 2. Extract a small helper for “ignored” files

The sequence of `elif` branches for “ignore this file” is what makes the loop visually long. A small helper keeps the main logic easy to scan:

```python
def _is_ignored_file(self, path: str) -> bool:
    if path in {".gitattributes", ".gitignore", "README.md"}:
        return True
    if path in {
        self.FILE_NAME_CONFIG,
        self.FILE_NAME_GENERATION_CONFIG,
        self.FILE_NAME_TOKENIZER_CONFIG,
    }:
        return True
    if fnmatch.fnmatch(path, "*.txt"):
        return True
    return False
```

Then the loop becomes more linear:

```python
for file_info in files:
    if file_info.get('type') != 'file':
        continue

    path = file_info['path']
    logger.debug(f"Examining file {path}")

    if self._is_ignored_file(path):
        continue
    if fnmatch.fnmatch(path, "*.safetensors"):
        self.safetensors_files.append(path)
    elif fnmatch.fnmatch(path, "*.safetensors.index.json"):
        index_file = path
    else:
        self.other_files.append(path)
```

Same behavior, but the classification logic is easier to read and adjust.

### 3. Unify the checksum hook usage

You already have `_fetch_checksum_for_file` in the base `HFStyleRepository` (called from `get_file_list`) and another one here on the subclass. To lower mental overhead, you can concentrate all HF-specific checksum behavior in a single override, and let `fetch_metadata` reuse it.

For example, if the base class calls `_fetch_checksum_for_file(self, filename)`:

```python
class HuggingfaceRepository(HFStyleRepository):
    ...

    def _fetch_checksum_for_file(self, filename: str) -> str:
        # Skip JSON files during checksum fetch
        if fnmatch.fnmatch(filename, "*.json"):
            return ""
        repo_path = f"{self.organization}/{self.name}"
        return fetch_checksum_from_api(repo_path, filename)
```

Then in `_fetch_safetensors_metadata` you can rely on the same hook instead of calling `fetch_checksum_from_api` directly:

```python
self.model_hash = f"sha256:{self._fetch_checksum_for_file(self.model_filename)}"
```

That way, the base class remains the single place that “knows” when/where checksums are needed, and the subclass only defines how to get one, without duplicating extra rules in multiple methods.
</issue_to_address>

### Comment 3
<location> `ramalama/hf_style_repo_base.py:84` </location>
<code_context>
         self.model_hash = None
         self.mmproj_filename = None
         self.mmproj_hash = None
+        self.other_files: list[dict] = []
+        self.additional_safetensor_files: list[dict] = []
+        self.safetensors_index_file = None
</code_context>

<issue_to_address>
**issue (complexity):** Consider unifying extra file metadata and SnapshotFile construction so get_file_list becomes a simpler, declarative description of what to download rather than how to build each entry.

You can reduce the added complexity by (1) consolidating file state into a single structure and (2) centralizing `SnapshotFile` construction and checksum handling.

### 1. Consolidate `other_files` / `additional_safetensor_files` / `safetensors_index_file`

Instead of three separate attributes with slightly different semantics, use a single list of “extra” files with metadata. This keeps state in one place and makes `get_file_list` simpler:

```python
from dataclasses import dataclass
from enum import Enum

class ExtraFileKind(Enum):
    SAFETENSOR = "safetensor"
    OTHER = "other"
    INDEX = "index"

@dataclass
class ExtraFile:
    filename: str
    kind: ExtraFileKind
    required: bool = True
    should_verify_checksum: bool = True
    should_show_progress: bool = True

class BaseModel(ABC):
    def __init__(self, name: str, organization: str, tag: str = 'latest'):
        ...
        self.extra_files: list[ExtraFile] = []
        self.fetch_metadata()
```

Your existing `additional_safetensor_files`, `other_files`, and `safetensors_index_file` assignments in `fetch_metadata()` can be converted into `ExtraFile` instances and appended to `self.extra_files` instead of populating multiple attributes.

### 2. Extract `SnapshotFile` construction into a helper

`get_file_list` currently has multiple, almost-identical `SnapshotFile(...)` constructions. Pull that into a single helper that also standardizes checksum handling:

```python
def _snapshot_file_for(
    self,
    filename: str,
    file_type: SnapshotFileType,
    required: bool = True,
    should_show_progress: bool = True,
    should_verify_checksum: bool = True,
    use_custom_checksum: bool = True,
) -> SnapshotFile:
    checksum = ""
    if should_verify_checksum and use_custom_checksum:
        checksum = self._fetch_checksum_for_file(filename)
    elif should_verify_checksum and not use_custom_checksum:
        checksum = generate_sha256(filename)

    return SnapshotFile(
        url=f"{self.blob_url}/{filename}",
        header=self.headers,
        hash=f"sha256:{checksum}" if checksum else "",
        type=file_type,
        name=filename,
        required=required,
        should_show_progress=should_show_progress,
        should_verify_checksum=bool(checksum) and should_verify_checksum,
    )
```

Then `get_file_list` can be simplified and made more uniform:

```python
def get_file_list(self, cached_files: list[str]) -> list[SnapshotFile]:
    files: list[SnapshotFile] = []

    if self.model_filename not in cached_files:
        files.append(self.model_file())

    # existing mmproj / configs unchanged...

    for extra in self.extra_files:
        if extra.filename in cached_files:
            continue

        if extra.kind is ExtraFileKind.SAFETENSOR:
            file_type = SnapshotFileType.SafetensorModel
            use_custom_checksum = True
        else:
            file_type = SnapshotFileType.Other
            use_custom_checksum = extra.kind is not ExtraFileKind.INDEX

        files.append(
            self._snapshot_file_for(
                filename=extra.filename,
                file_type=file_type,
                required=extra.required,
                should_show_progress=extra.should_show_progress,
                should_verify_checksum=extra.should_verify_checksum,
                use_custom_checksum=use_custom_checksum,
            )
        )

    return files
```

This removes the separate loops for `additional_safetensor_files`, `other_files`, and the index file, and it keeps the logic declarative: `extra_files` describe *what* to download; `_snapshot_file_for` describes *how*.

### 3. Normalize checksum handling

Instead of:

- `_fetch_checksum_for_file` used only for some files, and
- `self.model_hash` being set directly in subclasses, and
- `generate_sha256(f".../{index_file}")` as a special case,

route everything through `_fetch_checksum_for_file` and a single code path.

```python
def _fetch_checksum_for_file(self, filename: str) -> str:
    """Return checksum (without 'sha256:' prefix) for a given filename.
    Base impl returns empty string (no checksum). Subclasses override.
    """
    return ""
```

Then update `model_file` to use the same hook:

```python
def model_file(self) -> SnapshotFile:
    assert self.model_filename

    file_type = (
        SnapshotFileType.SafetensorModel
        if fnmatch.fnmatch(self.model_filename, "*.safetensors")
        else SnapshotFileType.GGUFModel
    )

    checksum = self._fetch_checksum_for_file(self.model_filename)
    hash_value = f"sha256:{checksum}" if checksum else self.model_hash

    return SnapshotFile(
        url=f"{self.blob_url}/{self.model_filename}",
        header=self.headers,
        hash=hash_value,
        type=file_type,
        name=self.model_filename,
        should_show_progress=True,
        should_verify_checksum=bool(hash_value),
    )
```

And for index files, make the “no real checksum” explicit instead of generating a path-based hash:

```python
# when constructing ExtraFile for the index
self.extra_files.append(
    ExtraFile(
        filename=self.safetensors_index_file,
        kind=ExtraFileKind.INDEX,
        required=False,
        should_verify_checksum=False,
        should_show_progress=False,
    )
)
```

With the above changes:

- State is centralized in `extra_files` rather than three attributes.
- `get_file_list` is shorter and reads as “what files we want” instead of “how to build each snapshot.”
- All checksums go through `_fetch_checksum_for_file` or are explicitly marked as “no checksum,” eliminating special-case hashing for the index file.
</issue_to_address>

### Comment 4
<location> `ramalama/hf_style_repo_base.py:162-174` </location>
<code_context>
        if self.safetensors_index_file:
            if self.safetensors_index_file not in cached_files:
                logger.debug(f"Adding safetensors index file: {self.safetensors_index_file}")
                files.append(
                    SnapshotFile(
                        url=f"{self.blob_url}/{self.safetensors_index_file}",
                        header=self.headers,
                        hash=generate_sha256(f"{self.organization}/{self.name}/{self.safetensors_index_file}"),
                        type=SnapshotFileType.Other,
                        name=self.safetensors_index_file,
                        required=False,
                    )
                )

</code_context>

<issue_to_address>
**suggestion (code-quality):** Merge nested if conditions ([`merge-nested-ifs`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/merge-nested-ifs))

```suggestion
        if self.safetensors_index_file and self.safetensors_index_file not in cached_files:
            logger.debug(f"Adding safetensors index file: {self.safetensors_index_file}")
            files.append(
                SnapshotFile(
                    url=f"{self.blob_url}/{self.safetensors_index_file}",
                    header=self.headers,
                    hash=generate_sha256(f"{self.organization}/{self.name}/{self.safetensors_index_file}"),
                    type=SnapshotFileType.Other,
                    name=self.safetensors_index_file,
                    required=False,
                )
            )

```

<br/><details><summary>Explanation</summary>Too much nesting can make code difficult to understand, and this is especially
true in Python, where there are no brackets to help out with the delineation of
different nesting levels.

Reading deeply nested code is confusing, since you have to keep track of which
conditions relate to which levels. We therefore strive to reduce nesting where
possible, and the situation where two `if` conditions can be combined using
`and` is an easy win.
</details>
</issue_to_address>



@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a great step forward in improving how models are downloaded from Hugging Face, by moving from the CLI to direct HTTPS requests via the API. This removes an external dependency and adds support for safetensors models, which is a significant enhancement.

My review focuses on a few areas to improve robustness and clarity. I've found a critical issue related to hash generation that could lead to cache corruption, and I've provided a detailed suggestion to fix it. I also have a couple of smaller suggestions regarding type hinting and file filtering logic to make the code more maintainable and prevent potential issues.

Overall, the changes are well-structured and address an important feature. With a few adjustments, this will be a solid contribution.

Comment on lines 144 to 165

```python
if filename not in cached_files:
    logger.debug(f"Adding other safetensors file: {filename}")
    files.append(
        SnapshotFile(
            url=f"{self.blob_url}/{filename}",
            header=self.headers,
            hash=f"sha256:{self._fetch_checksum_for_file(filename)}",
            type=SnapshotFileType.Other,
            name=filename,
            should_show_progress=False,
            should_verify_checksum=False,
        )
    )
```
Contributor
critical

There's a critical bug here that can lead to file corruption in the cache. When _fetch_checksum_for_file(filename) returns an empty string (e.g., for JSON files), the hash becomes "sha256:". This causes all such files to collide and overwrite each other in the blob store.

Additionally, the log message Adding other safetensors file is misleading as these files aren't necessarily safetensors.

To fix this, you should generate a unique hash for files without an API-provided checksum (similar to config files) and correct the log message.

Suggested change:

```python
if filename not in cached_files:
    logger.debug(f"Adding other file: {filename}")
    checksum = self._fetch_checksum_for_file(filename)
    file_hash = f"sha256:{checksum}" if checksum else generate_sha256(f"{self.organization}/{self.name}/{filename}")
    files.append(
        SnapshotFile(
            url=f"{self.blob_url}/{filename}",
            header=self.headers,
            hash=file_hash,
            type=SnapshotFileType.Other,
            name=filename,
            should_show_progress=False,
            should_verify_checksum=False,
        )
    )
```

Collaborator Author

The whole checksum system needs to be updated. Files that are lfs blobs should use the blob sha signature. Other files should either hash the contents, or hash the oid, or use the oid directly (which would require relaxing the restriction that hash has to be 256 bits).
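One way to sketch that scheme, assuming the tree entries follow the public Hugging Face Files API shape (`oid` for the git blob sha1, `lfs.oid` for the sha256 of LFS content); the helper name and the path-plus-oid key are illustrative, not the project's actual code:

```python
import hashlib

def hash_for_entry(entry: dict, repo: str) -> str:
    """Pick a stable 256-bit hash for a repo file (hypothetical helper).

    LFS entries already carry a sha256 of the content; small non-LFS
    files only expose a git blob sha1 "oid", so hash it together with
    the path to get a unique, collision-free cache key.
    """
    lfs = entry.get("lfs") or {}
    if lfs.get("oid"):
        return f"sha256:{lfs['oid']}"
    key = f"{repo}/{entry['path']}/{entry.get('oid', '')}".encode()
    return f"sha256:{hashlib.sha256(key).hexdigest()}"
```

This keeps the existing 256-bit restriction while never producing a bare `"sha256:"` that would make distinct files collide in the blob store.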

Member

@jwieleRH Isn't the oid provided in the response when fetching the file list? Can't we use that one instead of re-fetching?

Member

ping @jwieleRH

self.mmproj_hash = None
self.other_files: list[dict] = []
self.additional_safetensor_files: list[dict] = []
self.safetensors_index_file = None
Contributor

medium

For better type safety and readability, consider adding a type hint for safetensors_index_file. Based on its usage, str | None would be appropriate.

Suggested change
self.safetensors_index_file = None
self.safetensors_index_file: str | None = None

Collaborator Author

Type hints are being gradually applied everywhere.

Comment on lines 159 to 160
elif fnmatch.fnmatch(path, '*.txt'):
    continue
Contributor

medium

The logic to unconditionally skip all *.txt files seems a bit broad. While this might be intended to skip LICENSE or other text files, it could also unintentionally skip important files like requirements.txt which might be necessary for some models. Could you clarify the reasoning for this or perhaps make the filtering more specific to the files you intend to exclude?

Collaborator Author

This would be addressed by fixing the checksum system.

@rhatdan
Copy link
Member

rhatdan commented Dec 8, 2025

@olliewalsh @engelmi PTAL

Member

@engelmi engelmi left a comment

A few comments, but overall looks good. Nice work!

path = file_info['path']
logger.debug(f"Examining file {path}")

if fnmatch.fnmatch(path, '*.safetensors'):
Member

A simple path.endswith(".safetensors") seems sufficient and would remove the dependency on fnmatch (which I am not sure is usable on Windows?)

Collaborator Author

Personally, I would rather use fnmatch or os.path.splitext, both of which are explicitly about path names and work fine on Windows, while endswith is about strings. As a practical matter, splitext may for all I know be implemented by an implicit endswith. So I don't feel strongly about it.

Member

fnmatch isn't path-aware, though, and on Windows there are some differences such as case-insensitivity and the path separators \ instead of /.
Was talking with @telemaco about using pathlib.Path everywhere so we have the semantics of paths everywhere - currently it's pretty mixed. In this case it would be a simple p.suffix.lower() == ".safetensors"

Collaborator

In fact you could use path.glob("*.safetensors"). The .glob() method is supported by pathlib.
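A minimal sketch of the pathlib check discussed in this thread; `PurePosixPath` is used because repo paths returned by the Files API always use forward slashes, so the check behaves identically on Windows (the function name is illustrative):

```python
from pathlib import PurePosixPath

def is_safetensors(path: str) -> bool:
    # Repo paths from the API use "/", so PurePosixPath gives the same
    # semantics on every platform; the suffix comparison is made
    # case-insensitive explicitly via lower().
    return PurePosixPath(path).suffix.lower() == ".safetensors"
```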

Comment on lines 144 to 165
if filename not in cached_files:
    logger.debug(f"Adding other safetensors file: {filename}")
    files.append(
        SnapshotFile(
            url=f"{self.blob_url}/{filename}",
            header=self.headers,
            hash=f"sha256:{self._fetch_checksum_for_file(filename)}",
            type=SnapshotFileType.Other,
            name=filename,
            should_show_progress=False,
            should_verify_checksum=False,
        )
    )
Member

@jwieleRH Isn't the oid provided in the response when fetching the file list? Can't we use that one instead of re-fetching?

Comment on lines +193 to +201
# Determine file type based on extension
if fnmatch.fnmatch(self.model_filename, '*.safetensors'):
    file_type = SnapshotFileType.SafetensorModel
else:
    file_type = SnapshotFileType.GGUFModel
Member

NIT: I'd remove the else and just default to gguf:

Suggested change
# Determine file type based on extension
if fnmatch.fnmatch(self.model_filename, '*.safetensors'):
    file_type = SnapshotFileType.SafetensorModel
else:
    file_type = SnapshotFileType.GGUFModel
# Determine file type based on extension
file_type = SnapshotFileType.GGUFModel
if fnmatch.fnmatch(self.model_filename, '*.safetensors'):
    file_type = SnapshotFileType.SafetensorModel

Collaborator Author

That makes a certain amount of sense because we do look for GGUF first.
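The default-first shape agreed on above can be sketched self-contained; the enum here is a stand-in for the project's own `SnapshotFileType`, and the function name is illustrative:

```python
from enum import Enum, auto

class SnapshotFileType(Enum):  # stand-in for the project's enum
    GGUFModel = auto()
    SafetensorModel = auto()

def model_file_type(filename: str) -> SnapshotFileType:
    # Default to GGUF, since the GGUF manifest is probed first;
    # only a .safetensors extension overrides the default.
    file_type = SnapshotFileType.GGUFModel
    if filename.lower().endswith(".safetensors"):
        file_type = SnapshotFileType.SafetensorModel
    return file_type
```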

@mikebonnet
Copy link
Collaborator

/ok-to-test

Refactor the fetch_metadata method to attempt to fetch GGUF metadata
from a manifest, and then to attempt to fetch safetensors metadata
from the repo files.

To fetch a safetensors model, download the set of files from the repo
with https requests. Add an "other_files" category for safetensors
files that are neither JSON config files nor .safetensors model files,
such as the tokenizer.model file.

Remove code to download models from huggingface.co with the
huggingface cli. It didn't work correctly anyway. Stub out the
_collect_cli_files and in_existing_cache methods for the huggingface
transport.

Enable bats test of downloading safetensors models from
huggingface. Remove bats test of downloading models with the
huggingface cli. Adjust test expectation for split safetensors model.

Fixes: containers#1493

Signed-off-by: John Wiele <jwiele@redhat.com>
@jwieleRH
Copy link
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively refactors the Hugging Face transport to support downloading safetensors models via HTTPS, which is a great improvement. The fallback mechanism from GGUF manifest to safetensors file discovery is well-implemented. The removal of the broken huggingface-cli support cleans up the codebase. My main feedback is a suggestion to improve the consistency of handling the safetensors index file by not discarding its oid, which would align its processing with that of other repository files.

Development

Successfully merging this pull request may close these issues.

Skip huggingface-cli fallback for llama-server style huggingface uris

5 participants