[v4] Improve download progress tracking (model cache registry and define which files will be loaded for pipelines)#1511

Open

nico-martin wants to merge 34 commits into main from v4-cache-handler

Conversation

@nico-martin
Collaborator

Improved Download Progress Tracking

Problem

Transformers.js couldn't reliably track total download progress because:

  • File lists weren't known before downloads started
  • File sizes were inconsistent (compressed vs uncompressed)
  • No cache awareness before initiating downloads

Solution

New Exported Functions

  • get_files(): Determines required files before downloading
  • get_model_files() / get_tokenizer_files() / get_processor_files(): Helper functions to identify files for each component
  • get_file_metadata(): Fetches file metadata using Range requests without downloading full content
    • Returns fromCache boolean to identify cached files
    • Ensures consistent uncompressed file sizes
  • is_cached(): Checks if all files from a model are already in cache
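
The Range-request trick behind get_file_metadata can be sketched as follows: request a single byte and read the file's true size out of the Content-Range response header, which reports the uncompressed total even when Content-Length reflects a compressed transfer. The parsing helper below is illustrative, not the PR's actual implementation:

```javascript
// Hypothetical helper: extract the full file size from a Content-Range
// header value, e.g. "bytes 0-0/123456" -> 123456.
function sizeFromContentRange(header) {
  const match = /\/(\d+)$/.exec(header);
  return match ? Number(match[1]) : null;
}

// Sketch of a metadata fetch, assuming a fetch-capable environment:
// a one-byte Range request exposes the true size without downloading
// the file body.
async function fetchFileSize(url) {
  const res = await fetch(url, { headers: { Range: "bytes=0-0" } });
  return sizeFromContentRange(res.headers.get("content-range") ?? "");
}

console.log(sizeFromContentRange("bytes 0-0/123456")); // 123456
```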

Enhanced Progress Tracking

  • readResponse() with expectedSize: Falls back to the metadata-derived size when the Content-Length header is missing
  • total_progress callback: Provides aggregate progress across all files
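
The aggregation behind a total_progress callback can be sketched as: track per-file loaded bytes against the expected sizes and report one overall percentage. The helper below is an illustrative sketch, not the PR's implementation:

```javascript
// Illustrative aggregator: sums loaded bytes across all expected files
// and divides by the total expected size to get one overall percentage.
function createTotalProgress(expectedSizes) {
  const loaded = new Map();
  const totalBytes = Object.values(expectedSizes).reduce((a, b) => a + b, 0);
  return {
    // Record the latest byte count for a file and return overall progress.
    update(file, loadedBytes) {
      loaded.set(file, loadedBytes);
      let sum = 0;
      for (const bytes of loaded.values()) sum += bytes;
      return (100 * sum) / totalBytes;
    },
  };
}

const tracker = createTotalProgress({ "model.onnx": 800, "tokenizer.json": 200 });
tracker.update("model.onnx", 400); // 40% of the 1000 total bytes
console.log(tracker.update("tokenizer.json", 200)); // 60
```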

Review

One thing I am not super confident about is the get_model_files function. I tried to test it with different model architectures, but maybe I missed some that load files not covered by that function. @xenova, could you smoke-test some models and list the ones that fail?

The easiest way to do that is:

import {
  get_files,
  pipeline,
} from "@huggingface/transformers";

const expectedFiles = await get_files(
  "onnx-community/gemma-3-270m-it-ONNX",
  {
    dtype: "fp32",
    device: "webgpu",
  }
);
const loadedFiles = new Set();
const pipe = await pipeline(
  "text-generation",
  "onnx-community/gemma-3-270m-it-ONNX",
  {
    dtype: "fp32",
    device: "webgpu",
    progress_callback: (e) => {
      if (e.file) loadedFiles.add(e.file);
    },
  }
);

console.log(
  "SAME FILES:",
  expectedFiles.sort().join(",") === Array.from(loadedFiles).sort().join(",")
);

@nico-martin nico-martin requested a review from xenova February 3, 2026 15:24
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@xenova xenova left a comment

Very exciting PR! 🙌 Just a quick review from scanning the PR briefly.

});
/** @typedef {keyof typeof DATA_TYPES} DataType */

export const DEFAULT_DEVICE_DTYPE = DATA_TYPES.fp32;

Currently, we do a bit of a funny thing when loading models:

  1. if wasm, and no dtype set in config, we use q8 (8-bit) model as it's pretty fast on CPU
  2. if node cpu, and no dtype set in config, we use fp32 model
  3. if webgpu, and no dtype set in config, we use fp32 model

The main reason is that many models on WASM can hit an out-of-memory issue when using fp32 on CPU in the browser. Something I think we can do is scan the models on the Hub and specify the default dtype there, especially on a per-device basis. We can check in a separate PR whether we have a good way to support per-device dtypes based on configs.
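
The three device rules above can be sketched as a small lookup. This is a hypothetical helper for illustration, not code from the PR, and it assumes the device names used in the comment:

```javascript
// Hypothetical sketch of the per-device default described above:
// q8 on wasm (fast on CPU and avoids fp32 out-of-memory in the browser),
// fp32 everywhere else; an explicit dtype in the config always wins.
function resolveDtype(device, configDtype) {
  if (configDtype) return configDtype;
  return device === "wasm" ? "q8" : "fp32";
}

console.log(resolveDtype("wasm", undefined));   // "q8"
console.log(resolveDtype("webgpu", undefined)); // "fp32"
console.log(resolveDtype("wasm", "fp16"));      // "fp16"
```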

@xenova xenova changed the base branch from v4 to main February 13, 2026 17:03
@xenova xenova self-requested a review February 18, 2026 17:08

@xenova xenova left a comment

Solid progress! Thanks 🔥

@xenova xenova changed the title V4 cache handler [v4] Improve download progress tracking (model cache registry and define which files will be loaded for pipelines) Feb 19, 2026