In an attempt to create a provider that leverages local models (i.e., models that run in the browser), I've gone in a few circles. Documenting here for my future self.
Transformers.js
Ideally, Hugging Face's Transformers.js library would meet my needs, but it falls short in the one area where I need it to work.
My assumption is that I need to leverage the image-text-to-text pipeline. The JS implementation of the library does not support image-text-to-text (there is a PR to add it), and there is no AutoModel* for it either.
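For reference, roughly what I was hoping to write. This is purely hypothetical: the image-text-to-text task does not exist in Transformers.js today, and both the call signature and the model id below are placeholders modeled on the Python pipeline.

```ts
import { pipeline } from "@huggingface/transformers";

// HYPOTHETICAL: the "image-text-to-text" task is not implemented in
// Transformers.js yet (see the PR above); the task name, model id, and
// call signature are assumptions based on the Python library.
const vlm = await pipeline("image-text-to-text", "onnx-community/<some-vlm>");

const out = await vlm("https://example.com/photo.jpg", "Describe this image.");
console.log(out);
```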
web-llm
The web-llm project looks very promising, but it seems to have quite a few bugs.
For vision, I tried using the recommended Phi 3.5 model, but encountered this error.
It seems that Gemma 3 is not supported yet, as I get the same error as in this issue.
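For context, a minimal sketch of what I attempted, assuming web-llm's OpenAI-compatible chat API; the model id is taken from the prebuilt model list and may have changed since.

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Model id is an assumption based on web-llm's prebuilt model list.
const engine = await CreateMLCEngine("Phi-3.5-vision-instruct-q4f16_1-MLC");

// web-llm exposes an OpenAI-compatible chat API, so vision prompts use
// the familiar content-array format with an image_url part.
const reply = await engine.chat.completions.create({
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Describe this image." },
        { type: "image_url", image_url: { url: "https://example.com/cat.png" } },
      ],
    },
  ],
});
console.log(reply.choices[0].message.content);
```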
MediaPipe
MediaPipe has been the most successful endeavor so far.
They support multimodal prompting out of the box.
The biggest hangup is how the model is hosted.
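A minimal text-only setup sketch, assuming the `@mediapipe/tasks-genai` package and a self-hosted `.task` model file (the model path is a placeholder). The `modelAssetPath` option is exactly where the hosting problem comes in.

```ts
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

// Load the WASM runtime, then the model. The modelAssetPath below is a
// placeholder; this is the part that requires hosting the gated file.
const genai = await FilesetResolver.forGenAiTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
);
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: "/models/gemma.task" }, // placeholder path
  maxTokens: 1024,
});

console.log(await llm.generateResponse("Hello!"));
```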
Self serving the model file
They recommend downloading the model file and serving it yourself. The model is gated, though, and you have to be logged in and granted access to download it.
That is just not reasonable for a plugin. There's no way I'm publishing a 4GB model file to npm.
User download
The provider could offer a way for users to download the file themselves.
Hugging Face's OAuth login requires a clientId when not signing in via an HF Space, so I can't just ask users to sign in and then use the model.
Another option is to ask users to supply their own HF token, but that requires that they have a token and have been granted access to the model; a rough sketch of that flow is below.
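A minimal sketch of the token-based download, assuming a user-supplied HF token and a placeholder model URL. Gated Hugging Face files can be fetched with a bearer token.

```ts
// Placeholder URL for the gated model file; users would still need access.
const MODEL_URL =
  "https://huggingface.co/<org>/<model>/resolve/main/<file>.task";

// Fetch the gated file with the user's Hugging Face token.
async function downloadModel(hfToken: string): Promise<Blob> {
  const res = await fetch(MODEL_URL, {
    headers: { Authorization: `Bearer ${hfToken}` },
  });
  if (!res.ok) throw new Error(`Model download failed: ${res.status}`);
  return res.blob();
}
```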
Ideally, web-llm gets fixed; that would be the best option.