
In-browser models #42

@charlesLoder


In an attempt to create a provider that leverages local models (i.e. models run in the browser), I've gone around in circles a few times. Documenting here for my future self.

Transformers.js

Ideally, Hugging Face's Transformers.js library would meet my needs, but it falls short in the one area where I need it to work.

My assumption is that I need to leverage the image-text-to-text pipeline. The JS implementation of the library does not support image-text-to-text (there is a PR to add support), nor is there an `AutoModel*` for it.
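For reference, this is what the pipeline API looks like for a vision task the JS library does support; an equivalent `image-text-to-text` pipeline (prompt + image in, text out) is the piece that's missing. A minimal sketch, assuming the current `@huggingface/transformers` package name and one of the example captioning models from the docs:

```ts
import { pipeline } from "@huggingface/transformers";

// `image-to-text` works today, but it only produces a caption — there's no
// way to pass a text prompt alongside the image, which is what I need.
const captioner = await pipeline(
  "image-to-text",
  "Xenova/vit-gpt2-image-captioning"
);

const result = await captioner("https://example.com/photo.jpg");
console.log(result); // e.g. [{ generated_text: "a cat sitting on a couch" }]
```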

web-llm

The web-llm project looks very promising, but it seems to have quite a few bugs.

For vision, I tried using the recommended Phi 3.5 model, but encountered this error.
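Roughly what the attempt looked like — a sketch only; the prebuilt model ID and the vision message shape are assumptions based on web-llm's OpenAI-style chat completions API:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Model ID is an assumption — whatever the prebuilt Phi 3.5 vision entry is
// called in web-llm's model list.
const engine = await CreateMLCEngine("Phi-3.5-vision-instruct-q4f16_1-MLC");

// Placeholder image; in practice this would come from the user.
const imageDataUrl = "data:image/jpeg;base64,...";

// web-llm mirrors the OpenAI chat completions API, including image_url
// content parts for vision models.
const reply = await engine.chat.completions.create({
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Describe this image." },
        { type: "image_url", image_url: { url: imageDataUrl } },
      ],
    },
  ],
});

console.log(reply.choices[0].message.content);
```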

It seems that Gemma 3 is not supported yet, as I get the same error as in this issue.

MediaPipe

MediaPipe has been the most successful endeavor so far.

They support multimodal prompting out of the box.
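For reference, getting a response out of the LLM Inference task is only a few calls. A minimal sketch assuming the `@mediapipe/tasks-genai` package and a placeholder model path (the image-prompting options follow the MediaPipe docs and aren't shown here); where that model file comes from is exactly the problem below:

```ts
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

// Load the WASM runtime for the GenAI tasks.
const genai = await FilesetResolver.forGenAiTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
);

// The model path is a placeholder — hosting this file is the whole issue.
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: "/models/gemma-3n-E2B-it-int4.task" },
  maxTokens: 1024,
  temperature: 0.8,
});

const answer = await llm.generateResponse("Summarize the attached document.");
console.log(answer);
```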

The biggest hangup is how the model is hosted.

Self-serving the model file

They recommend downloading the model file and serving it yourself. The model is gated, so you have to be logged in and granted access to download it.

That is just not reasonable for a plugin. There's no way I'm publishing a 4GB model file to npm.

User download

The provider could offer a way for users to download the file themselves.

Hugging Face's OAuth login requires a `clientId` if you're not signing in via an HF Space, so I can't just ask users to sign in and then use the model.
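For reference, here's what the OAuth flow looks like with `@huggingface/hub` — a sketch; the `clientId` and scopes are placeholders, and outside of a Space the `clientId` has to come from an OAuth app that someone would have to register:

```ts
import { oauthLoginUrl, oauthHandleRedirectIfPresent } from "@huggingface/hub";

// If we're returning from the Hugging Face login page, this resolves with an
// access token; otherwise it's falsy and we have to start the login flow.
const oauth = await oauthHandleRedirectIfPresent();

if (!oauth) {
  window.location.href = await oauthLoginUrl({
    clientId: "your-oauth-app-client-id", // placeholder — requires a registered OAuth app
    scopes: "read-repos",
  });
} else {
  // The access token could then be used to download the gated model file.
  console.log(oauth.accessToken);
}
```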

Another option is to ask users to supply their own HF token, but that requires that they have a token and access to the model.
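If a user does supply a token, the provider could handle the download and caching itself. A hedged sketch — the helper, cache name, and model URL are all hypothetical:

```ts
// Hypothetical helper: download a gated model file with a user-supplied HF
// token and cache it via the Cache Storage API so it's only fetched once.
async function getModelFile(modelUrl: string, hfToken: string): Promise<Blob> {
  const cache = await caches.open("local-model-files");

  const cached = await cache.match(modelUrl);
  if (cached) return cached.blob();

  const res = await fetch(modelUrl, {
    headers: { Authorization: `Bearer ${hfToken}` },
  });
  if (!res.ok) throw new Error(`Model download failed: ${res.status}`);

  // Clone before caching — a Response body can only be consumed once.
  await cache.put(modelUrl, res.clone());
  return res.blob();
}
```

The blob could then be handed to MediaPipe, e.g. via `URL.createObjectURL(blob)` as the `modelAssetPath` — though caching a ~4GB file will bump into browser storage quotas, so that's not a complete answer either.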


Ideally, web-llm would get fixed; that would be the best option.
