
# Local, On-Device LLM Browser Extension Example

An example with boilerplate showing how to get LLMs running in browser extensions (without the cloud) via Google's MediaPipe!

## First Questions First: Why?

Running an edge LLM on the user's machine, with no need for the cloud, is great for many reasons, not least of which is zero API cost.

|  | ☁️ Cloud/Traditional LLM ☁️ | ⚡️ Edge LLM ⚡️ | Winner |
| --- | --- | --- | --- |
| **Cost** | OpEx compounds unpredictably; just one GPT-4 API response can cost 1.65 cents | Free, including even fine-tuning, thanks to the small model size | ⚡️ Edge LLM ⚡️ (nothing better than free!) |
| **Performance** | High raw capacity allows for better generalized accuracy, but at the cost of being slow even without the CoT often required | Optimized Small Language Models (SLMs) leverage 149x higher throughput, with models like Microsoft Phi and methodologies like "Solving a Million-Step LLM Task with Zero Errors" proving SOTA accuracy at worst competitive with and at best completely exceeding cloud LLMs on domain-specific tasks (which are most tasks), even if worse on generalized ones | ⚡️ Edge LLM ⚡️ |
| **Latency** | Unavoidable & unstable network cost of ~50–150 ms, even ignoring uncontrollable initialization & congestion latency | Zero network overhead, easily allowing the sub-10 ms response times essential for real-time control systems and human interaction | ⚡️ Edge LLM ⚡️ |
| **Network dependency** | Requires a high-bandwidth, continuous internet connection, risking buffering at best and complete service failure during outages at worst | Guarantees 100% operational independence, ensuring continuous inference and local functionality even when completely offline | ⚡️ Edge LLM ⚡️ |
| **Customization** | Practically no flexibility, as proprietary APIs restrict access to model weights, making domain-specific fine-tuning expensive or impossible | Full, direct control over the model stack (GGUF, quantization), allowing deep customization for proprietary datasets and core business logic | ⚡️ Edge LLM ⚡️ |
| **Community** | Vendor-reliant, requiring provider-specific documentation and development, with often poor track records and walked-back decisions restricting or blocking community involvement | Thriving open-source ecosystems (e.g., Llama.cpp, KAITO) provide rapid innovation, broad toolchains, and peer-driven solutions larger & more accessible than proprietary ones | ⚡️ Edge LLM ⚡️ |
| **Privacy** | Requires trusting proven-untrustworthy companies and their third parties, plus transmission over networks, potentially violating data sovereignty, compliance, and residency mandates | All data stays local, with complete regulatory control (GDPR, HIPAA, etc.) | ⚡️ Edge LLM ⚡️ |
| **Censorship** | Supplier-imposed guardrails & content filtering, blocking even legitimate uses | No or configurable guardrails, allowing fine-grained control | ⚡️ Edge LLM ⚡️ |
| **Supplier flexibility** | Vendor lock-in due to API specificity and proprietary model dependency, resulting in high switching costs | Open standards & portable formats, enabling seamless adoption of superior models | ⚡️ Edge LLM ⚡️ |
| **Redundancy** | Centralized point of failure, with extensive multi-region deployment strategies to mitigate single-vendor outages, and still countless failures | On-device means it never fails, even if the whole internet dies | ⚡️ Edge LLM ⚡️ |
| **Environment** | Extreme cumulative energy & water consumption of massive data centers, destroying environments, homes, communities, and our planet, projected to reach petawatt-hour levels globally by 2026 | Localized processing means energy costs magnitudes lower than the cloud while enabling optimization (e.g., Energy Delay Product (EDP)) | ⚡️ Edge LLM ⚡️ |

These advantages cover nearly everything creators & consumers want from AI, while also being the de facto moral choice, whether the concern is monopolies, the economy, privacy, the environment, or creative expression.




## How to Run the Demo

  1. Install a WebGPU-compatible browser and enable WebGPU in its settings. (You should also enable Vulkan if your browser doesn't enable it by default, since without it inference is quite unbearably slow.)
  2. Place a MediaPipe-compatible model file in resources/models. Pre-converted models like Google's Gemma and Microsoft's Phi are perfect places to start, and you can convert basically whichever industry LLMs you want as long as they're not too large. For this demo I used gemma-3n-E2B-it-int4-Web.litertlm, since it's a powerful multimodal model that runs on even toasters. For certain use cases, like if you want something even lighter, I'd recommend something like gemma3-1b-it-int4-web.task (it runs lightning quick, much, much faster than cloud LLMs, and on literally anything, but is really dumb)! Then change DEFAULT_MODEL_NAME in src/index.js so it uses that model file (see the loading sketch after this list).
  3. Run `npm install`
  4. Run `npm run build`
  5. Load the extension as a temporary one, from about:debugging#/runtime/this-firefox in Firefox or from chrome://extensions in Chrome (make sure to turn on Developer Mode in the top right).
  6. Click the extension's icon to open a chat window, type into the top text field, and press the "Get Response" button to get the LLM's generated response.
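
For reference, here is a minimal sketch of what steps 1–2 boil down to in code, assuming the `@mediapipe/tasks-genai` npm package; the CDN URL, model path, and option values are illustrative assumptions, not the repo's exact code:

```js
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

// Illustrative path matching step 2; point this at whatever model you placed.
const DEFAULT_MODEL_NAME = "resources/models/gemma-3n-E2B-it-int4-Web.litertlm";

async function loadLlm() {
  // Step 1: fail fast if WebGPU isn't available in this browser.
  if (!navigator.gpu) {
    throw new Error("WebGPU unavailable; enable it in your browser settings.");
  }

  // Resolve the WASM assets backing MediaPipe's GenAI tasks.
  const genai = await FilesetResolver.forGenAiTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
  );

  // Create the inference task from the local model file.
  return LlmInference.createFromOptions(genai, {
    baseOptions: { modelAssetPath: DEFAULT_MODEL_NAME },
    maxTokens: 512, // assumed value; tune per model
  });
}

// Usage:
//   const llm = await loadLlm();
//   const reply = await llm.generateResponse("Hello!");
```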




## Basic Explanation

Google's MediaPipe tutorial is a great place to start for building an understanding of how MediaPipe works.
The components are commented for your use, with the general structure being:

  1. The extension's popup serving as a frontend.
  2. An offscreen page that loads the LLM so it can be shared between multiple contexts and doesn't have to be reloaded every time the popup is opened.
  3. A background service acting as a proxy between the popup and the offscreen page as required, connected to both via ports (a hypothetical sketch of this plumbing follows below).
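
To make that proxy concrete, here is a hypothetical sketch of the background script's port plumbing using Chrome's `chrome.offscreen` API (Firefox handles offscreen contexts differently). The port names, `offscreen.html` URL, and message shapes are illustrative assumptions, not the repo's exact code:

```js
// background.js (service worker): relay messages between the popup and the
// offscreen page that hosts the LLM.

// Create the offscreen document if it doesn't already exist.
async function ensureOffscreen() {
  if (!(await chrome.offscreen.hasDocument())) {
    await chrome.offscreen.createDocument({
      url: "offscreen.html", // illustrative filename
      reasons: ["WORKERS"],
      justification: "Keep the LLM loaded across popup openings.",
    });
  }
}

let offscreenPort = null;
let popupPort = null;

chrome.runtime.onConnect.addListener((port) => {
  if (port.name === "offscreen") {
    // The offscreen page connects once; LLM responses flow back to the popup.
    offscreenPort = port;
    port.onMessage.addListener((msg) => popupPort?.postMessage(msg));
  } else if (port.name === "popup") {
    // Each newly opened popup connects; its prompts are forwarded to the LLM.
    popupPort = port;
    port.onMessage.addListener(async (msg) => {
      await ensureOffscreen();
      offscreenPort?.postMessage(msg);
    });
    port.onDisconnect.addListener(() => (popupPort = null));
  }
});
```

On the other end, the popup would simply call `chrome.runtime.connect({ name: "popup" })` and post prompts over the returned port.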
