
amareshhebbar/PocketLLM


PocketLLM

<PocketLLM framework="React Native + Expo" status="100% Offline" />

A 100% offline, privacy-first AI chat application built with React Native. PocketLLM downloads open-weight LLMs (like Google's Gemma 2B) once and then runs them directly on your device's CPU using Google's MediaPipe engine, with no internet connection or cloud API needed to chat.


Check out my mini docs

>>> Wanna start

Here is the straight-up way to get this running. You need to grab the model files yourself, throw them in an AWS S3 bucket, and link them to the app.

1. Download the Models

Go to Kaggle or HuggingFace and download the raw model files you want to use. Here is the list of supported models and where to find them:

| Model Name | Size | RAM Needed | Where to Get It | Expected File Name |
|---|---|---|---|---|
| Gemma 1.1 2B (CPU) | 1.3 GB | 4 GB | Kaggle | gemma-1.1-2b-cpu.bin |
| Gemma 1.1 2B (GPU) | 1.3 GB | 4 GB | Kaggle | gemma-1.1-2b-gpu.bin |
| Gemma 4 E2B (Edge) | 2.5 GB | 6 GB | HuggingFace | gemma-4-e2b.task |
| Gemma 4 E4B (Edge) | 5.2 GB | 8 GB | HuggingFace | gemma-4-e4b.task |
| Llama 3.2 (1B Q4) | 0.8 GB | 3 GB | HuggingFace | llama-3.2-1b-q4.gguf |
| Llama 3.2 (3B Q4) | 2.0 GB | 4 GB | HuggingFace | llama-3.2-3b-q4.gguf |
| Phi-3.5 Mini (Q4) | 2.4 GB | 4 GB | HuggingFace | phi3-mini-q4.gguf |
| Qwen 2.5 (0.5B Q4) | 0.4 GB | 3 GB | HuggingFace | qwen-05b-q4.gguf |
| SmolLM2 (1.7B Q4) | 1.0 GB | 4 GB | HuggingFace | smollm-17b-q4.gguf |
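The Size and RAM Needed columns are enough to decide which models a given phone can handle. Below is a minimal sketch, not code from this repo: `ModelRow`, `MODELS` and `modelsThatFit` are hypothetical names, and it assumes "RAM Needed" means total device RAM and that you already know the free storage in GB.

```typescript
// Hypothetical row type mirroring the columns of the table above.
interface ModelRow {
  name: string;
  sizeGB: number; // download size
  ramGB: number; // "RAM Needed", assumed to mean total device RAM
  fileName: string;
}

const MODELS: ModelRow[] = [
  { name: "Gemma 1.1 2B (CPU)", sizeGB: 1.3, ramGB: 4, fileName: "gemma-1.1-2b-cpu.bin" },
  { name: "Gemma 1.1 2B (GPU)", sizeGB: 1.3, ramGB: 4, fileName: "gemma-1.1-2b-gpu.bin" },
  { name: "Gemma 4 E2B (Edge)", sizeGB: 2.5, ramGB: 6, fileName: "gemma-4-e2b.task" },
  { name: "Gemma 4 E4B (Edge)", sizeGB: 5.2, ramGB: 8, fileName: "gemma-4-e4b.task" },
  { name: "Llama 3.2 (1B Q4)", sizeGB: 0.8, ramGB: 3, fileName: "llama-3.2-1b-q4.gguf" },
  { name: "Llama 3.2 (3B Q4)", sizeGB: 2.0, ramGB: 4, fileName: "llama-3.2-3b-q4.gguf" },
  { name: "Phi-3.5 Mini (Q4)", sizeGB: 2.4, ramGB: 4, fileName: "phi3-mini-q4.gguf" },
  { name: "Qwen 2.5 (0.5B Q4)", sizeGB: 0.4, ramGB: 3, fileName: "qwen-05b-q4.gguf" },
  { name: "SmolLM2 (1.7B Q4)", sizeGB: 1.0, ramGB: 4, fileName: "smollm-17b-q4.gguf" },
];

// Keep only the models whose RAM requirement and download size fit this device.
function modelsThatFit(models: ModelRow[], deviceRamGB: number, freeStorageGB: number): ModelRow[] {
  return models.filter((m) => m.ramGB <= deviceRamGB && m.sizeGB <= freeStorageGB);
}

// Example: a 4 GB phone with 3 GB of free storage.
console.log(modelsThatFit(MODELS, 4, 3).map((m) => m.name));
```

On a 4 GB phone with 3 GB free, this drops only the two Gemma 4 Edge models, since they need 6 GB and 8 GB of RAM.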

2. Push to AWS S3

You have to host these files somewhere the app can reach them.

  • Create a bucket in AWS S3.
  • Turn off "Block all public access".
  • Go to the Permissions tab and add this Bucket Policy so every object in the bucket is publicly readable (replace YOUR-BUCKET-NAME with your actual bucket name):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::YOUR-BUCKET-NAME/*"
        }
    ]
}
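With that policy in place, each uploaded object is readable at its virtual-hosted-style S3 URL, which is the kind of address the app's `url` field points to. A small helper to build it, as a sketch: `publicObjectUrl` is a hypothetical name, and the bucket, region and key values below are placeholders.

```typescript
// Builds the public, virtual-hosted-style URL of an S3 object:
// https://<bucket>.s3.<region>.amazonaws.com/<key>
function publicObjectUrl(bucket: string, region: string, key: string): string {
  // Encode each path segment so spaces and special characters survive,
  // but keep "/" so folder-style keys stay readable.
  const encodedKey = key.split("/").map(encodeURIComponent).join("/");
  return `https://${bucket}.s3.${region}.amazonaws.com/${encodedKey}`;
}

console.log(publicObjectUrl("my-pocketllm-models", "us-east-1", "gemma-1.1-2b-cpu.bin"));
```

Bucket names that contain dots don't match S3's wildcard TLS certificate in this URL form, so a dot-free bucket name keeps HTTPS downloads simple.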

Want to swap out Gemma for another model? You can host your own .bin or .task files on an AWS S3 bucket and connect them to the app.

1. Get the Model

Download a MediaPipe-compatible model (like gemma-1.1-2b-it-cpu-int4) from Kaggle or HuggingFace. Note: CPU models are recommended for maximum Android stability.

2. Host on S3

Upload the model to your AWS S3 bucket. Make sure your Bucket Policy allows s3:GetObject for public read access.

3. Update the App Config

Open app/components/config/models.ts and add an entry for your model, with its S3 URL, to the AVAILABLE_MODELS array:

export const AVAILABLE_MODELS: AIModel[] = [
  {
    id: 'my_custom_model',
    name: 'Gemma 2B (Custom CPU)',
    size: '1.3 GB',
    requiredRamGB: 4,
    url: 'https://YOUR_AWS_BUCKET_URL/gemma-1.1-2b-cpu.bin',
    filename: 'gemma-custom.bin', 
  }
];
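Before shipping a new entry, it can help to sanity-check it. The sketch below is not part of the app: it re-declares `AIModel` with only the fields shown in the snippet (the real type may have more), treats `filename` as the on-device file name (an assumption), and `validateModels` plus its default extension list (taken from the .bin/.task note above) are hypothetical.

```typescript
// Re-declared here for a self-contained example; mirrors only the fields used above.
interface AIModel {
  id: string;
  name: string;
  size: string;
  requiredRamGB: number;
  url: string;
  filename: string;
}

// Returns human-readable problems; an empty array means every entry looks usable.
function validateModels(models: AIModel[], allowedExtensions: string[] = [".bin", ".task"]): string[] {
  const errors: string[] = [];
  const seenIds = new Set<string>();

  for (const m of models) {
    if (seenIds.has(m.id)) errors.push(`${m.id}: duplicate id`);
    seenIds.add(m.id);

    let parsed: URL | null = null;
    try {
      parsed = new URL(m.url);
    } catch {
      errors.push(`${m.id}: url is not a valid URL`);
    }
    if (parsed !== null && parsed.protocol !== "https:") {
      errors.push(`${m.id}: url should use https`);
    }

    // Assumed to be a file name in app storage, so keep it a bare name.
    if (m.filename.includes("/") || m.filename.includes("\\")) {
      errors.push(`${m.id}: filename must not contain path separators`);
    }
    if (!allowedExtensions.some((ext) => m.filename.endsWith(ext))) {
      errors.push(`${m.id}: unexpected file extension in ${m.filename}`);
    }

    // The negated comparison also rejects NaN.
    if (!(m.requiredRamGB > 0)) errors.push(`${m.id}: requiredRamGB must be positive`);
  }
  return errors;
}

const sample: AIModel[] = [
  {
    id: "my_custom_model",
    name: "Gemma 2B (Custom CPU)",
    size: "1.3 GB",
    requiredRamGB: 4,
    url: "https://YOUR_AWS_BUCKET_URL/gemma-1.1-2b-cpu.bin",
    filename: "gemma-custom.bin",
  },
];
console.log(validateModels(sample)); // → []
```

If your build of the app loads other formats, pass a longer extension list instead of relying on the default.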

>>> How It Works

PocketLLM wraps a complex native C++ engine in a simple, two-screen experience:

1. Pick & Download (Screen 1)

You start at the Model Manager. Browse the list of available models, pick the one you want to talk to, and download it straight to your phone's local storage.

2. Chat Offline (Screen 2)

Once the model is downloaded, tap into the chat. Here is what happens behind the scenes:

  • The Shield: The app checks your device's file system to make sure the large .bin model file is fully downloaded and safe to load.
  • The Engine: Google's MediaPipe C++ engine is bridged to React Native and loads the model directly into your phone's RAM.
  • The Stream: As you chat, the engine generates tokens one at a time and streams them to your screen as they are produced, so the reply builds up word by word instead of arriving after a 30-second wait.

3. Switch It Up

Want to test a different model? Just hit the back button, download a new one from the list, and jump right back in.
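The Shield and The Stream steps can be sketched in TypeScript. This is an illustration under assumptions, not the app's code: `FileInfo` loosely mirrors what a file-system stat call returns, `expectedBytes` assumes you know the model's exact size (for example from the server's Content-Length), and `PartialResult` and `StreamAccumulator` are hypothetical names that assume the native module emits one event per newly generated chunk with a `done` flag.

```typescript
// Hypothetical result of asking the file system about the model file.
interface FileInfo {
  exists: boolean;
  size?: number; // bytes
}

// "The Shield": only load a model file that exists and has exactly the expected size,
// so an interrupted download is never handed to the engine.
function isModelReady(info: FileInfo, expectedBytes: number): boolean {
  return info.exists && info.size === expectedBytes;
}

// Hypothetical event shape forwarded from the native engine for each chunk.
interface PartialResult {
  text: string; // only the newly generated text, not the whole reply
  done: boolean;
}

// "The Stream": append each chunk and re-render, so the reply grows on screen.
class StreamAccumulator {
  private text = "";
  private finished = false;
  private readonly onUpdate: (soFar: string) => void;

  constructor(onUpdate: (soFar: string) => void) {
    this.onUpdate = onUpdate;
  }

  // Call from the native module's event listener for every partial result.
  push(event: PartialResult): void {
    if (this.finished) return; // ignore stray events after completion
    this.text += event.text;
    this.onUpdate(this.text);
    if (event.done) this.finished = true;
  }

  get reply(): string {
    return this.text;
  }

  get isDone(): boolean {
    return this.finished;
  }
}

// Usage with fake events standing in for the native engine.
const chat = new StreamAccumulator((soFar) => console.log(soFar));
chat.push({ text: "Hel", done: false });
chat.push({ text: "lo!", done: true });
```

Comparing against an exact byte count is stricter than only checking that the file exists, which matters because an interrupted download typically leaves a shorter file behind.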


See it in Action

Complete Walkthrough
YouTube video link


Live Chat Demo
YouTube Shorts link


Model Listing

Screenshot of the model list


How to run on your phone

Here is the step-by-step guide to getting this app, with its native C++ engine, running on your actual device:

1. Install Dependencies

npm i

2. The Dev Server (Skip this)

npx expo start

(Note: This starts the standard Expo dev server, but skip it for this project. The offline AI relies on a native C++ engine, so it will NOT work on the web or inside the standard Expo Go app.)

3. Run Directly on Your Phone (Recommended)

To run the true native build on your physical device:

  1. Enable Developer Options and USB Debugging on your Android phone.
  2. Connect your phone to your computer via USB.
  3. Run this command:
npx expo run:android
  4. Allow the installation permissions on your phone screen, and watch the magic happen.

4. Build a Standalone APK

If you want to generate an .apk file to install permanently or share with others:

npx expo prebuild

eas build -p android --profile preview

(Note: You will need to have the EAS CLI installed (npm install -g eas-cli) and be logged in to an Expo account to run the build command).


>>> Security

Offline by Design

PocketLLM is built for absolute privacy.

  • Your prompts, questions, and chat history never leave your phone.
  • There are no API keys, no server logs, and no cloud processing.
  • The only time the app uses the internet is to perform the initial download of the AI model. Once the model is on your disk, you can turn on Airplane mode and chat forever.

Developer Q&A: AWS & App Safety

Q: Why didn't you provide your actual AWS bucket URL in the repo?
A: Because publishing the URL of a publicly readable bucket on GitHub is unsafe: anyone who finds it can download the files, and the bucket owner pays for that traffic.

Q: Why not just hardcode the URL into the app before building the APK?
A: Because hardcoding URLs into a frontend app is never secure. Basic reverse engineering of the APK file will extract that URL in seconds.

Q: What about hiding it in a .env file?
A: That doesn't work for frontend mobile apps. Environment variables in React Native are compiled directly into the app bundle, so anyone with a decompiler can still find them easily.

Q: So how should this be handled in a real production release?
A: That is exactly why backend servers are still alive and necessary! In a true production app, the React Native frontend would talk to a backend server (for example, one built in Go). The backend would securely hold the AWS credentials, verify the user, and send back a temporary, signed URL for the model download.
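The backend half (for example in Go) would generate an S3 presigned URL. On the app side, a SigV4 presigned URL carries its own lifetime in the `X-Amz-Date` and `X-Amz-Expires` query parameters, so the app can decide whether to ask the backend for a fresh one before starting a multi-gigabyte download. A minimal sketch; `presignedUrlExpiry`, `needsFreshUrl` and the 60-second safety margin are hypothetical choices, not part of this repo.

```typescript
// Reads the expiry time out of a SigV4 presigned URL.
// X-Amz-Date is the signing time (YYYYMMDD'T'HHMMSS'Z', UTC) and
// X-Amz-Expires is the lifetime in seconds. Returns null if either is missing or malformed.
function presignedUrlExpiry(url: string): Date | null {
  const params = new URL(url).searchParams;
  const amzDate = params.get("X-Amz-Date");
  const expires = params.get("X-Amz-Expires");
  if (amzDate === null || expires === null) return null;

  const m = /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z$/.exec(amzDate);
  const seconds = Number(expires);
  if (m === null || !Number.isFinite(seconds)) return null;

  const signedAt = Date.UTC(
    Number(m[1]), Number(m[2]) - 1, Number(m[3]), // month is 0-based in Date.UTC
    Number(m[4]), Number(m[5]), Number(m[6]),
  );
  return new Date(signedAt + seconds * 1000);
}

// True when the app should ask the backend for a new URL before downloading.
function needsFreshUrl(url: string, now: Date, marginSeconds = 60): boolean {
  const expiry = presignedUrlExpiry(url);
  if (expiry === null) return true; // not a presigned URL we understand
  return expiry.getTime() - now.getTime() < marginSeconds * 1000;
}

const signed =
  "https://my-bucket.s3.us-east-1.amazonaws.com/gemma.bin" +
  "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20240101T120000Z&X-Amz-Expires=900&X-Amz-Signature=abc";
console.log(presignedUrlExpiry(signed)?.toISOString()); // 2024-01-01T12:15:00.000Z
```

The AWS credentials and the signing itself stay on the backend; the app only inspects a URL it was handed.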


Updates needed
  • GPU Stabilization: Moving beyond CPU-only inference by tuning GPU (OpenCL/Vulkan) inference for mid-range Android chips so it no longer triggers OS-level out-of-memory crashes.
  • UI/UX Overhaul: Upgrading the chat interface with smoother animations, dynamic typing indicators, and full Markdown rendering (syntax-highlighted code blocks, tables, and bullet points).
  • Advanced State Management & Memory: Implementing SQLite to save persistent chat threads, and integrating open-source orchestration frameworks (like LangGraph) to manage complex, multi-turn conversation memory.
  • Agentic Tool Calling: Upgrading the LLM from a simple chatbot to a functional agent capable of triggering local device functions, controlling hallucinations via strict grounding prompts, and executing tool schemas.
  • Hybrid Web Search (Opt-in): Adding a toggleable online mode that allows the local model to securely fetch real-time data from the web when it needs context beyond its offline training data.
  • System-Wide Integration: Breaking the AI out of the app sandbox to interact with the rest of the phone (e.g., handling Android "Share" intents, reading the clipboard, or interacting with other local apps).


No clouds were harmed in the making of this app. ☁️🚫 Forged with React Native, Expo, and On-Device AI.

About

A fully offline AI chat app built with React Native. PocketLLM runs lightweight models like Gemma 2B, Llama, and Qwen directly on your device. Models are downloaded from AWS S3, so no internet is needed during use, and your data stays private on your phone.
