by Virtue_hearts (Darknet.ca Labs)
HiveMind is a local-first, edge-augmented RAG protocol that treats memory as portable, hot-swappable artifacts called EMUs (Encapsulated Memory Units) instead of giant monolithic vector databases.
This repository is also the framework for HiveMind LLM 3.0, a next-generation model designed to compete with xAI and OpenAI and push toward AGI. Explore the roadmap in Future_HiveMind_LLM.md.
- Install prerequisites
  - Node.js 20+ (includes npm)
  - Ollama for the local router model
- Clone the repo

  ```bash
  git clone https://github.com/virtuehearts/HiveMind.git
  cd HiveMind
  ```

- Install dependencies (backend + Vite frontend)

  ```bash
  npm install
  ```

- Download the router model (8GB-friendly)

  ```bash
  curl -fsSL https://ollama.com/install.sh | sh
  # Smallest default: 1.5B parameters
  ollama pull qwen2.5:1.5b-instruct
  ```

- Start the backend (API on http://localhost:4000)

  ```bash
  npm run dev:server
  ```

- Start the web UI in a second terminal (http://localhost:5173)

  ```bash
  npm run dev:web
  ```
- Verify connectivity
- Open the web UI and confirm the readiness cards for backend URL, router model, and EMU mounts show green.
- OpenRouter: set `OPENROUTER_API_KEY` in your environment (or `.env`) before running enrichment jobs so the backend can stream responses. You can override the default model with `OPENROUTER_MODEL`.
- Local embeddings: the EMU builder uses `Xenova/all-MiniLM-L6-v2`. The first run downloads the model to your transformers cache; ensure the machine can reach the model hub or pre-seed the cache (e.g., via `TRANSFORMERS_CACHE`). See the sketch below.
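The snippet below is a minimal sketch of that first embedding run, assuming the `@xenova/transformers` package (Transformers.js); the cache override mirrors the `TRANSFORMERS_CACHE` note above and is optional.

```ts
import { pipeline, env } from "@xenova/transformers";

// Optionally point the Transformers.js cache at a pre-seeded directory.
if (process.env.TRANSFORMERS_CACHE) {
  env.cacheDir = process.env.TRANSFORMERS_CACHE;
}

// The first call downloads Xenova/all-MiniLM-L6-v2 into the cache; later runs work offline.
const embed = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

// Mean-pooled, normalized sentence embedding (384 dimensions for MiniLM-L6-v2).
const output = await embed("Hello, HiveMind", { pooling: "mean", normalize: true });
const vector: number[] = Array.from(output.data as Float32Array);
console.log(vector.length); // 384
```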
Use the bundled HiveMind shell script in the repo root to manage the full stack (Ollama, backend, and web UI). It stores PID files and logs under .hivemind/ so you can start/stop services cleanly.
```bash
# Install Ollama if needed, pull qwen2.5:1.5b-instruct, and start everything
./HiveMind install

# Start all services (assumes dependencies are already installed)
./HiveMind start

# Check process status (ollama / server / web)
./HiveMind status

# Stop everything
./HiveMind stop
```

Logs live in `.hivemind/logs/` for each component (Ollama, server, and web). Use `./HiveMind help` to see all available commands.
- Install Ollama locally and pull the lightweight router model: `ollama pull qwen2.5:1.5b-instruct`.
- Install workspace dependencies: `npm install` (this sets up both the backend and the Vite frontend).
- Run the backend: `npm run dev:server` (default: http://localhost:4000).
- In a new terminal, run the frontend: `npm run dev:web` (default: http://localhost:5173).
The frontend uses the backend router endpoints (/api/route and /api/chat) to exercise the local Qwen2.5 1.5B router before the rest of the RAG stack is added. The compact default keeps RAM use low enough for 8GB laptops while still enabling routing and chat.
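For a quick smoke test of those endpoints from Node or the browser console, something like the sketch below works; the request and response shapes for `/api/route` and `/api/chat` are assumptions here, so treat the server source as the authoritative schema.

```ts
const API_BASE = "http://localhost:4000";

// Ask the local router how it would handle a query.
// The { query } payload and the returned shape are assumptions, not a documented contract.
const routeRes = await fetch(`${API_BASE}/api/route`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ query: "Summarize the mounted poetry EMU" }),
});
console.log(await routeRes.json());

// Send a chat turn through the Qwen2.5 1.5B router.
const chatRes = await fetch(`${API_BASE}/api/chat`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ messages: [{ role: "user", content: "Hello, HiveMind" }] }),
});
console.log(await chatRes.json());
```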
It is designed to run on:
- Consumer CPUs (8–16GB RAM): Qwen2.5 1.5B stays under ~4GB RAM
- NVIDIA RTX GPUs (6GB VRAM), delivering 40–50 tokens/sec using quantized SLMs
- The Chat page shows readiness cards for the backend URL, router model, mounted EMUs, and the three startup steps above.
- Use slash commands directly from the input box: `/emus` to list, `/mount <emu-id>`, `/unmount <emu-id>`, and `/reset`.
- Add folders ending in `.emu` under `emus/`, refresh, then mount and ask questions to see retrieved context in the preview panel.
- Settings allow overriding the API base if your backend is not on http://localhost:4000.
HiveMind is the anti-enterprise RAG:
no lock-in, no cloud dependency, no surveillance, no massive vector silos.
Current enterprise RAG systems are fundamentally flawed:
- **Privacy Risk**: they transmit entire context windows (including PII) to cloud LLMs
- **Latency**: remote vector DB round-trips slow the entire pipeline
- **Cost**: tokens wasted on irrelevant noise
- **Vendor Lock-In**: memory trapped inside proprietary cloud systems
- **Monolithic Databases**: giant, static vector stores nobody can fork or share
Local memory. Cloud inference. Zero noise. Maximum privacy.
Your machine becomes the router, filter, and guardian at the gate.
Encapsulated Memory Units are portable, Git-friendly knowledge capsules:
```
my-dataset.emu/
├── vectors.lance    # LanceDB file-based embeddings
├── metadata.json    # Tags, attribution, version info
└── config.yaml      # Embedding model + retriever settings
```
- **Portable**: share via Git, IPFS, email, S3, or attachments
- **Sharable**: share via the hivemind / torrent protocol
- **Hot-Swappable**: mount/unmount instantly based on query intent
- **Local-First**: stored on disk, not a cloud DB
- **Version-Controlled**: branch, diff, roll back
- **Composable**: mix and match EMUs like software packages
Knowledge becomes modular.
Knowledge becomes a file.
Knowledge becomes yours.
Runs entirely on CPU/GPU locally
Models: Qwen 2.5 (1.5B–3B) / Phi-3.5
Tasks:
- Intent Classification
- Query Transformation
- Re-Ranking
- PII Redaction
- EMU Selection
- LanceDB (serverless, file-based)
- Embeddings: all-MiniLM-L6-v2 (quantized)
- Memory = local disk, not a remote DB
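As a minimal sketch, a lookup against an EMU's `vectors.lance` data from Node might look like the following, assuming the `@lancedb/lancedb` package and a table named `vectors`; both the package choice and the table name are assumptions, not the project's actual loader.

```ts
import * as lancedb from "@lancedb/lancedb";

// queryVector is a 384-dimensional embedding from all-MiniLM-L6-v2
// (see the embedding sketch in the setup notes above).
async function retrieve(emuPath: string, queryVector: number[], k = 5) {
  const db = await lancedb.connect(emuPath);            // e.g. "emus/my-dataset.emu"
  const table = await db.openTable("vectors");          // table name is an assumption
  return table.search(queryVector).limit(k).toArray();  // nearest-neighbor rows with metadata
}
```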
Gemini / Claude / GPT / OpenRouter
- Pure inference
- No persistent state
- Lowest possible context, because local pre-filtering sends only relevant, cleaned, graded chunks upstream
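As an illustration of that final hop, here is a hedged sketch of a synthesis call through OpenRouter's OpenAI-compatible chat completions endpoint; the default model slug is only an example, and the graded chunks would come from the local pipeline.

```ts
// Final synthesis: only graded, PII-scrubbed chunks travel upstream.
async function synthesize(question: string, gradedChunks: string[]): Promise<string> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: process.env.OPENROUTER_MODEL ?? "anthropic/claude-3.5-sonnet", // example default
      messages: [
        { role: "system", content: "Answer using only the provided context." },
        { role: "user", content: `${question}\n\nContext:\n${gradedChunks.join("\n---\n")}` },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```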
```
User Input
    ↓
intent_router (Local SLM)
    ↓ (Context Needed)
retriever (LanceDB Hybrid Search)
    ↓
grader (Local SLM, PII Filter, Relevancy Scoring)
    ↓
synthesizer (Cloud LLM)
    ↓
Client Output
```
A LangGraph state graph with conditional edges ensures deterministic routing and fine-grained agent control.
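The sketch below is a framework-agnostic TypeScript rendering of that flow, with the single conditional edge after the router; the real pipeline expresses the same control flow as a LangGraph state graph, so the node functions here are placeholders.

```ts
interface PipelineState {
  query: string;
  needsContext: boolean;
  chunks: string[];
  answer?: string;
}

// Placeholders for the actual graph nodes described above.
declare function intentRouter(s: PipelineState): Promise<PipelineState>; // local SLM: intent classification
declare function retriever(s: PipelineState): Promise<PipelineState>;    // LanceDB hybrid search
declare function grader(s: PipelineState): Promise<PipelineState>;       // local SLM: PII filter + relevancy
declare function synthesizer(s: PipelineState): Promise<PipelineState>;  // cloud LLM: final answer

// Deterministic routing: the only branch is the "Context Needed" conditional edge.
async function runPipeline(query: string): Promise<PipelineState> {
  let state: PipelineState = { query, needsContext: false, chunks: [] };
  state = await intentRouter(state);
  if (state.needsContext) {
    state = await retriever(state);
    state = await grader(state);
  }
  return synthesizer(state);
}
```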
Before a cloud LLM sees anything, HiveMind:
- Runs intent classification locally
- Filters irrelevant retrievals
- Removes PII
- Compresses and rewrites chunks into minimal gold context
Cloud LLM only receives clean, tiny, relevant context.
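A toy sketch of that grading step is shown below, using regex-based PII scrubbing and a simple relevance cutoff; the real grader runs on the local SLM, so this is illustrative only.

```ts
interface RetrievedChunk {
  text: string;
  score: number; // similarity score from the retriever
}

// Scrub obvious PII patterns (emails, phone-like numbers) before anything leaves the machine.
function redactPII(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[EMAIL]")
    .replace(/\+?\d[\d\s().-]{7,}\d/g, "[PHONE]");
}

// Keep only chunks above a relevance cutoff (0.5 here is arbitrary), scrubbed of PII.
function grade(chunks: RetrievedChunk[], minScore = 0.5): string[] {
  return chunks
    .filter((c) => c.score >= minScore)
    .map((c) => redactPII(c.text));
}
```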
Mount/unmount knowledge in real time:
```bash
hivemind mount poetry.emu
hivemind mount python-docs.emu
hivemind unmount legal-v1.emu
```
No monolithic DB.
No global vector mess.
Zero noise.
- Quantized Qwen/Phi models
- LanceDB file-backed retrieval
- No big corporations holding your memories / datasets
- No need for 24GB+ GPUs or professional hardware
- Can run on a Dell OptiPlex, ThinkPad, or old gaming PC
| Layer | Technology | Role |
|---|---|---|
| Workflow Engine | LangGraph | Agentic DAG pipeline |
| Local Inference | Ollama / vLLM | SLM execution |
| Vector Store | LanceDB | Serverless file-based memory |
| Router SLM | Qwen 2.5 / Phi-3.5 | Intent classification + routing |
| Cloud LLM | Gemini 3.0 / Claude / GPT | Final synthesis |
| Frontend | Web Console / API | Integration layer |
```yaml
metadata:
  name: "Classic English Poetry"
  version: "v1.2"
  creator: "John Doe"
  timestamp: "2025-11-23T14:00:00Z"

embeddings:
  model: "all-MiniLM-L6-v2"
  dimension: 384

retriever_settings:
  k_neighbors: 5
  max_score_threshold: 0.82
```

EMUs are zipped bundles that run locally, privately, offline.
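If you load that config from Node, the shape could be typed roughly as below; this is an assumed mirror of the sample YAML, not a published schema.

```ts
interface EmuConfig {
  metadata: {
    name: string;
    version: string;
    creator: string;
    timestamp: string; // ISO 8601
  };
  embeddings: {
    model: string;     // e.g. "all-MiniLM-L6-v2"
    dimension: number; // 384 for MiniLM-L6-v2
  };
  retriever_settings: {
    k_neighbors: number;
    max_score_threshold: number;
  };
}
```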
| Status | Value |
|---|---|
| CPU/GPU Target | Consumer CPU or NVIDIA RTX (6GB) |
| Throughput | 40–50 tokens/sec (quantized SLM) |
| Architecture | Local-First / Edge-Augmented |
| Core Feature | EMU Capsules |
- [x] EMU file format
- [x] Python EMU mount/unmount
- [x] HiveMind Console
- [x] LangGraph integration
- [ ] Public EMU Browser
- [ ] EMU Registry
- [ ] IPFS Distribution
- [ ] Torrent-based Swarms
- [ ] Community Knowledge Marketplace
- [ ] Auto-build EMUs using Gemini
- [ ] Domain-specific EMU builders
- [ ] Self-healing "Teach HiveMind" loops
HiveMind is building the worldβs first fully local-first Agentic RAG protocol:
- Optimized for RTX 6GB GPUs and low-budget workstations
- 40–50 TPS SLM pipelines
- Portable, modular memory containers
- Cloud only for final reasoning
- Privacy built in by default
Your data stays yours.
Your memory stays local.
Your agents become sovereign.
Created by Warren Kreklo
Darknet.ca Labs (Est. 2003)
Email: admin@darknet.ca
X/Twitter: @virtue_hearts
