A lightweight local LLM chat with a web UI and a C‑based server that runs any LLM chat executable as a child and communicates via pipes.
## Table of Contents

- General Information
- Technologies Used
- Features
- Screenshots
- Setup
- Usage
- Project Status
- Room for Improvement
- Acknowledgements
- Contact
- License
## General Information

LLMux makes running a local LLM chat easier by providing a Tailwind-powered web UI and a minimal C server that simply spawns any compatible chat executable and talks to it over UNIX pipes. Everything runs on your machine, with no third-party services, so you retain full privacy and control. LLMux is a good fit for:
- Privacy‑conscious users who want a self‑hosted, browser‑based chat interface.
- Developers who need to prototype a chat front‑end around a custom model without writing HTTP or JavaScript plumbing from scratch.
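The pipe-based hand-off is conceptually simple. Below is a minimal sketch (not the actual `server.c`) of how a chat executable such as `llm_chat` can be spawned as a child with its stdin and stdout redirected to pipes; the control flow and buffer sizes here are illustrative assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Minimal sketch: spawn a chat executable and talk to it over pipes.
 * Illustrative only; not the actual server.c implementation. */
int main(void) {
    int toChild[2], fromChild[2]; /* [0] = read end, [1] = write end */

    if (pipe(toChild) < 0 || pipe(fromChild) < 0) {
        perror("pipe");
        return EXIT_FAILURE;
    }

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return EXIT_FAILURE;
    }

    if (pid == 0) {
        /* Child: read prompts from toChild, write responses to fromChild. */
        dup2(toChild[0], STDIN_FILENO);
        dup2(fromChild[1], STDOUT_FILENO);
        close(toChild[0]);
        close(toChild[1]);
        close(fromChild[0]);
        close(fromChild[1]);

        execlp("./llm_chat", "llm_chat", (char *)NULL); /* example chat binary */
        perror("execlp");
        _exit(EXIT_FAILURE);
    }

    /* Parent: keep only the ends it needs, then forward a prompt. */
    close(toChild[0]);
    close(fromChild[1]);

    const char *prompt = "Hello, model!\n";
    write(toChild[1], prompt, strlen(prompt)); /* kept open for further prompts */

    char buffer[4096];
    ssize_t n = read(fromChild[0], buffer, sizeof(buffer) - 1);
    if (n > 0) {
        buffer[n] = '\0';
        printf("model said: %s\n", buffer);
    }

    return EXIT_SUCCESS;
}
```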
## Technologies Used

- llama.cpp — tag b5391
- CivetWeb — commit 85d361d85dd3992bf5aaa04a392bc58ce655ad9d
- Tailwind CSS — v3.4.16
- C++17 for the example chat executable
- GNU Make / Bash for build orchestration
## Features

- Browser-based chat UI served by a tiny C HTTP server
- Pluggable LLM chat executable — just point at any compatible binary
- Configurable model name, context length, server port and max response length via `#define` in `server.c` and `llm.cpp` (see the sketch after this list)
- Build script (`build.sh`) to compile everything into `out/` and run `clang-format` on the sources
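As a concrete illustration of the configuration feature, the snippet below shows what such macros could look like. Only `LLM_CHAT_EXECUTABLE_NAME` is named elsewhere in this README; the other macro names and all values are assumptions and may differ from the real `server.c` and `llm.cpp`.

```c
/* Illustrative compile-time configuration. Macro names other than
 * LLM_CHAT_EXECUTABLE_NAME are hypothetical and may not match the sources. */

/* server.c */
#define LLM_CHAT_EXECUTABLE_NAME "llm_chat" /* chat binary spawned by the server  */
#define SERVER_PORT "8080"                  /* port the HTTP server listens on    */
#define MAX_RESPONSE_LENGTH 8192            /* upper bound on a single reply      */

/* llm.cpp */
#define MODEL_NAME "models/model.gguf"      /* path to the .gguf model file       */
#define CONTEXT_LENGTH 4096                 /* context window passed to llama.cpp */
```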
## Setup

- Obtain a model compatible with llama.cpp (e.g. a `.gguf` file) and place it in the `models/` directory.
- (Optional) If you don't use the example C++ chat app (`llm_chat`, a.k.a. `llm.cpp`), update the `LLM_CHAT_EXECUTABLE_NAME` macro to match your chosen binary.
- Get llama.cpp and CivetWeb.
- Run:

  ```bash
  ./build.sh
  ```
This will:
- Compile the C server and C++ example chat app
- Place all outputs under `out/`
- Format the source files with `clang-format`
- If needed, set the `LLM_CHAT_EXECUTABLE_NAME` macro in `server.c` to the name of your chat binary in `out/` and re-build.
## Usage

- Start the server:

  ```bash
  ./out/server
  ```
- Note the printed port number (e.g. `Server started on port 8080`).
- Open your browser at `http://localhost:<port>` to start chatting.
## Project Status

The project is complete. All planned functionality (spawning the LLM, piping I/O, rendering a chat UI) is implemented.
## Room for Improvement

To do:
- Dynamic response buffer: switch from fixed-size buffers to dynamic allocation in `server.c`.
- Prompt unescape: properly unescape JSON-style sequences (`\"`, `\\`, etc.) in incoming prompts before forwarding them to the chat executable (a sketch covering both items follows this list).
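Both items could be addressed together, for example by unescaping the prompt into a buffer that grows on demand. The sketch below is only one possible approach; `unescape_prompt` is a hypothetical helper, not an existing function in `server.c`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: copy `in` into a dynamically grown buffer while
 * resolving common JSON-style escapes (\" \\ \n \t). Caller frees the result. */
static char *unescape_prompt(const char *in) {
    size_t capacity = 64, length = 0;
    char *out = malloc(capacity);
    if (!out) return NULL;

    while (*in) {
        /* Grow the buffer instead of relying on a fixed maximum size. */
        if (length + 1 >= capacity) {
            capacity *= 2;
            char *grown = realloc(out, capacity);
            if (!grown) { free(out); return NULL; }
            out = grown;
        }

        if (in[0] == '\\' && in[1] != '\0') {
            switch (in[1]) {
                case 'n':  out[length++] = '\n'; break;
                case 't':  out[length++] = '\t'; break;
                case '"':  out[length++] = '"';  break;
                case '\\': out[length++] = '\\'; break;
                default:   out[length++] = in[1]; break; /* pass unknown escapes through */
            }
            in += 2;
        } else {
            out[length++] = *in++;
        }
    }

    out[length] = '\0';
    return out;
}

int main(void) {
    /* "Hello \"world\"\n" as it would arrive in a JSON-escaped prompt. */
    char *prompt = unescape_prompt("Hello \\\"world\\\"\\n");
    if (prompt) {
        printf("%s", prompt);
        free(prompt);
    }
    return 0;
}
```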
## Acknowledgements

- Inspired by the simple-chat example in llama.cpp
## Contact

Created by @lurkydismal - feel free to contact me!
## License

This project is open source and available under the GNU Affero General Public License v3.0.

