A local HTTP server that powers the Moly app by providing capabilities for searching, downloading, and running local Large Language Models (LLMs). This server integrates with WasmEdge for model execution and provides an OpenAI-compatible API interface.
## Features

- Search and discover LLM models
- Download and manage model files
- Automatic mirror selection based on region
- Run local LLMs using the WasmEdge runtime
- OpenAI-compatible API interface
## Installation

1. Obtain the source code for this repository:

   ```sh
   git clone https://github.com/moxin-org/moly-server.git
   ```

2. Follow the platform-specific instructions below.
### macOS

Install the required WasmEdge WASM runtime:

```sh
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s -- --version=0.14.1
source $HOME/.wasmedge/env
```

Then use `cargo` to build and run the server:

```sh
cd moly-server
cargo run -p moly-server
```
### Linux

Install the required WasmEdge WASM runtime:

```sh
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s -- --version=0.14.1
source $HOME/.wasmedge/env
```
> [!IMPORTANT]
> If your CPU does not support AVX512, append the `--noavx` option to the above command.
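To decide whether you need the `--noavx` option, you can inspect the CPU flags before installing. This Linux-only snippet is a convenience sketch, not part of moly-server; it checks for `avx512f`, the foundation AVX-512 feature flag:

```shell
# Check /proc/cpuinfo for the avx512f flag (baseline AVX-512 feature).
# Linux-only; the 2>/dev/null keeps it quiet if the file is absent.
if grep -qw avx512f /proc/cpuinfo 2>/dev/null; then
  msg="AVX-512 detected: the default WasmEdge install command should work"
else
  msg="No AVX-512 detected: append --noavx to the install command"
fi
echo "$msg"
```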
To build the Moly server on Linux, you must install the following dependencies: `openssl`, `clang`/`libclang`, and `binfmt`. On a Debian-like Linux distro (e.g., Ubuntu), run the following:

```sh
sudo apt-get update
sudo apt-get install libssl-dev pkg-config llvm clang libclang-dev binfmt-support
```
Then use `cargo` to build and run the Moly server:

```sh
cd moly-server
cargo run -p moly-server
```
### Windows

1. Install the required WasmEdge WASM runtime from the WasmEdge releases page: `WasmEdge-0.14.1-windows.msi`

2. Download and extract the appropriate WASI-NN/GGML plugin for your system:
   - For CUDA 11/12: `WasmEdge-plugin-wasi_nn-ggml-cuda-0.14.1-windows-x86_64.zip`
   - For CPUs with AVX512 support: `WasmEdge-plugin-wasi_nn-ggml-0.14.1-windows-x86_64.zip`
   - Otherwise: `WasmEdge-plugin-wasi_nn-ggml-noavx-0.14.1-windows-x86_64.zip`

3. Copy the plugin DLL `.\lib\wasmedge\wasmedgePluginWasiNN.dll` from that archive to `Program Files\WasmEdge\lib\`

4. Then use `cargo` to build and run the Moly server:

   ```sh
   cd moly-server
   cargo run -p moly-server
   ```
## Usage

To run the server locally:

```sh
cargo run -p moly-server
```

The server will start on the configured port (default: 8765) and log its address.
## Configuration

The server can be configured using the following environment variables:

- `MOLY_SERVER_PORT`: Port number for the HTTP server (default: 8765)
- `MODEL_CARDS_REPO`: Custom repository URL for model cards
- `MOLY_API_SERVER_ADDR`: Custom address for the API server (default: `localhost:0`)
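For example, the port can be overridden for a single run. The value below is illustrative:

```shell
# Illustrative: serve on port 9000 instead of the default 8765.
MOLY_SERVER_PORT=9000 cargo run -p moly-server
```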
## API Endpoints

### Files

- `GET /files` - List all downloaded files
- `DELETE /files/{id}` - Delete a specific file
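Assuming the server is running on the default port, the file endpoints can be exercised with `curl`; this is a sketch, and `<id>` is a placeholder for a file id taken from the listing response:

```shell
BASE_URL="http://localhost:8765"   # default MOLY_SERVER_PORT

# List all downloaded files:
curl -s "$BASE_URL/files"

# Delete one file; replace <id> with an id from the listing above:
curl -s -X DELETE "$BASE_URL/files/<id>"
```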
### Downloads

- `GET /downloads` - List all current downloads
- `POST /downloads` - Start a new download
- `GET /downloads/{id}/progress` - Get download progress
- `POST /downloads/{id}` - Pause a download
- `DELETE /downloads/{id}` - Cancel a download
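The body-less download endpoints can be driven with `curl` as sketched below (`<id>` is a placeholder for a download id from the listing; the request body for `POST /downloads` is not documented here, so it is omitted):

```shell
BASE_URL="http://localhost:8765"   # adjust if MOLY_SERVER_PORT is set

# List all current downloads:
curl -s "$BASE_URL/downloads"

# Poll progress for a specific download:
curl -s "$BASE_URL/downloads/<id>/progress"

# Pause, then cancel, the same download:
curl -s -X POST "$BASE_URL/downloads/<id>"
curl -s -X DELETE "$BASE_URL/downloads/<id>"
```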
### Models

- `POST /models/load` - Load a model
- `POST /models/eject` - Eject the currently loaded model
- `GET /models/featured` - Get featured models
- `GET /models/search` - Search for models
- `POST /models/v1/chat/completions` - Chat completions endpoint (OpenAI-compatible)
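Because the chat endpoint is OpenAI-compatible, requests follow the standard chat-completions shape. The sketch below assumes a model has already been loaded via `POST /models/load`; the model name is a placeholder, not a guaranteed id:

```shell
# Illustrative request; "llama-3.2-1b-instruct" is a placeholder model id.
curl -s http://localhost:8765/models/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1b-instruct",
    "messages": [
      {"role": "user", "content": "Say hello in one sentence."}
    ]
  }'
```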