21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Nillion

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
154 changes: 110 additions & 44 deletions README.md
@@ -1,68 +1,134 @@
# nilAI

## Overview
nilAI is a platform designed to run on Confidential VMs with Trusted Execution Environments (TEEs). It provides a unified API for deploying and accessing multiple AI models across environments, together with user management and model lifecycle handling.

## Prerequisites

- Docker
- Docker Compose
- Hugging Face API Token (for accessing certain models)

## Configuration

1. **Environment Setup**
   - Copy the `.env.sample` file to `.env`
   - Replace `HUGGINGFACE_API_TOKEN` with your Hugging Face API token
   - The token is used to check whether you have access to gated models; for Llama models, you usually need to request access on the model's [Hugging Face page](https://huggingface.co/meta-llama/Llama-3.2-1B) (see the sketch below)
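For reference, a minimal sketch of that setup (the token value is a placeholder):

```shell
cp .env.sample .env
# then edit .env and set your own token, e.g.:
# HUGGINGFACE_API_TOKEN=hf_xxxxxxxxxxxxxxxx
```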

## Deployment Options

### 1. Docker Compose Deployment (Recommended)

#### Development Environment
```shell
docker compose -f docker-compose.yml \
  -f docker-compose.dev.yml \
  -f docker/compose/docker-compose.llama-3b-gpu.yml \
  -f docker/compose/docker-compose.llama-8b-gpu.yml \
  -f docker/compose/docker-compose.dolphin-8b-gpu.yml \
  -f docker/compose/docker-compose.deepseek-14b-gpu.yml \
  up --build
```

#### Production Environment
```shell
docker compose -f docker-compose.yml \
  -f docker-compose.prod.yml \
  -f docker/compose/docker-compose.llama-3b-gpu.yml \
  -f docker/compose/docker-compose.llama-8b-gpu.yml \
  -f docker/compose/docker-compose.dolphin-8b-gpu.yml \
  -f docker/compose/docker-compose.deepseek-14b-gpu.yml \
  up -d --build
```

**Note**: Remove the `-f` lines for any models you do not wish to deploy. Some model compose files declare `depends_on` relationships on other model services (see the compose changes further down), so check those before trimming.
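As a minimal sketch, a development stack running only the Llama 8B model (which, per the compose files in this change, depends only on etcd) could be brought up with:

```shell
docker compose -f docker-compose.yml \
  -f docker-compose.dev.yml \
  -f docker/compose/docker-compose.llama-8b-gpu.yml \
  up --build
```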

### 2. Manual Deployment

#### Components

- **API Frontend**: Receives user requests and forwards model requests to the appropriate backend model.
- **Databases**:
  - **SQLite**: The user registry. It tracks which users are allowed on the platform, their API keys, and their usage.
  - **etcd3**: A distributed key-value store used for model lifecycle management. Models register their address information under a time-limited lease and keep it alive; if a model disconnects, its entry expires and the API Frontend stops advertising it.
- **Models**: Zero or more model deployments, each answering the same `/v1/chat/completions` endpoint (see the example request below). The `Model` class defines how models connect to the database and manage their lifecycle.
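For illustration, a request to a locally running API server might look like the sketch below; the bearer-token header and OpenAI-style request body are assumptions here, so check the API's interactive docs for the actual schema:

```shell
# Assumes the API server from the setup steps below is listening on port 8080
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $NILAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.2-3B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```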

#### Setup Steps

1. **Start Infrastructure Services (etcd3, Redis, PostgreSQL)**
```shell
# etcd3 (model registry; see Model Lifecycle Management below)
docker run -d --name etcd-server \
  -p 2379:2379 -p 2380:2380 \
  -e ALLOW_NONE_AUTHENTICATION=yes \
  bitnami/etcd:latest

# Redis
docker run -d --name redis \
  -p 6379:6379 \
  redis:latest

# PostgreSQL (replace <ASECUREPASSWORD> and the database name)
docker run -d --name postgres \
  -e POSTGRES_USER=user \
  -e POSTGRES_PASSWORD=<ASECUREPASSWORD> \
  -e POSTGRES_DB=yourdb \
  -p 5432:5432 \
  postgres:latest
```

2. **Run API Server**
```shell
# Development Environment (auto-reloads on file changes)
uv run fastapi dev nilai-api/src/nilai_api/__main__.py --port 8080

# Production Environment
uv run fastapi run nilai-api/src/nilai_api/__main__.py --port 8080
```

3. **Run Model Instances**
```shell
# Example: Llama 3.2 1B Model (adapt the path for other models)
# Development Environment
uv run fastapi dev nilai-models/src/nilai_models/models/llama_1b_cpu/__init__.py

# Production Environment
uv run fastapi run nilai-models/src/nilai_models/models/llama_1b_cpu/__init__.py
```
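Once all three steps are running, a few optional sanity checks (ports as configured above; `/docs` is FastAPI's default interactive documentation path):

```shell
docker ps                              # etcd-server, redis, and postgres containers are up
curl -s http://localhost:2379/health   # etcd health endpoint
curl -s http://localhost:8080/docs     # API server is serving (FastAPI interactive docs)
```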

## Developer Workflow

### Code Quality and Formatting

Pre-commit hooks run before each commit, automatically formatting your code and running checks so you catch issues locally instead of waiting for CI. Install them with:

```shell
uv run pre-commit install
```
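The hooks can also be run against the whole repository at any time, which is useful before opening a pull request:

```shell
uv run pre-commit run --all-files
```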

## Model Lifecycle Management

- Models register themselves in the etcd3 database
- Registration includes address information with an auto-expiring lifetime
- If a model disconnects, its registration expires and it is automatically removed from the available models (see the sketch below)
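To make the mechanism concrete, here is a hand-rolled sketch of the same pattern using `etcdctl`. The key layout and value format are purely illustrative; the real registration is handled by the `Model` class:

```shell
# Grant a 30-second lease and capture its ID
LEASE_ID=$(etcdctl lease grant 30 | awk '{print $2}')

# Register the model's address under that lease (illustrative key/value schema)
etcdctl put "/models/llama_3b_gpu" '{"host": "llama_3b_gpu", "port": 8000}' --lease="$LEASE_ID"

# Keep the lease alive while the model is healthy; if this stops,
# the key expires and the API Frontend stops advertising the model
etcdctl lease keep-alive "$LEASE_ID"
```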

## Security

- Hugging Face API token controls model access
- SQLite database manages user permissions
- Distributed architecture allows for flexible security configurations

## Troubleshooting

- Ensure your Hugging Face API token is valid and has access to the gated models you deploy
- Check the etcd3 and Docker container logs for connection issues (example commands below)
- Verify that the required ports (e.g. 2379, 6379, 5432, 8080) are not blocked or already in use
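A few generic checks, assuming the container names from the manual-deployment steps above (adjust for a compose deployment):

```shell
docker logs etcd-server                          # etcd container from the manual steps
docker compose logs --tail=100                   # all services in a compose deployment
ss -ltnp | grep -E ':(2379|6379|5432|8080)'      # confirm the expected ports are listening
```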

## Contributing

1. Fork the repository
2. Create a feature branch
3. Install pre-commit hooks
4. Make your changes
5. Submit a pull request

## License

This project is licensed under the MIT License; see the [LICENSE](LICENSE) file added in this change.
6 changes: 4 additions & 2 deletions docker/compose/docker-compose.deepseek-14b-gpu.yml
@@ -20,12 +20,14 @@ services:
depends_on:
etcd:
condition: service_healthy
llama_8b_gpu:
condition: service_healthy
command: >
--model deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
--gpu-memory-utilization 0.4
--gpu-memory-utilization 0.39
--max-model-len 10000
--tensor-parallel-size 1
--uvicorn-log-level WARNING
--uvicorn-log-level warning
environment:
- SVC_HOST=deepseek_14b_gpu
- SVC_PORT=8000
6 changes: 4 additions & 2 deletions docker/compose/docker-compose.dolphin-8b-gpu.yml
@@ -20,14 +20,16 @@ services:
depends_on:
etcd:
condition: service_healthy
llama_3b_gpu:
condition: service_healthy
command: >
--model cognitivecomputations/Dolphin3.0-Llama3.1-8B
--gpu-memory-utilization 0.5
--gpu-memory-utilization 0.21
--max-model-len 10000
--tensor-parallel-size 1
--enable-auto-tool-choice
--tool-call-parser llama3_json
--uvicorn-log-level WARNING
--uvicorn-log-level warning
environment:
- SVC_HOST=dolphin_8b_gpu
- SVC_PORT=8000
8 changes: 5 additions & 3 deletions docker/compose/docker-compose.llama-3b-gpu.yml
@@ -20,14 +20,16 @@ services:
depends_on:
etcd:
condition: service_healthy
deepseek_14b_gpu:
condition: service_healthy
command: >
--model meta-llama/Llama-3.2-3B-Instruct
--gpu-memory-utilization 0.3
--max-model-len 10000
--gpu-memory-utilization 0.085
--max-model-len 4300
--tensor-parallel-size 1
--enable-auto-tool-choice
--tool-call-parser llama3_json
--uvicorn-log-level WARNING
--uvicorn-log-level warning
environment:
- SVC_HOST=llama_3b_gpu
- SVC_PORT=8000
4 changes: 2 additions & 2 deletions docker/compose/docker-compose.llama-8b-gpu.yml
@@ -22,12 +22,12 @@ services:
condition: service_healthy
command: >
--model meta-llama/Llama-3.1-8B-Instruct
--gpu-memory-utilization 0.5
--gpu-memory-utilization 0.21
--max-model-len 10000
--tensor-parallel-size 1
--enable-auto-tool-choice
--tool-call-parser llama3_json
--uvicorn-log-level WARNING
--uvicorn-log-level warning
environment:
- SVC_HOST=llama_8b_gpu
- SVC_PORT=8000
2 changes: 1 addition & 1 deletion nilai-api/gunicorn.conf.py
@@ -4,7 +4,7 @@
bind = ["0.0.0.0:8080", "0.0.0.0:8443"]

# Set the number of workers
workers = 10
workers = 50

# Set the number of threads per worker
threads = 1
5 changes: 3 additions & 2 deletions nilai-api/src/nilai_api/config/mainnet.py
@@ -5,8 +5,9 @@
# there can be 45 + 50 + 30 + 30 + 5 = 160 concurrent requests in the system
MODEL_CONCURRENT_RATE_LIMIT = {
"meta-llama/Llama-3.2-1B-Instruct": 45,
"meta-llama/Llama-3.2-3B-Instruct": 30,
"meta-llama/Llama-3.1-8B-Instruct": 15,
"meta-llama/Llama-3.2-3B-Instruct": 50,
"meta-llama/Llama-3.1-8B-Instruct": 30,
"cognitivecomputations/Dolphin3.0-Llama3.1-8B": 30,
"deepseek-ai/DeepSeek-R1-Distill-Qwen-14B": 5,
}

1 change: 1 addition & 0 deletions nilai-api/src/nilai_api/config/testnet.py
@@ -7,6 +7,7 @@
"meta-llama/Llama-3.2-1B-Instruct": 10,
"meta-llama/Llama-3.2-3B-Instruct": 10,
"meta-llama/Llama-3.1-8B-Instruct": 5,
"cognitivecomputations/Dolphin3.0-Llama3.1-8B": 5,
"deepseek-ai/DeepSeek-R1-Distill-Qwen-14B": 5,
}

Expand Down