Local Dockerised Eval Server #52
Changes from 15 commits
@@ -0,0 +1,45 @@
# Evaluation Server Configuration
# Copy this file to .env and configure your settings

# Server Configuration
PORT=8080
HOST=127.0.0.1

# LLM Provider API Keys
# Configure one or more providers for evaluation

# OpenAI Configuration
OPENAI_API_KEY=sk-your-openai-api-key-here

# LiteLLM Configuration (if using a LiteLLM server)
LITELLM_ENDPOINT=http://localhost:4000
LITELLM_API_KEY=your-litellm-api-key-here

# Groq Configuration
GROQ_API_KEY=gsk_your-groq-api-key-here

# OpenRouter Configuration
OPENROUTER_API_KEY=sk-or-v1-your-openrouter-api-key-here

# Default LLM Configuration for Evaluations
# These will be used as fallbacks when not specified in evaluation requests
DEFAULT_PROVIDER=openai
DEFAULT_MAIN_MODEL=gpt-4
DEFAULT_MINI_MODEL=gpt-4-mini
DEFAULT_NANO_MODEL=gpt-3.5-turbo

# Logging Configuration
LOG_LEVEL=info
LOG_DIR=./logs

# Client Configuration
CLIENTS_DIR=./clients
EVALS_DIR=./evals

# RPC Configuration
RPC_TIMEOUT=30000

# Security
# Set this to enable authentication for client connections
# Leave empty to disable authentication
AUTH_SECRET_KEY=
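The documentation below notes that all configuration is managed through environment variables and `src/config.js`. As a rough illustration of how such a loader could be structured, here is a hypothetical sketch; it assumes the `dotenv` package, and the real field names and layout of `src/config.js` may differ:

```javascript
// Hypothetical config loader sketch; not the project's actual src/config.js.
require('dotenv').config(); // assumes the dotenv package is installed

const config = {
  server: {
    port: parseInt(process.env.PORT || '8080', 10),
    host: process.env.HOST || '127.0.0.1',
  },
  llm: {
    defaultProvider: process.env.DEFAULT_PROVIDER || 'openai',
    models: {
      main: process.env.DEFAULT_MAIN_MODEL || 'gpt-4',
      mini: process.env.DEFAULT_MINI_MODEL || 'gpt-4-mini',
      nano: process.env.DEFAULT_NANO_MODEL || 'gpt-3.5-turbo',
    },
    apiKeys: {
      openai: process.env.OPENAI_API_KEY,
      groq: process.env.GROQ_API_KEY,
      openrouter: process.env.OPENROUTER_API_KEY,
      litellm: process.env.LITELLM_API_KEY,
    },
    litellmEndpoint: process.env.LITELLM_ENDPOINT || 'http://localhost:4000',
  },
  rpc: {
    timeoutMs: parseInt(process.env.RPC_TIMEOUT || '30000', 10),
  },
  logging: {
    level: process.env.LOG_LEVEL || 'info',
    dir: process.env.LOG_DIR || './logs',
  },
  paths: {
    clients: process.env.CLIENTS_DIR || './clients',
    evals: process.env.EVALS_DIR || './evals',
  },
  // An empty AUTH_SECRET_KEY disables authentication, per the comments above.
  authSecretKey: process.env.AUTH_SECRET_KEY || null,
};

module.exports = config;
```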
@@ -22,6 +22,16 @@ bo-eval-server is a WebSocket-based evaluation server for LLM agents that implem
- `OPENAI_API_KEY` - OpenAI API key for LLM judge functionality
- `PORT` - WebSocket server port (default: 8080)

### LLM Provider Configuration (Optional)
- `GROQ_API_KEY` - Groq API key for Groq provider support
- `OPENROUTER_API_KEY` - OpenRouter API key for OpenRouter provider support
- `LITELLM_ENDPOINT` - LiteLLM server endpoint URL
- `LITELLM_API_KEY` - LiteLLM API key for LiteLLM provider support
- `DEFAULT_PROVIDER` - Default LLM provider (openai, groq, openrouter, litellm)
- `DEFAULT_MAIN_MODEL` - Default main model name
- `DEFAULT_MINI_MODEL` - Default mini model name
- `DEFAULT_NANO_MODEL` - Default nano model name

## Architecture

### Core Components
@@ -33,10 +43,11 @@ bo-eval-server is a WebSocket-based evaluation server for LLM agents that implem
- Handles bidirectional RPC communication

**RPC Client** (`src/rpc-client.js`)
- Implements JSON-RPC 2.0 protocol for server-to-client calls
- Implements JSON-RPC 2.0 protocol for bidirectional communication
- Manages request/response correlation with unique IDs
- Handles timeouts and error conditions
- Calls `Evaluate(request: String) -> String` method on connected agents
- Supports `configure_llm` method for dynamic LLM provider configuration
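To make the correlation and timeout behaviour above concrete, here is a hypothetical sketch of such a caller (not the actual `src/rpc-client.js`; it assumes a WebSocket from the `ws` package):

```javascript
// Illustrative JSON-RPC caller with ID correlation and timeouts (sketch only).
const { randomUUID } = require('crypto');

class RpcCaller {
  constructor(ws, timeoutMs = 30000) {
    this.ws = ws;
    this.timeoutMs = timeoutMs;
    this.pending = new Map(); // request id -> { resolve, reject, timer }

    ws.on('message', (data) => {
      const msg = JSON.parse(data);
      if (msg.id === undefined || !this.pending.has(msg.id)) return;
      const { resolve, reject, timer } = this.pending.get(msg.id);
      clearTimeout(timer);
      this.pending.delete(msg.id);
      msg.error ? reject(new Error(msg.error.message)) : resolve(msg.result);
    });
  }

  call(method, params) {
    const id = randomUUID();
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(id);
        reject(new Error(`RPC timeout for ${method}`));
      }, this.timeoutMs);
      this.pending.set(id, { resolve, reject, timer });
      this.ws.send(JSON.stringify({ jsonrpc: '2.0', method, params, id }));
    });
  }
}

// Usage sketch: const answer = await new RpcCaller(ws).call('Evaluate', 'some task');
```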
**LLM Evaluator** (`src/evaluator.js`)
- Integrates with OpenAI API for LLM-as-a-judge functionality
@@ -78,7 +89,10 @@ logs/ # Log files (created automatically)
### Key Features

- **Bidirectional RPC**: Server can call methods on connected clients
- **LLM-as-a-Judge**: Automated evaluation of agent responses using GPT-4
- **Multi-Provider LLM Support**: Support for OpenAI, Groq, OpenRouter, and LiteLLM providers
- **Dynamic LLM Configuration**: Runtime configuration via `configure_llm` JSON-RPC method
- **Per-Client Configuration**: Each connected client can have different LLM settings
- **LLM-as-a-Judge**: Automated evaluation of agent responses using configurable LLM providers
- **Concurrent Evaluations**: Support for multiple agents and parallel evaluations
- **Structured Logging**: All interactions logged as JSON for analysis
- **Interactive CLI**: Built-in CLI for testing and server management
@@ -93,6 +107,79 @@ Agents must implement:
- `Evaluate(task: string) -> string` method
- "ready" message to signal availability for evaluations
### Model Configuration Schema

The server uses a canonical nested model configuration format that allows per-tier provider and API key settings:

#### Model Configuration Structure

```typescript
interface ModelTierConfig {
  provider: string;  // "openai" | "groq" | "openrouter" | "litellm"
  model: string;     // Model name (e.g., "gpt-4", "llama-3.1-8b-instant")
  api_key: string;   // API key for this tier
}

interface ModelConfig {
  main_model: ModelTierConfig;  // Primary model for complex tasks
  mini_model: ModelTierConfig;  // Secondary model for simpler tasks
  nano_model: ModelTierConfig;  // Tertiary model for basic tasks
}
```
#### Example: Evaluation with Model Configuration

```json
{
  "jsonrpc": "2.0",
  "method": "evaluate",
  "params": {
    "tool": "chat",
    "input": {"message": "Hello"},
    "model": {
      "main_model": {
        "provider": "openai",
        "model": "gpt-4",
        "api_key": "sk-main-key"
      },
      "mini_model": {
        "provider": "openai",
        "model": "gpt-4-mini",
        "api_key": "sk-mini-key"
      },
      "nano_model": {
        "provider": "groq",
        "model": "llama-3.1-8b-instant",
        "api_key": "gsk-nano-key"
      }
    }
  }
}
```
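As the `.env` comments above note, the `DEFAULT_*` settings act as fallbacks when a request omits a tier. A hypothetical helper expressing that fallback (names and structure assumed; this is not the server's actual resolution code):

```javascript
// Sketch: resolve one tier of a ModelConfig, falling back to environment defaults.
function resolveTier(modelConfig, tier) {
  // tier is one of 'main_model', 'mini_model', 'nano_model'
  if (modelConfig && modelConfig[tier]) return modelConfig[tier];

  const defaultModels = {
    main_model: process.env.DEFAULT_MAIN_MODEL,
    mini_model: process.env.DEFAULT_MINI_MODEL,
    nano_model: process.env.DEFAULT_NANO_MODEL,
  };
  return {
    provider: process.env.DEFAULT_PROVIDER || 'openai',
    model: defaultModels[tier],
    api_key: process.env.OPENAI_API_KEY, // assumes the default provider is OpenAI
  };
}
```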
### Dynamic LLM Configuration

The server supports runtime LLM configuration via the `configure_llm` JSON-RPC method:

```json
{
  "jsonrpc": "2.0",
  "method": "configure_llm",
  "params": {
    "provider": "openai|groq|openrouter|litellm",
    "apiKey": "your-api-key",
    "endpoint": "endpoint-url-for-litellm",
    "models": {
      "main": "main-model-name",
      "mini": "mini-model-name",
      "nano": "nano-model-name"
    },
    "partial": false
  },
  "id": "config-request-id"
}
```
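One possible reading of the `partial` flag is a shallow merge of the supplied fields over the client's existing configuration; the sketch below shows that interpretation only, since the server's actual merge semantics are not spelled out here (the review comment below raises exactly this question):

```javascript
// Sketch of one possible configure_llm handling; field names follow the request above.
function applyLLMConfig(current, params) {
  if (!params.partial) {
    // Full replacement: the request is treated as a complete configuration.
    return {
      provider: params.provider,
      apiKey: params.apiKey,
      endpoint: params.endpoint,
      models: { ...params.models },
    };
  }
  // Partial update: only the supplied fields override the existing configuration.
  return {
    provider: params.provider ?? current.provider,
    apiKey: params.apiKey ?? current.apiKey,
    endpoint: params.endpoint ?? current.endpoint,
    models: { ...current.models, ...(params.models || {}) },
  };
}
```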
**Comment on lines +160 to +181**

🧩 Analysis chain: Clarify partial update semantics for `configure_llm`; document which fields are merge-patched when `partial: true`.

🏁 Script executed: `rg -n -C2 --type=js --type=ts 'handleConfigureLLM|partial\s*[:=]\s*true|validateConfiguration' eval-server/nodejs` (length of output: 811)

🏁 Script executed: `sed -n '320,460p' eval-server/nodejs/src/lib/EvalServer.js` (length of output: 3888)

Fix partial-update semantics and document `configure_llm` behavior.
### Configuration

All configuration is managed through environment variables and `src/config.js`. Key settings:
🧹 Nitpick: Address dotenv-linter warnings (ordering + trailing newline). Reorder keys and add a final newline for cleanliness; functionality is unaffected. Apply this diff:

Also applies to: 15-16, 27-29, 33-34, 45-45

🧰 Tools: 🪛 dotenv-linter (3.3.0)
[warning] 6-6: [UnorderedKey] The HOST key should go before the PORT key (UnorderedKey)