450 changes: 450 additions & 0 deletions MODEL-CONFIGS.md

Large diffs are not rendered by default.

45 changes: 45 additions & 0 deletions eval-server/nodejs/.env.example
@@ -0,0 +1,45 @@
# Evaluation Server Configuration
# Copy this file to .env and configure your settings

# Server Configuration
PORT=8080
HOST=127.0.0.1
Comment on lines +5 to +6

🧹 Nitpick

Address dotenv-linter warnings (ordering + trailing newline).

Reorder keys and add final newline for cleanliness. Functionality unaffected.

Apply this diff:

```diff
-PORT=8080
-HOST=127.0.0.1
+HOST=127.0.0.1
+PORT=8080
@@
-LITELLM_ENDPOINT=http://localhost:4000
-LITELLM_API_KEY=your-litellm-api-key-here
+LITELLM_API_KEY=your-litellm-api-key-here
+LITELLM_ENDPOINT=http://localhost:4000
@@
-DEFAULT_PROVIDER=openai
-DEFAULT_MAIN_MODEL=gpt-4
-DEFAULT_MINI_MODEL=gpt-4-mini
-DEFAULT_NANO_MODEL=gpt-3.5-turbo
+DEFAULT_MAIN_MODEL=gpt-4
+DEFAULT_MINI_MODEL=gpt-4-mini
+DEFAULT_NANO_MODEL=gpt-3.5-turbo
+DEFAULT_PROVIDER=openai
@@
-LOG_LEVEL=info
-LOG_DIR=./logs
+LOG_DIR=./logs
+LOG_LEVEL=info
@@
-AUTH_SECRET_KEY=
+AUTH_SECRET_KEY=
+
```

Also applies to: 15-16, 27-29, 33-34, 45-45

🧰 Tools
🪛 dotenv-linter (3.3.0)

[warning] 6-6: [UnorderedKey] The HOST key should go before the PORT key

(UnorderedKey)

🤖 Prompt for AI Agents
In eval-server/nodejs/.env.example around lines 5-6 (and similarly at 15-16,
27-29, 33-34, 45-45), the .env entries are out of the linter's expected ordering
and the file lacks a trailing newline; reorder the environment keys into the
expected (alphabetical) order within each block/section and ensure the file ends
with a single final newline character so dotenv-linter warnings are resolved.


# LLM Provider API Keys
# Configure one or more providers for evaluation

# OpenAI Configuration
OPENAI_API_KEY=sk-your-openai-api-key-here

# LiteLLM Configuration (if using a LiteLLM server)
LITELLM_ENDPOINT=http://localhost:4000
LITELLM_API_KEY=your-litellm-api-key-here

# Groq Configuration
GROQ_API_KEY=gsk_your-groq-api-key-here

# OpenRouter Configuration
OPENROUTER_API_KEY=sk-or-v1-your-openrouter-api-key-here

# Default LLM Configuration for Evaluations
# These will be used as fallbacks when not specified in evaluation requests
DEFAULT_PROVIDER=openai
DEFAULT_MAIN_MODEL=gpt-4
DEFAULT_MINI_MODEL=gpt-4-mini
DEFAULT_NANO_MODEL=gpt-3.5-turbo

# Logging Configuration
LOG_LEVEL=info
LOG_DIR=./logs

# Client Configuration
CLIENTS_DIR=./clients
EVALS_DIR=./evals

# RPC Configuration
RPC_TIMEOUT=30000

# Security
# Set this to enable authentication for client connections
# Leave empty to disable authentication
AUTH_SECRET_KEY=
91 changes: 89 additions & 2 deletions eval-server/nodejs/CLAUDE.md
@@ -22,6 +22,16 @@ bo-eval-server is a WebSocket-based evaluation server for LLM agents that implem
- `OPENAI_API_KEY` - OpenAI API key for LLM judge functionality
- `PORT` - WebSocket server port (default: 8080)

### LLM Provider Configuration (Optional)
- `GROQ_API_KEY` - Groq API key for Groq provider support
- `OPENROUTER_API_KEY` - OpenRouter API key for OpenRouter provider support
- `LITELLM_ENDPOINT` - LiteLLM server endpoint URL
- `LITELLM_API_KEY` - LiteLLM API key for LiteLLM provider support
- `DEFAULT_PROVIDER` - Default LLM provider (openai, groq, openrouter, litellm)
- `DEFAULT_MAIN_MODEL` - Default main model name
- `DEFAULT_MINI_MODEL` - Default mini model name
- `DEFAULT_NANO_MODEL` - Default nano model name
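
For example, a Groq-first setup could be expressed with these variables alone (a sketch; the model names are illustrative, not server defaults):

```bash
# Hypothetical .env fragment; model names are examples only
GROQ_API_KEY=gsk_your-groq-api-key-here
DEFAULT_PROVIDER=groq
DEFAULT_MAIN_MODEL=llama-3.1-70b-versatile
DEFAULT_MINI_MODEL=llama-3.1-8b-instant
DEFAULT_NANO_MODEL=llama-3.1-8b-instant
```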

## Architecture

### Core Components
@@ -33,10 +43,11 @@ bo-eval-server is a WebSocket-based evaluation server for LLM agents that implem
- Handles bidirectional RPC communication

**RPC Client** (`src/rpc-client.js`)
- Implements JSON-RPC 2.0 protocol for server-to-client calls
- Implements JSON-RPC 2.0 protocol for bidirectional communication
- Manages request/response correlation with unique IDs
- Handles timeouts and error conditions
- Calls `Evaluate(request: String) -> String` method on connected agents
- Supports `configure_llm` method for dynamic LLM provider configuration

**LLM Evaluator** (`src/evaluator.js`)
- Integrates with OpenAI API for LLM-as-a-judge functionality
@@ -78,7 +89,10 @@ logs/ # Log files (created automatically)
### Key Features

- **Bidirectional RPC**: Server can call methods on connected clients
- **LLM-as-a-Judge**: Automated evaluation of agent responses using GPT-4
- **Multi-Provider LLM Support**: Support for OpenAI, Groq, OpenRouter, and LiteLLM providers
- **Dynamic LLM Configuration**: Runtime configuration via `configure_llm` JSON-RPC method
- **Per-Client Configuration**: Each connected client can have different LLM settings
- **LLM-as-a-Judge**: Automated evaluation of agent responses using configurable LLM providers
- **Concurrent Evaluations**: Support for multiple agents and parallel evaluations
- **Structured Logging**: All interactions logged as JSON for analysis
- **Interactive CLI**: Built-in CLI for testing and server management
@@ -93,6 +107,79 @@ Agents must implement:
- `Evaluate(task: string) -> string` method
- "ready" message to signal availability for evaluations

### Model Configuration Schema

The server uses a canonical nested model configuration format that allows per-tier provider and API key settings:

#### Model Configuration Structure

```typescript
interface ModelTierConfig {
provider: string; // "openai" | "groq" | "openrouter" | "litellm"
model: string; // Model name (e.g., "gpt-4", "llama-3.1-8b-instant")
api_key: string; // API key for this tier
}

interface ModelConfig {
main_model: ModelTierConfig; // Primary model for complex tasks
mini_model: ModelTierConfig; // Secondary model for simpler tasks
nano_model: ModelTierConfig; // Tertiary model for basic tasks
}
```

#### Example: Evaluation with Model Configuration

```json
{
"jsonrpc": "2.0",
"method": "evaluate",
"params": {
"tool": "chat",
"input": {"message": "Hello"},
"model": {
"main_model": {
"provider": "openai",
"model": "gpt-4",
"api_key": "sk-main-key"
},
"mini_model": {
"provider": "openai",
"model": "gpt-4-mini",
"api_key": "sk-mini-key"
},
"nano_model": {
"provider": "groq",
"model": "llama-3.1-8b-instant",
"api_key": "gsk-nano-key"
}
}
}
}
```
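
Note that each tier carries its own `provider` and `api_key`, so a single evaluation can mix providers; in this example the main and mini tiers use OpenAI while the nano tier runs on Groq.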

### Dynamic LLM Configuration

The server supports runtime LLM configuration via the `configure_llm` JSON-RPC method:

```json
{
"jsonrpc": "2.0",
"method": "configure_llm",
"params": {
"provider": "openai|groq|openrouter|litellm",
"apiKey": "your-api-key",
"endpoint": "endpoint-url-for-litellm",
"models": {
"main": "main-model-name",
"mini": "mini-model-name",
"nano": "nano-model-name"
},
"partial": false
},
"id": "config-request-id"
}
```
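
When `partial` is `true`, the request is intended as a merge-patch of only the supplied fields into the client's existing configuration (see the review note below on current validation caveats). A sketch of such a request, with illustrative values:

```json
{
  "jsonrpc": "2.0",
  "method": "configure_llm",
  "params": {
    "provider": "groq",
    "apiKey": "gsk-your-key",
    "models": {
      "nano": "llama-3.1-8b-instant"
    },
    "partial": true
  },
  "id": "config-request-2"
}
```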
Comment on lines +160 to +181

⚠️ Potential issue

🧩 Analysis chain

Clarify partial-update semantics for `configure_llm`.

Document which fields are merge-patched when `partial: true`, and the server-side validation/error behavior for missing credentials per provider.


🏁 Script executed:

```bash
#!/bin/bash
rg -n -C2 --type=js --type=ts 'handleConfigureLLM|partial\s*[:=]\s*true|validateConfiguration' eval-server/nodejs
```

Length of output: 811


🏁 Script executed:

```bash
#!/bin/bash
sed -n '320,460p' eval-server/nodejs/src/lib/EvalServer.js
```

Length of output: 3888


Fix partial-update semantics and document configure_llm behavior.

  • In handleConfigureLLM (eval-server/nodejs/src/lib/EvalServer.js — starts at line 384) the code validates provider and models.main before merging; this rejects legitimate partial updates that omit those fields. Change to merge params into the existing config when partial: true, then validate the resulting config (ensure merged.provider ∈ supportedProviders and merged.models.main exists).
  • If a partial update changes provider, validate that required credentials for the new provider are present after merge (use CONFIG.providers[provider] as fallback) and return a clear error listing any missing credentials.
  • Update eval-server/nodejs/CLAUDE.md (configure_llm section) to explicitly state which fields are merge-patched (provider, apiKey, endpoint, models.*) and the server-side validation/error rules for missing credentials per provider.
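
A minimal sketch of the suggested merge-then-validate flow (identifiers such as `supportedProviders`, `CONFIG.providers`, and the config shape are assumptions mirroring the notes above, not verified source):

```js
// Sketch only: names mirror the review's assumptions, not verified code.
function mergeLLMConfig(current, params, supportedProviders, CONFIG) {
  const merged = {
    provider: params.provider ?? current.provider,
    apiKey: params.apiKey ?? current.apiKey,
    endpoint: params.endpoint ?? current.endpoint,
    models: { ...current.models, ...(params.models ?? {}) },
  };

  // Validate the merged result, not the raw params.
  if (!supportedProviders.includes(merged.provider)) {
    throw new Error(`Unsupported provider: ${merged.provider}`);
  }
  if (!merged.models.main) {
    throw new Error('models.main is required after merge');
  }

  // Credential check with server-level fallbacks per provider.
  const fallback = CONFIG.providers[merged.provider] ?? {};
  const missing = [];
  if (!(merged.apiKey || fallback.apiKey)) missing.push('apiKey');
  if (merged.provider === 'litellm' && !(merged.endpoint || fallback.endpoint)) {
    missing.push('endpoint');
  }
  if (missing.length > 0) {
    throw new Error(`Missing credentials for "${merged.provider}": ${missing.join(', ')}`);
  }
  return merged;
}
```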
🤖 Prompt for AI Agents
In eval-server/nodejs/src/lib/EvalServer.js around line 384 and
eval-server/nodejs/CLAUDE.md lines 160–181, change handleConfigureLLM so that
when params.partial === true it first merges the incoming params into the
current saved LLM config (provider, apiKey, endpoint, models.*) and then
validates the merged config (ensure merged.provider is in supportedProviders and
merged.models.main exists); if the merged provider differs from the previous
provider, validate that all required credentials for the new provider are
present using CONFIG.providers[provider] as fallbacks and return a clear error
listing any missing credential keys, and ensure non-partial updates still
require provider and models.main; also update CLAUDE.md configure_llm section to
document that provider, apiKey, endpoint, and models.* are merge-patched on
partial updates and to list the server-side validation and per-provider
missing-credential error rules.


### Configuration

All configuration is managed through environment variables and `src/config.js`. Key settings:
18 changes: 17 additions & 1 deletion eval-server/nodejs/README.md
@@ -145,7 +145,23 @@ server.onConnect(async client => {
message: "Your question here"
},
timeout: 30000, // Optional timeout (ms)
- model: {},                     // Optional model config
+ model: {                       // Optional nested model config
main_model: {
provider: "openai",
model: "gpt-4",
api_key: "sk-..."
},
mini_model: {
provider: "openai",
model: "gpt-4-mini",
api_key: "sk-..."
},
nano_model: {
provider: "groq",
model: "llama-3.1-8b-instant",
api_key: "gsk-..."
}
},
metadata: { // Optional metadata
tags: ['api', 'test']
}