Multi Agent Testing
The application is designed from the ground up to support testing multiple Copilot Studio agents simultaneously.
Each agent is an independent configuration record containing:
- Direct Line credentials (secret, bot ID, transport preferences)
- Per-agent Judge settings (endpoint, API key, model, temperature, pass threshold); any value left unset falls back to the global default
- Per-agent question generation settings; any value left unset falls back to the global default
- Environment tag (`dev`, `staging`, `production`)
- Timeout and retry policies
A test suite can be associated with one or more agents via a many-to-many relationship. Running a suite against multiple agents launches a separate execution per agent, tracked individually.
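The agent record and the suite-agent association described above can be sketched as follows. This is a minimal illustration in Python, not the application's actual schema: the class names `JudgeSettings`, `Agent`, `Suite`, and `SuiteAgentLinks`, and all field names, are assumptions chosen to mirror the list above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JudgeSettings:
    # Hypothetical field names mirroring the Judge settings listed above.
    endpoint: str
    api_key: str
    model: str
    temperature: float
    pass_threshold: float


@dataclass
class Agent:
    id: int
    name: str
    environment: str                       # "dev", "staging", or "production"
    direct_line_secret: str
    bot_id: str
    judge: Optional[JudgeSettings] = None  # None means "use the global default"


@dataclass
class Suite:
    id: int
    name: str


@dataclass
class SuiteAgentLinks:
    """Many-to-many join stored as a set of (suite_id, agent_id) pairs."""
    pairs: set = field(default_factory=set)

    def link(self, suite: Suite, agent: Agent) -> None:
        # Adding the same pair twice is harmless because pairs is a set.
        self.pairs.add((suite.id, agent.id))

    def agent_ids_for(self, suite: Suite) -> list:
        return sorted(a for s, a in self.pairs if s == suite.id)

    def suite_ids_for(self, agent: Agent) -> list:
        return sorted(s for s, a in self.pairs if a == agent.id)
```

Keeping the association in its own structure, rather than as a list on either side, is what lets one suite target several agents and one agent serve several suites.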
`MultiAgentExecutionCoordinator` orchestrates parallel execution:
- Fetches the suite-agent mappings
- Spawns a `TestExecutionService` instance per agent concurrently
- Enforces per-agent rate limits and concurrency controls
- Isolates errors — one agent failing does not stop the others
- Writes a `Run` record per agent with all results and `AgentId` attribution
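The coordination pattern above (concurrent per-agent execution, a concurrency cap, and error isolation) can be sketched with Python's `asyncio`. This is an illustrative sketch only, not the code of `MultiAgentExecutionCoordinator`: the function `run_suite_for_agents`, the `Run` dataclass fields, and the `execute` callback are hypothetical stand-ins, and a single semaphore stands in for whatever rate limiting the application actually applies.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class Run:
    agent_id: int
    status: str                  # "completed" or "failed"
    results: list
    error: Optional[str] = None


async def run_suite_for_agents(
    agent_ids: list,
    execute: Callable[[int], Awaitable[list]],
    max_concurrency: int = 2,
) -> list:
    """Call execute(agent_id) for every agent, at most max_concurrency at a time."""
    gate = asyncio.Semaphore(max_concurrency)

    async def run_one(agent_id: int) -> Run:
        async with gate:
            try:
                results = await execute(agent_id)
                return Run(agent_id, "completed", results)
            except Exception as exc:
                # Error isolation: record the failure so the other agents keep running.
                return Run(agent_id, "failed", [], error=str(exc))

    # gather preserves input order, so the returned runs line up with agent_ids.
    return await asyncio.gather(*(run_one(a) for a in agent_ids))
```

Because each failure is caught inside `run_one` and converted into a `Run` with status `failed`, one agent's exception never cancels the executions still in flight for the others.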
- Navigate to Agents in the sidebar
- Click New Agent
- Fill in the agent's name, environment tag, Direct Line secret, bot ID, and Judge settings
- Click Save
- Repeat for each additional agent (dev, staging, production, regional variants, etc.)
- Navigate to Test Suites
- Click Edit on a suite
- Go to the Agents tab
- Check all agents you want the suite to run against
- Save
- Go to Test Suites
- Click Run on your suite
- The coordinator runs the suite against all associated agents in parallel
- The Dashboard shows one run entry per agent
The Runs page shows:
| Column | Description |
|---|---|
| Suite | Test suite name |
| Agent | Which agent was tested |
| Environment | dev / staging / production |
| Pass Rate | Percentage of passed test cases |
| Avg Latency | Mean response time |
| Status | running / completed / failed |
Use this to:
- A/B test two agent configurations side-by-side
- Cross-environment validation — ensure staging matches production
- Version comparison — validate a new agent version before promotion
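The Pass Rate and Avg Latency columns, and a side-by-side comparison of two agents, can be computed roughly as shown below. This is a sketch under assumptions: `CaseResult`, `summarize`, and `pass_rate_delta` are hypothetical names, latency is assumed to be in milliseconds, and an empty run is arbitrarily reported as 0% rather than as an error.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CaseResult:
    passed: bool
    latency_ms: float


def summarize(results: list) -> dict:
    """Return the pass rate (percent) and mean latency for one run."""
    if not results:
        # Assumption: an empty run reports zeros instead of raising.
        return {"pass_rate": 0.0, "avg_latency_ms": 0.0}
    passed = sum(1 for r in results if r.passed)
    return {
        "pass_rate": 100.0 * passed / len(results),
        "avg_latency_ms": mean(r.latency_ms for r in results),
    }


def pass_rate_delta(baseline: list, candidate: list) -> float:
    """Positive when the candidate agent passes more cases than the baseline."""
    return summarize(candidate)["pass_rate"] - summarize(baseline)["pass_rate"]
```

For an A/B test or a version comparison, the baseline would be the results of the current agent and the candidate the results of the new configuration, both produced by the same suite.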
When a test run is executed, `AgentConfigurationService` resolves settings in this order:
- Agent-specific setting (stored on the `Agent` entity)
- Global default (from the global configuration / Settings page)
This means you can configure most agents to use global defaults and only override specific settings (e.g., a different judge model for one agent).
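The two-step resolution order can be illustrated with a small field-by-field fallback. This is a sketch, not the logic of `AgentConfigurationService` itself: `JudgeOverrides`, `JudgeDefaults`, and `resolve_judge_settings` are hypothetical names, and only three of the Judge settings are modeled.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class JudgeOverrides:
    """Per-agent values; None means the agent does not override that setting."""
    model: Optional[str] = None
    temperature: Optional[float] = None
    pass_threshold: Optional[float] = None


@dataclass
class JudgeDefaults:
    """Global defaults; every field is always set."""
    model: str
    temperature: float
    pass_threshold: float


def resolve_judge_settings(agent: JudgeOverrides, global_: JudgeDefaults) -> JudgeDefaults:
    resolved = {}
    for f in fields(JudgeDefaults):
        agent_value = getattr(agent, f.name)
        # Compare against None rather than truthiness so that a legitimate
        # override such as temperature=0.0 is not mistaken for "unset".
        resolved[f.name] = agent_value if agent_value is not None else getattr(global_, f.name)
    return JudgeDefaults(**resolved)
```

An agent that overrides only `model` would therefore inherit `temperature` and `pass_threshold` from the global defaults.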
Associate a suite with your dev, staging, and production agents. Run once to verify the same behavior across all three.
Suite: "Customer FAQ Regression"
→ Agent: Production Bot (production)
→ Agent: Staging Bot (staging)
→ Agent: Dev Bot (dev)
Create two agent configurations with different prompts or knowledge bases and run the same suite to compare quality scores.
Tag agents by region (e.g., West Europe, East US) and run the same suite to detect regional inconsistencies.
Before promoting a new agent version, run all regression suites against it and compare with the previous version's run history.
The `TestDataSeeder` can seed sample multi-agent data for demonstrations:
- Three sample agents: Production, Staging, Development
- Pre-configured test suites with agent associations
This is useful for exploring the UI without real agent credentials.
If upgrading from a previous single-agent version of the application:
- Database migration is automatic — the schema migrates on first startup; no manual steps required.
- Existing test suites can be associated with newly created agents via the Test Suites → Edit → Agents tab.
- Global Judge settings are preserved and can be selectively overridden per agent.
- API backwards compatibility — agents are optional parameters; existing API calls continue to work and fall back to the global Direct Line configuration.
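The backwards-compatibility rule in the last point, where a call that names no agent falls back to the global Direct Line configuration, can be sketched as below. The names `DirectLineConfig` and `pick_direct_line` are hypothetical, and raising an error for an unknown agent ID is an assumption; the source does not say how the real API handles that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirectLineConfig:
    secret: str
    bot_id: str


def pick_direct_line(
    agent_id: Optional[int],
    agents: dict,
    global_config: DirectLineConfig,
) -> DirectLineConfig:
    """Choose the Direct Line configuration for a request."""
    if agent_id is None:
        # Legacy single-agent call: no agent given, so use the global configuration.
        return global_config
    if agent_id not in agents:
        # Assumption: an unknown agent is an error rather than a silent fallback.
        raise KeyError(f"unknown agent id: {agent_id}")
    return agents[agent_id]
```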
| Area | Change |
|---|---|
| `Agent` entity | New table with per-agent Direct Line, Judge, and question generation settings |
| `TestSuiteAgent` | New many-to-many join table linking suites to agents |
| `Run` entity | Now records `AgentId` so results are attributed per agent |
| `MultiAgentExecutionCoordinator` | New service orchestrating parallel runs across all associated agents |
| `AgentConfigurationService` | Resolves effective settings: agent-specific → global default |
| Web UI | Agent management pages, updated Setup Wizard, agent selection in suite/run flows |