

Holger Imbery edited this page Feb 19, 2026 · 2 revisions

Multi-Agent Testing

The application is designed from the ground up to support testing multiple Copilot Studio agents simultaneously.


Concepts

Agent Registry

Each agent is an independent configuration record containing:

  • Direct Line credentials (secret, bot ID, transport preferences)
  • Per-agent Judge settings (endpoint, API key, model, temperature, pass threshold), with fallback to global defaults
  • Per-agent question generation settings, with fallback to global defaults
  • Environment tag (dev, staging, production)
  • Timeout and retry policies
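The fields above can be sketched as a single configuration record. This is an illustrative Python sketch, not the application's actual data model; all field names and defaults are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentConfig:
    """Illustrative shape of one agent registry record (names are assumptions)."""
    name: str
    environment: str                       # "dev" | "staging" | "production"
    direct_line_secret: str
    bot_id: str
    # Optional per-agent overrides; None means "fall back to the global default".
    judge_endpoint: Optional[str] = None
    judge_model: Optional[str] = None
    pass_threshold: Optional[float] = None
    # Timeout and retry policies (placeholder defaults).
    timeout_seconds: int = 60
    max_retries: int = 3

prod = AgentConfig(name="Production Bot", environment="production",
                   direct_line_secret="<secret>", bot_id="prod-bot")
```

Leaving an override at `None` is what lets the resolution step described later fall back to the global default for that setting.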

Suite-Agent Mapping

A test suite can be associated with one or more agents via a many-to-many relationship. Running a suite against multiple agents launches a separate execution per agent, tracked individually.
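A minimal sketch of that many-to-many relationship, assuming an in-memory model (the real application stores this in a join table):

```python
# Hypothetical in-memory suite-to-agents mapping.
suite_agents = {
    "Customer FAQ Regression": ["Production Bot", "Staging Bot", "Dev Bot"],
}

def executions_for(suite: str) -> list[tuple[str, str]]:
    """One independently tracked (suite, agent) execution per associated agent."""
    return [(suite, agent) for agent in suite_agents.get(suite, [])]
```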

Execution Coordinator

MultiAgentExecutionCoordinator orchestrates parallel execution:

  • Fetches the suite-agent mappings
  • Spawns a TestExecutionService instance per agent concurrently
  • Enforces per-agent rate limits and concurrency controls
  • Isolates errors — one agent failing does not stop the others
  • Writes a Run record per agent with all results and AgentId attribution
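The coordinator's behavior can be approximated with a thread pool. This is a sketch of the orchestration pattern, not the application's actual code; the function names and the "Broken Bot" failure are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_suite_for_agent(agent: str) -> dict:
    """Stand-in for one per-agent TestExecutionService run (illustrative)."""
    if agent == "Broken Bot":
        raise RuntimeError("Direct Line handshake failed")
    return {"agent": agent, "status": "completed"}

def coordinate(agents: list[str], max_parallel: int = 4) -> list[dict]:
    """Spawn one run per agent concurrently; write one record per agent."""
    runs = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(run_suite_for_agent, a): a for a in agents}
        for fut in as_completed(futures):
            agent = futures[fut]
            try:
                runs.append(fut.result())
            except Exception as exc:
                # Error isolation: record the failure, keep the other agents running.
                runs.append({"agent": agent, "status": "failed", "error": str(exc)})
    return runs
```

The `max_parallel` cap stands in for the concurrency controls; catching the exception per future is what isolates one agent's failure from the rest.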

Setting Up Multiple Agents

Via the Web UI

  1. Navigate to Agents in the sidebar
  2. Click New Agent
  3. Fill in the agent's name, environment tag, Direct Line secret, bot ID, and Judge settings
  4. Click Save
  5. Repeat for each additional agent (dev, staging, production, regional variants, etc.)

Associate Agents with a Test Suite

  1. Navigate to Test Suites
  2. Click Edit on a suite
  3. Go to the Agents tab
  4. Check all agents you want the suite to run against
  5. Save

Running Against Multiple Agents

  1. Go to Test Suites
  2. Click Run on your suite
  3. The coordinator runs the suite against all associated agents in parallel
  4. The Dashboard shows one run entry per agent

Comparing Agent Results

The Runs page shows:

| Column | Description |
| --- | --- |
| Suite | Test suite name |
| Agent | Which agent was tested |
| Environment | dev / staging / production |
| Pass Rate | Percentage of passed test cases |
| Avg Latency | Mean response time |
| Status | running / completed / failed |

Use this to:

  • A/B-test two agent configurations side by side
  • Validate across environments, e.g., confirm staging matches production
  • Compare versions, e.g., validate a new agent version before promotion
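For example, a side-by-side pass-rate comparison between two runs of the same suite might be computed like this (the per-case verdicts below are made-up sample data):

```python
def pass_rate(verdicts: list[bool]) -> float:
    """Percentage of passed test cases, as shown in the Pass Rate column."""
    return 100.0 * sum(verdicts) / len(verdicts) if verdicts else 0.0

# Hypothetical per-case pass/fail verdicts from two runs of one suite.
staging_verdicts = [True, True, False, True]
production_verdicts = [True, True, True, True]

delta = pass_rate(production_verdicts) - pass_rate(staging_verdicts)
```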

Per-Agent Configuration Resolution

When a test run is executed, AgentConfigurationService resolves settings in this order:

  1. Agent-specific setting (stored on the Agent entity)
  2. Global default (from the global configuration / Settings page)

This means you can configure most agents to use global defaults and only override specific settings (e.g., a different judge model for one agent).
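The resolution order can be sketched as a simple two-level fallback lookup. The setting names and default values below are placeholders, not the application's real defaults:

```python
# Placeholder global defaults (the real values live on the Settings page).
GLOBAL_DEFAULTS = {"judge_model": "default-judge-model", "pass_threshold": 0.7}

def resolve(setting: str, agent_overrides: dict):
    """Agent-specific setting wins; otherwise fall back to the global default."""
    value = agent_overrides.get(setting)
    return value if value is not None else GLOBAL_DEFAULTS.get(setting)
```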


Use Cases

Cross-Environment Testing

Associate a suite with your dev, staging, and production agents. Run once to verify the same behavior across all three.

Suite: "Customer FAQ Regression"
  → Agent: Production Bot    (production)
  → Agent: Staging Bot       (staging)
  → Agent: Dev Bot           (dev)

A/B Testing

Create two agent configurations with different prompts or knowledge bases and run the same suite to compare quality scores.

Regional Deployment

Tag agents by region (e.g., West Europe, East US) and run the same suite to detect regional inconsistencies.

Release Validation

Before promoting a new agent version, run all regression suites against it and compare with the previous version's run history.


Sample Data

The TestDataSeeder can seed sample multi-agent data for demonstrations:

  • Three sample agents: Production, Staging, Development
  • Pre-configured test suites with agent associations

This is useful for exploring the UI without real agent credentials.


Migration from Single-Agent Version

If upgrading from a previous single-agent version of the application:

  1. Database migration is automatic — the schema migrates on first startup; no manual steps required.
  2. Existing test suites can be associated with newly created agents via the Test Suites → Edit → Agents tab.
  3. Global Judge settings are preserved and can be selectively overridden per agent.
  4. API backwards compatibility — agents are optional parameters; existing API calls continue to work and fall back to the global Direct Line configuration.

What Changed

| Area | Change |
| --- | --- |
| Agent entity | New table with per-agent Direct Line, Judge, and question generation settings |
| TestSuiteAgent | New many-to-many join table linking suites to agents |
| Run entity | Now records AgentId so results are attributed per agent |
| MultiAgentExecutionCoordinator | New service orchestrating parallel runs across all associated agents |
| AgentConfigurationService | Resolves effective settings: agent-specific → global default |
| Web UI | Agent management pages, updated Setup Wizard, agent selection in suite/run flows |
