Context
The current benchmark generator (benchmarks/generate_benchmark.py) creates 5 agent archetypes: reliable, volatile, new, dormant, and fraudulent. Real-world adversarial agents are more sophisticated than the "fraudulent" archetype.
Task
Add 3 new adversarial archetypes to generate_benchmark.py:
- Slow Drift — Agent gradually shifts behavior over 30+ days to avoid anomaly detection. Small incremental changes in price, timing, counterparty concentration.
- Burst Manipulation — Agent behaves normally for weeks, then executes a rapid burst of anomalous transactions in a short window (<1 hour).
- Sybil Coordinator — Multiple agents that appear independent but coordinate to manipulate trust scores (e.g., trading with each other to inflate metrics).
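The three archetypes above could be sketched roughly as follows. This is a minimal illustration, not the generator's actual code: the `Tx` record, the function names, the `cp_N` counterparty ids, and every numeric parameter (base price 100, 1% daily drift, 50-transaction burst, 80% in-ring trading) are assumptions made for the example, and the real schema in generate_benchmark.py may differ.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical record type; the real schema in generate_benchmark.py may differ.
@dataclass
class Tx:
    agent_id: str
    counterparty_id: str
    timestamp: datetime
    price: float


START = datetime(2024, 1, 1)  # arbitrary start date for this sketch


def slow_drift(agent_id, rng, days=45, per_day=5, drift_per_day=0.01):
    """Mean price creeps up a little each day while the counterparty pool narrows."""
    txs = []
    for day in range(days):
        mean_price = 100.0 * (1 + drift_per_day * day)
        # Counterparty pool shrinks linearly from 20 down to 2 (rising concentration).
        pool = max(2, 20 - (18 * day) // max(1, days - 1))
        for _ in range(per_day):
            ts = START + timedelta(days=day, seconds=rng.randrange(86400))
            cp = f"cp_{rng.randrange(pool)}"
            txs.append(Tx(agent_id, cp, ts, rng.gauss(mean_price, 2.0)))
    return txs


def burst_manipulation(agent_id, rng, normal_days=21, per_day=5, burst_size=50):
    """Ordinary activity for `normal_days`, then a burst confined to under one hour."""
    txs = []
    for day in range(normal_days):
        for _ in range(per_day):
            ts = START + timedelta(days=day, seconds=rng.randrange(86400))
            txs.append(Tx(agent_id, f"cp_{rng.randrange(20)}", ts, rng.gauss(100.0, 2.0)))
    burst_start = START + timedelta(days=normal_days, hours=rng.randrange(23))
    for _ in range(burst_size):
        # Offsets of 0..3599 seconds keep the whole burst inside a 1-hour window.
        ts = burst_start + timedelta(seconds=rng.randrange(3600))
        txs.append(Tx(agent_id, "cp_0", ts, rng.gauss(160.0, 10.0)))
    return txs


def sybil_ring(ring_id, rng, size=4, days=30, per_day=5, internal_share=0.8):
    """Agents with distinct ids that trade mostly with each other at normal prices."""
    members = [f"{ring_id}_{i}" for i in range(size)]
    txs = []
    for agent in members:
        peers = [m for m in members if m != agent]
        for day in range(days):
            for _ in range(per_day):
                ts = START + timedelta(days=day, seconds=rng.randrange(86400))
                if rng.random() < internal_share:
                    cp = rng.choice(peers)  # in-ring trade that inflates metrics
                else:
                    cp = f"cp_{rng.randrange(20)}"  # occasional outside trade as cover
                txs.append(Tx(agent, cp, ts, rng.gauss(100.0, 2.0)))
    return txs


rng = random.Random(0)  # fixed seed so the benchmark is reproducible
drift_txs = slow_drift("drift_0", rng)
burst_txs = burst_manipulation("burst_0", rng)
ring_txs = sybil_ring("ring_0", rng)
```

Note that each Sybil transaction looks normal in isolation; the anomaly only shows up in the counterparty graph, so a per-agent detector may need a ring-level feature to catch it.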
Acceptance Criteria
- generate_benchmark.py generates the three new archetypes.
- evaluate.py correctly classifies these as anomalous (run and report accuracy).
Skills needed
Python, statistical modeling, basic understanding of anomaly detection.