The Self-Evolving Agent Ecosystem — Trading agents that evolve through Darwinian selection and adversarial self-play
Updated Apr 9, 2026 - Python
bili-core is an open-source framework for LLM benchmarking using LangChain, LangGraph, Streamlit, and Flask. It enables effective LLM model comparisons, Retrieval-Augmented Generation (RAG), and customizable decision workflows. Part of MSU Denver’s Sustainability Hub, bili-core promotes data democracy and transparent, reproducible AI research. 🚀
Elenchus MCP Server - Adversarial verification system for code review
A marketplace of Claude Code plugins for adversarial security and architectural code review.
Official CLI of the humanbound platform.
AI safety evaluation framework testing LLM epistemic robustness under adversarial self-history manipulation
Agent-driven adversarial paper audit framework
Benchmark LLM jailbreak resilience across providers with standardized tests, adversarial mode, rich analytics, and a clean Web UI.
Context engineering toolkit for LLMs — pack, cache, debug, red-team, and orchestrate context windows. Council of Experts, adversarial testing, immune system, context compiler, drift detection, multi-agent entanglement. TypeScript + Python.
Mechanism-grounded taxonomy of 40 LLM jailbreak patterns across 10 categories. Full evaluation harness for 4 frontier models. AI safety research with responsible disclosure.
Systematic exploration of LLM alignment boundaries through logical stress testing
Adversarial eval harness for any LLM agent pipeline — Claude, OpenAI, or your own. CLI + REST API + MCP server for Cursor/Antigravity.
9-stage enterprise development pipeline for Claude Code. TDD, adversarial testing, mechanical verification. Any stack.
Adversarial testing of LLMs on constraint satisfaction deadlocks
Adversarial MCP server benchmark suite for testing tool-calling security, drift detection, and proxy defenses
Multi-agent adversarial API testing CLI. 3 agents, 3 oracle layers, every bug gets a curl command.
Red team toolkit for stress-testing MCP security scanners — find detection gaps before attackers do
Identified critical AI governance gaps: no adversarial testing, undocumented third-party models, and missing incident response. Delivered roadmap to secure high-risk KYC and transaction monitoring systems against evolving threats.
URF Application Stress Test: adversarial and scalability tests for Unified Rigidity Framework applications, validating limits under load, noise, and edge cases.