intertwine/security-verifiers

Security Verifiers

A composable suite of security and alignment RL environments with executable, verifiable rewards. Built for Prime Intellect's Verifiers framework.

Vision

Security Verifiers demonstrates how executable rewards can advance both security and alignment research. Rather than relying on an LLM-as-judge, our environments run real tools (OPA, Semgrep, test suites) to verify agent behavior, producing rewards that are:

  • Executable: Rewards come from running actual security tools
  • Calibrated: Agents are rewarded for well-calibrated confidence
  • Cost-aware: Asymmetric penalties reflect real operational costs (missing malware costs far more than raising a false alarm)
  • Composable: Shared schemas and tools enable transfer across tasks
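
The calibration and cost-aware properties above can be illustrated with a minimal sketch. This is not the repository's reward code: the function names, the `"malicious"`/`"benign"` labels, the Brier-style calibration term, and the 5:1 cost ratio are all assumptions chosen for illustration.

```python
# Illustrative sketch only; names, labels, and weights are hypothetical.

def brier_calibration_reward(confidence: float, correct: bool) -> float:
    """Return 1 minus the squared error between confidence and the outcome."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    target = 1.0 if correct else 0.0
    return 1.0 - (confidence - target) ** 2


def asymmetric_cost_penalty(
    predicted: str,
    actual: str,
    fn_cost: float = 5.0,  # assumed cost of missing malicious activity
    fp_cost: float = 1.0,  # assumed cost of a false alarm
) -> float:
    """Penalize a missed detection more heavily than a false alarm."""
    if actual == "malicious" and predicted == "benign":
        return -fn_cost
    if actual == "benign" and predicted == "malicious":
        return -fp_cost
    return 0.0


def combined_reward(predicted: str, actual: str, confidence: float) -> float:
    """Sum an accuracy term, a calibration term, and the cost penalty."""
    correct = predicted == actual
    accuracy = 1.0 if correct else 0.0
    return (
        accuracy
        + brier_calibration_reward(confidence, correct)
        + asymmetric_cost_penalty(predicted, actual)
    )
```

Under these assumed weights, a confident miss on malicious traffic scores far lower than a confident false alarm, which is the asymmetry the cost-aware bullet describes.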

Environments

| Environment | Type | Task | Status |
| --- | --- | --- | --- |
| E1: network-logs | SingleTurn | Anomaly detection with calibration & abstention | Production |
| E2: config-verification | ToolEnv | Security auditing with OPA/KubeLinter/Semgrep | Production |
| E3: code-vulnerability | ToolEnv | Vulnerability detection and repair | Alpha (Q1) |
| E4: phishing-detection | SingleTurn | Phishing classification with evidence | Alpha (Q1) |
| E5: redteam-attack | MultiTurn | Red team attack scenarios | Alpha (Q1) |
| E6: redteam-defense | MultiTurn | Red team defense scenarios | Alpha (Q1) |

Quick Start

# Setup
make setup && source .venv/bin/activate

# Configure API keys
cp .env.example .env  # Edit with your OPENAI_API_KEY
set -a && source .env && set +a

# Run your first evaluation
make eval-e1 MODELS="gpt-5-mini" N=10

See docs/getting-started.md for detailed setup instructions.

Evaluation

# E1: Network log anomaly detection
make eval-e1 MODELS="gpt-5-mini,gpt-4.1-mini" N=100

# E2: Configuration verification (multi-turn with tools)
make eval-e2 MODELS="gpt-5-mini" N=10 INCLUDE_TOOLS=true

# Generate metrics reports
make report-network-logs
make report-config-verification

Results are written to outputs/evals/<env>--<model>/<run_id>/.
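
To enumerate finished runs, the directory layout above can be walked with a short script. This is a sketch that assumes only the `<env>--<model>/<run_id>/` pattern stated here; `list_runs` is a hypothetical helper, and the files inside each run directory are not assumed.

```python
from pathlib import Path


def list_runs(root: Path) -> list[tuple[str, str, str]]:
    """Return (env, model, run_id) for each run directory under root."""
    runs = []
    for run_dir in sorted(root.glob("*--*/*")):
        if not run_dir.is_dir():
            continue  # skip stray files next to run directories
        # Split on the first "--" so single hyphens in names are preserved.
        env, _, model = run_dir.parent.name.partition("--")
        runs.append((env, model, run_dir.name))
    return runs
```

For example, `list_runs(Path("outputs/evals"))` returns one tuple per run, which can then be filtered by environment or model.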

Hub Deployment

Deploy environments to Prime Intellect's Environments Hub:

make hub-deploy E=network-logs
vf-eval your-org/sv-env-network-logs --model gpt-5-mini --num-examples 10

See docs/hub-deployment.md for the complete deployment guide.

Project Structure

security-verifiers/
├── environments/       # E1-E6 environment packages
├── sv_shared/          # Shared parsers, rewards, utilities
├── scripts/            # Evaluation and data building scripts
├── docs/               # Documentation
├── plans/              # Roadmap and productionization plans
└── outputs/            # Evaluation results

Documentation

| Document | Description |
| --- | --- |
| Getting Started | Installation and first evaluation |
| Development Guide | Contributing, testing, CI |
| Hub Deployment | Deploy to Prime Intellect Hub |
| Prime Lab Integration | Hosted RL training and evaluation |
| Datasets Guide | Dataset access and management |
| Logging Guide | Weave tracing configuration |
| CLAUDE.md | Agent/LLM instructions |

Baselines

Run quick baselines on the public mini sets:

make baseline-e1 MODEL="gpt-5-mini"
make baseline-e2 MODEL="gpt-5-mini" INCLUDE_TOOLS=true

Scoreboards are written to bench/scoreboards/.

Prime Lab Integration

Prime Lab integration infrastructure is complete in v0.3.0 (WP2.5/WP2.5a), with a hosted-first path and a validated fallback path:

# Check platform compatibility first
make lab-check

# Hosted training/eval (when lab compatibility + access are available)
make lab-run-e1 MODEL=Qwen/Qwen3-4B-Instruct-2507 TEAM=your-team
make lab-run-e2 MODEL=Qwen/Qwen3-4B-Instruct-2507 TEAM=your-team
make lab-eval-e1 MODEL=Qwen/Qwen3-4B-Instruct-2507 TEAM=your-team
make lab-eval-e2 MODEL=Qwen/Qwen3-4B-Instruct-2507 TEAM=your-team

# Fallback: hosted-style eval via prime env (keeps report/metadata parity)
make env-eval-e1 MODEL=Qwen/Qwen3-4B-Instruct-2507 TEAM=your-team N=100
make env-eval-e2 MODEL=Qwen/Qwen3-4B-Instruct-2507 TEAM=your-team N=50

Replace your-team with your Prime Intellect team slug (from prime auth status).

See docs/PRIME-LAB-INTEGRATION.md for the full integration guide.

Roadmap

See plans/ROADMAP-Q1-2026.md for current development priorities:

| Work Package | Description | Status |
| --- | --- | --- |
| WP0 | Benchmark integrity hardening | Complete |
| WP1 | Metrics contracts and report generator | Complete |
| WP2 | Baselines and public mini sets | Complete |
| WP2.5 | Prime Lab integration (v0.3.0) | Complete |
| WP2.5a | Hosted-eval fallback parity | Complete |
| WP3a/WP3b | Hosted RL proof on E1 and E2 | Next |
| WP3c | Reward-source comparator (executable vs LLM-judge) | Next |
| WP4 | Multi-reward RL stability research | Planned |
| WP5 | SV-Bench v0.1 release + technical report | Planned |

Contributing

See CONTRIBUTING.md for contribution guidelines.

License

MIT License - see LICENSE for details.

About

Reinforcement Learning Verifiers for Cybersecurity
