real-time ai safety governance • anthropic rsp implementation • enterprise controls
quick start • architecture • deployment • live demo
the claude governance control plane (cgcp) turns anthropic's responsible scaling policy from a written policy into production-ready controls. it monitors every claude interaction in real time, enforces tier-based policies, and generates compliance evidence automatically.
- real-time risk detection - identifies cbrn, self-harm, jailbreak, and exploitation risks in <100ms
- tier-based enforcement - different cbrn risk thresholds for general (0.15), enterprise (0.18), and research sandbox (0.25) users
- asl-3 monitoring - triggers at biological (20%), cyber (50%), and deception (50%) capability thresholds
- human-in-the-loop - escalation workflow with 24-hour sla for high-risk events
- compliance automation - generates iso 42001, nist ai rmf, and eu ai act evidence in under 5 minutes
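a minimal sketch of the asl-3 check described in the list above. the three percentages come from that list; the function name, the input shape (a capability score per area, as a fraction), and the choice of `>=` as the trigger comparison are assumptions made for illustration, not the project's actual implementation.

```python
from typing import Dict, List

# Capability thresholds quoted in the feature list, expressed as fractions.
ASL3_THRESHOLDS: Dict[str, float] = {
    "biological": 0.20,
    "cyber": 0.50,
    "deception": 0.50,
}


def asl3_triggers(capability_scores: Dict[str, float]) -> List[str]:
    """Return the capability areas whose score meets or exceeds its threshold.

    Assumption: a trigger fires at >= threshold; the real comparison may differ.
    Areas missing from the input are treated as 0.0.
    """
    return sorted(
        area
        for area, threshold in ASL3_THRESHOLDS.items()
        if capability_scores.get(area, 0.0) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical evaluation results for one model version.
    print(asl3_triggers({"biological": 0.22, "cyber": 0.31, "deception": 0.50}))
```

with these hypothetical scores, the biological and deception areas would be reported and cyber would not.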
| metric | before cgcp | with cgcp | improvement |
|---|---|---|---|
| incident response | 24+ hours | <1 hour | 96% faster |
| compliance reporting | 2-4 weeks | <5 minutes | 99.9% faster |
| policy consistency | manual | automated | 100% coverage |
| risk monitoring | quarterly | real-time | continuous |
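the improvement column can be read as a percentage reduction in elapsed time. a short worked check, assuming the boundary values from the table (24 hours to 1 hour, and 2 weeks to 5 minutes); because the table gives bounds such as "24+" and "<1", the real reductions would be at least this large. `percent_reduction` is a helper name invented for this sketch.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which elapsed time shrinks when going from before to after."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (1.0 - after / before) * 100.0


# Both durations are expressed in minutes.
incident_response = percent_reduction(before=24 * 60, after=60)
compliance_reporting = percent_reduction(before=14 * 24 * 60, after=5)

if __name__ == "__main__":
    print(f"incident response: {incident_response:.1f}% less time")
    print(f"compliance reporting: {compliance_reporting:.2f}% less time")
```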
# required
python 3.8+
4gb ram
git
# optional (for production)
docker
kubernetes
# clone and deploy
git clone https://github.com/dipampaul17/cgcp.git
cd cgcp
python deploy.py local
# run the complete demo with synthetic data
python demo/run_complete_demo.py
this will:
- start all services automatically
- generate realistic enterprise scenarios
- demonstrate risk detection and policy enforcement
- show compliance reporting
- keep services running for exploration
- dashboard → http://localhost:8501
- api → http://localhost:8000
- api docs → http://localhost:8000/docs
graph TB
subgraph "claude interactions"
A1[web interface]
A2[api calls]
A3[aws bedrock]
A4[applications]
end
subgraph "ingestion layer"
B[ingestion api<br/>fastapi • port 8000]
end
subgraph "processing layer"
C1[risk detection engine<br/>• cbrn detector<br/>• self-harm detector<br/>• jailbreak detector<br/>• exploitation detector]
C2[policy engine<br/>• tier logic<br/>• asl triggers<br/>• enforcement<br/>• escalation]
end
subgraph "data layer"
D[duckdb<br/>fast • embedded • sql]
end
subgraph "output layer"
E1[dashboard<br/>streamlit • 8501]
E2[compliance api<br/>iso/nist/eu exports]
end
A1 & A2 & A3 & A4 --> B
B --> C1 & C2
C1 & C2 --> D
D --> E1 & E2
flowchart LR
subgraph "event input"
E[claude event<br/>prompt + completion]
end
subgraph "risk analysis"
R1[cbrn tagger<br/>biological/chemical/nuclear]
R2[self-harm tagger<br/>mental health risks]
R3[jailbreak tagger<br/>safety bypasses]
R4[exploitation tagger<br/>malicious use]
end
subgraph "scoring"
S[risk scores<br/>0.0 - 1.0 confidence]
end
subgraph "policy decision"
P1{tier<br/>thresholds}
P2{asl-3<br/>triggers}
end
subgraph "actions"
A1[allow]
A2[block]
A3[redact]
A4[escalate]
end
E --> R1 & R2 & R3 & R4
R1 & R2 & R3 & R4 --> S
S --> P1 & P2
P1 --> A1 & A2 & A3
P2 --> A4
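the tagger-to-score stage of the flow above can be sketched as follows. this is a toy illustration only: the keyword-matching approach, the phrase lists, and the names `keyword_tagger`, `TAGGERS`, and `score_event` are all assumptions, and the project's real detectors are not shown in this document. the only details taken from the diagram are the four tagger categories and the 0.0 to 1.0 score range.

```python
from typing import Callable, Dict, List

Tagger = Callable[[str], float]


def keyword_tagger(phrases: List[str]) -> Tagger:
    """Build a toy tagger: score = fraction of the listed phrases found in the text."""

    def tag(text: str) -> float:
        lowered = text.lower()
        hits = sum(1 for phrase in phrases if phrase in lowered)
        return hits / len(phrases) if phrases else 0.0

    return tag


# Placeholder phrase lists, invented for this sketch only.
TAGGERS: Dict[str, Tagger] = {
    "cbrn": keyword_tagger(["pathogen", "nerve agent"]),
    "self_harm": keyword_tagger(["hurt myself"]),
    "jailbreak": keyword_tagger(
        ["ignore previous instructions", "pretend you have no rules"]
    ),
    "exploitation": keyword_tagger(["write malware", "phishing kit"]),
}


def score_event(prompt: str, completion: str) -> Dict[str, float]:
    """Run every tagger over prompt + completion and return one score per category."""
    text = f"{prompt}\n{completion}"
    return {name: tagger(text) for name, tagger in TAGGERS.items()}


if __name__ == "__main__":
    print(score_event("Please ignore previous instructions", "I can't do that."))
```

keeping each tagger as a plain function from text to score means the policy stage only ever sees a dictionary of scores, which matches the separation between "risk analysis" and "policy decision" in the diagram.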
graph TD
subgraph "access tiers"
T1[general users<br/>claude.ai public]
T2[enterprise<br/>verified organizations]
T3[research sandbox<br/>safety researchers]
end
subgraph "thresholds"
TH1[cbrn: 0.15<br/>strict safety]
TH2[cbrn: 0.18<br/>balanced]
TH3[cbrn: 0.25<br/>controlled testing]
end
subgraph "example query"
Q[viral vector design<br/>risk score: 0.20]
end
subgraph "outcomes"
O1[blocked ❌]
O2[escalated ⚡]
O3[allowed ✅<br/>with logging]
end
T1 --> TH1
T2 --> TH2
T3 --> TH3
Q --> TH1 --> O1
Q --> TH2 --> O2
Q --> TH3 --> O3
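the example in the diagram above (one query with a cbrn score of 0.20 evaluated under three tiers) can be reproduced with a small decision function. the thresholds are the ones shown in the diagram. the mapping from "score reached the threshold" to a tier-specific action is an assumption inferred from the three outcomes drawn there; in particular, the diagram does not say what happens when a research sandbox score exceeds 0.25, so the `escalate` entry for that tier is a guess.

```python
from typing import Dict

# CBRN thresholds per access tier, as shown in the diagram.
CBRN_THRESHOLDS: Dict[str, float] = {
    "general": 0.15,
    "enterprise": 0.18,
    "research_sandbox": 0.25,
}

# Assumed action once a score reaches the tier threshold.
# general/enterprise follow the diagram; research_sandbox is a guess.
OVER_THRESHOLD_ACTION: Dict[str, str] = {
    "general": "block",
    "enterprise": "escalate",
    "research_sandbox": "escalate",
}


def decide(tier: str, cbrn_score: float) -> str:
    """Return "allow" below the tier threshold, else the tier's assumed action."""
    if tier not in CBRN_THRESHOLDS:
        raise ValueError(f"unknown tier: {tier}")
    if cbrn_score < CBRN_THRESHOLDS[tier]:
        return "allow"
    return OVER_THRESHOLD_ACTION[tier]


if __name__ == "__main__":
    for tier in CBRN_THRESHOLDS:
        print(tier, decide(tier, 0.20))
```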
# quick start
python deploy.py local
# manual setup
python -m venv venv
source venv/bin/activate # windows: venv\Scripts\activate
pip install -r requirements.txt
./start.sh
# automated deployment
python deploy.py docker
# or build manually
docker build -t cgcp:latest .
docker run -p 8000:8000 -p 8501:8501 cgcp:latest
python deploy.py docker-compose
# includes:
# - cgcp application
# - prometheus metrics
# - grafana dashboards
# automated deployment
python deploy.py kubernetes
# verify deployment
kubectl get pods -l app=cgcp
# aws ecs
python deploy.py aws
# google cloud run
python deploy.py gcp
# azure container instances
python deploy.py azure
see deployment guide for detailed production setup.
python demo/production_demo.py
this demonstrates:
- enterprise baseline - 500+ normal business queries
- capability evaluation - quarterly asl-3 threshold testing
- incident response - biotech escalation, security research, jailbreaks
- tier enforcement - same query, different responses by access level
- compliance export - automated iso 42001 evidence generation
sequenceDiagram
participant U as user
participant D as demo script
participant A as api
participant DB as database
participant UI as dashboard
U->>D: run demo
D->>A: generate baseline traffic
A->>DB: store 500 events
A-->>UI: update metrics
D->>A: capability evaluation
Note over A: test asl-3 thresholds
A->>DB: log triggers
A-->>UI: show alerts
D->>A: incident scenarios
A->>DB: escalate events
A-->>UI: review queue
D->>A: tier comparison
Note over A: same query, 3 tiers
A-->>U: show differential response
D->>A: compliance export
A->>DB: aggregate evidence
A-->>U: iso 42001 report
# generate realistic test data
python data/synthetic_generator.py
# creates 2000+ events including:
# - pharma research patterns
# - ai safety evaluations
# - financial services queries
# - jailbreak attempts
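for readers who want to see what a synthetic event might look like before running the generator, here is a small stand-alone sketch. it is not the contents of `data/synthetic_generator.py`; the function names and the sample prompts are invented, and the field names are taken from the `/ingest` request example in this document.

```python
import random
import uuid
from datetime import datetime, timezone
from typing import Dict, List

TIERS = ["general", "enterprise", "research_sandbox"]
SURFACES = ["api", "claude_web"]

# Invented, harmless sample prompts for illustration.
SAMPLE_PROMPTS = [
    "summarize this quarterly report",
    "draft a follow-up email to a customer",
    "explain our data retention policy",
]


def make_event(rng: random.Random) -> Dict[str, str]:
    """Build one event using the field names from the /ingest example."""
    return {
        "event_id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "user_id": f"user_{rng.randint(10000, 99999)}",
        "org_id": "acme_pharma",
        "surface": rng.choice(SURFACES),
        "tier": rng.choice(TIERS),
        "prompt": rng.choice(SAMPLE_PROMPTS),
        "completion": "placeholder completion text",
        "model_version": "claude-3-sonnet",
    }


def generate_events(count: int, seed: int = 0) -> List[Dict[str, str]]:
    """Generate `count` events; a fixed seed makes ids and choices repeatable."""
    rng = random.Random(seed)
    return [make_event(rng) for _ in range(count)]


if __name__ == "__main__":
    for event in generate_events(3, seed=42):
        print(event["event_id"], event["tier"], event["surface"])
```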
POST /ingest
{
"events": [{
"event_id": "uuid",
"timestamp": "2024-01-15T10:00:00Z",
"user_id": "user_12345",
"org_id": "acme_pharma",
"surface": "api",
"tier": "enterprise",
"prompt": "user input text",
"completion": "claude response",
"model_version": "claude-3-sonnet"
}]
}
# response
{
"processed": 1,
"actions": {
"allow": 1,
"block": 0,
"redact": 0,
"escalate": 0
},
"asl_triggers": 0
}
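a minimal client sketch for the `/ingest` call above, using only the python standard library. the endpoint path and payload shape come from the example; the base url assumes the local deployment on port 8000, and the helper names are invented. authentication is left out, so a deployment with `ENABLE_AUTH=true` would presumably need an extra header that this document does not describe.

```python
import json
import urllib.request
from typing import Any, Dict, List

DEFAULT_BASE_URL = "http://localhost:8000"  # assumes the local deployment


def build_ingest_request(
    events: List[Dict[str, Any]], base_url: str = DEFAULT_BASE_URL
) -> urllib.request.Request:
    """Wrap events in the {"events": [...]} envelope and build a POST request."""
    body = json.dumps({"events": events}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ingest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_events(
    events: List[Dict[str, Any]], base_url: str = DEFAULT_BASE_URL
) -> Dict[str, Any]:
    """Send the events and return the decoded JSON response."""
    request = build_ingest_request(events, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires the api to be running locally.
    result = post_events(
        [
            {
                "event_id": "uuid",
                "timestamp": "2024-01-15T10:00:00Z",
                "user_id": "user_12345",
                "org_id": "acme_pharma",
                "surface": "api",
                "tier": "enterprise",
                "prompt": "user input text",
                "completion": "claude response",
                "model_version": "claude-3-sonnet",
            }
        ]
    )
    print(result["actions"], result["asl_triggers"])
```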
GET /metrics
{
"total_events": 125000,
"events_by_surface": {"api": 100000, "claude_web": 25000},
"events_by_tier": {"general": 50000, "enterprise": 70000},
"risk_detections": {"cbrn": 150, "self_harm": 89},
"actions_taken": {"allow": 124500, "block": 300},
"asl_triggers": 12
}
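the counters in the `/metrics` response are easier to compare as rates. the sketch below derives them from a response shaped like the example above; the helper names are invented, and the sample counters are partial in the example (the listed actions do not add up to `total_events`), so rates are computed against `total_events` rather than against the sum of the listed actions.

```python
from typing import Any, Dict

# Same shape and values as the example response above.
SAMPLE_METRICS: Dict[str, Any] = {
    "total_events": 125000,
    "events_by_surface": {"api": 100000, "claude_web": 25000},
    "events_by_tier": {"general": 50000, "enterprise": 70000},
    "risk_detections": {"cbrn": 150, "self_harm": 89},
    "actions_taken": {"allow": 124500, "block": 300},
    "asl_triggers": 12,
}


def rate(metrics: Dict[str, Any], group: str, key: str) -> float:
    """Fraction of all events counted under metrics[group][key]."""
    total = metrics["total_events"]
    if not total:
        return 0.0
    return metrics[group].get(key, 0) / total


if __name__ == "__main__":
    print(f"block rate: {rate(SAMPLE_METRICS, 'actions_taken', 'block'):.4%}")
    print(f"cbrn detection rate: {rate(SAMPLE_METRICS, 'risk_detections', 'cbrn'):.4%}")
```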
GET /export/iso-evidence?days=30
{
"report_date": "2024-01-15T10:00:00Z",
"period_days": 30,
"summary": {
"total_events": 500000,
"blocked_events": 1250,
"asl_triggers": 45,
"compliance_rate": "99.75%"
},
"controls": [{
"control_id": "iso_9.2.1",
"control_name": "user access management",
"evidence_count": 125000
}]
}
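the `compliance_rate` in the example above is numerically consistent with "share of events that were not blocked": 1,250 blocked out of 500,000 leaves 99.75%. whether the service really computes it that way is an assumption; the sketch below only shows that reading of the sample numbers, and the function name is invented.

```python
def compliance_rate(total_events: int, blocked_events: int) -> str:
    """Share of events that were not blocked, formatted like the sample report.

    Assumption: inferred from the example figures, not from the service's code.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= blocked_events <= total_events:
        raise ValueError("blocked_events must be between 0 and total_events")
    percent = (1 - blocked_events / total_events) * 100
    return f"{percent:.2f}%"


if __name__ == "__main__":
    print(compliance_rate(total_events=500000, blocked_events=1250))
```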
- real-time processing metrics
- risk category distribution
- tier usage patterns
- response time monitoring
- current thresholds by tier
- escalated events queue
- decision audit trail
- sla tracking
- time series trends
- risk pattern analysis
- organization insights
- model comparison
- automated reporting
- framework mapping
- audit downloads
- control effectiveness
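the sla tracking item above refers to the 24-hour review window for escalated events. a minimal sketch of that check follows; the status names (`met`, `open`, `breached`) and the function signature are assumptions for illustration, and only the 24-hour figure comes from this document.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REVIEW_SLA = timedelta(hours=24)  # review window stated for high-risk events


def sla_status(
    escalated_at: datetime,
    reviewed_at: Optional[datetime] = None,
    now: Optional[datetime] = None,
) -> str:
    """Classify an escalated event against the 24-hour review window.

    Returns "met" or "breached" for reviewed events, and "open" or
    "breached" for events still waiting in the queue.
    """
    deadline = escalated_at + REVIEW_SLA
    if reviewed_at is not None:
        return "met" if reviewed_at <= deadline else "breached"
    current = now if now is not None else datetime.now(timezone.utc)
    return "open" if current <= deadline else "breached"


if __name__ == "__main__":
    escalated = datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc)
    print(sla_status(escalated, reviewed_at=escalated + timedelta(hours=3)))
    print(sla_status(escalated, now=escalated + timedelta(hours=30)))
```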
# api settings
API_HOST=0.0.0.0
API_PORT=8000
API_WORKERS=4
# database
DATABASE_PATH=/data/governance.db
DATABASE_BACKUP_ENABLED=true
# monitoring
METRICS_ENABLED=true
PROMETHEUS_PORT=9090
# security
ENABLE_AUTH=true
JWT_SECRET_KEY=your-secret-key
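one way an application could read the variables above into a typed settings object. this is a sketch, not the project's loader: the `Settings` class, the defaults (copied from the sample values), and the rule that rejects the placeholder secret when auth is enabled are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    api_host: str
    api_port: int
    api_workers: int
    database_path: str
    database_backup_enabled: bool
    metrics_enabled: bool
    prometheus_port: int
    enable_auth: bool
    jwt_secret_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from env, falling back to the sample values shown above."""
    settings = Settings(
        api_host=env.get("API_HOST", "0.0.0.0"),
        api_port=int(env.get("API_PORT", "8000")),
        api_workers=int(env.get("API_WORKERS", "4")),
        database_path=env.get("DATABASE_PATH", "/data/governance.db"),
        database_backup_enabled=_as_bool(env.get("DATABASE_BACKUP_ENABLED", "true")),
        metrics_enabled=_as_bool(env.get("METRICS_ENABLED", "true")),
        prometheus_port=int(env.get("PROMETHEUS_PORT", "9090")),
        enable_auth=_as_bool(env.get("ENABLE_AUTH", "true")),
        jwt_secret_key=env.get("JWT_SECRET_KEY", ""),
    )
    # Assumed safety rule: refuse to start with auth on and no real secret.
    if settings.enable_auth and settings.jwt_secret_key in {"", "your-secret-key"}:
        raise ValueError("set JWT_SECRET_KEY to a real secret when ENABLE_AUTH=true")
    return settings


if __name__ == "__main__":
    print(load_settings({"JWT_SECRET_KEY": "example-only-secret"}))
```

failing fast on the placeholder value is a design choice of this sketch; it keeps `your-secret-key` from reaching a production deployment unnoticed.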
edit policy/policy_map.yaml:
risk_thresholds:
cbrn:
general: 0.15
enterprise: 0.18
research_sandbox: 0.25
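before deploying edited thresholds, a quick sanity check can catch typos. the sketch below works on a plain dictionary with the same structure as the yaml above (what a yaml parser such as pyyaml's `yaml.safe_load` would be expected to return). the validation rules are assumptions: scores are taken to lie in 0.0 to 1.0, and thresholds are expected to loosen from general to enterprise to research sandbox, as they do in the sample values.

```python
from typing import Dict

# Assumed ordering from strictest to most permissive tier.
TIER_ORDER = ["general", "enterprise", "research_sandbox"]

# Same structure and values as the yaml fragment above.
POLICY: Dict[str, Dict[str, Dict[str, float]]] = {
    "risk_thresholds": {
        "cbrn": {"general": 0.15, "enterprise": 0.18, "research_sandbox": 0.25},
    }
}


def validate_thresholds(policy: Dict[str, Dict[str, Dict[str, float]]]) -> None:
    """Raise ValueError if a threshold is out of range or tiers are out of order."""
    for risk, tiers in policy["risk_thresholds"].items():
        for tier, value in tiers.items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{risk}.{tier}={value} is outside 0.0-1.0")
        ordered = [tiers[tier] for tier in TIER_ORDER if tier in tiers]
        if ordered != sorted(ordered):
            raise ValueError(f"{risk}: thresholds should not tighten for less strict tiers")


if __name__ == "__main__":
    validate_thresholds(POLICY)
    print("policy thresholds look consistent")
```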
grafana dashboards in monitoring/dashboards/:
- system overview
- risk detection rates
- policy enforcement
- resource usage
python verify_system.py
# tests:
# ✓ api connectivity
# ✓ risk detection accuracy
# ✓ policy enforcement logic
# ✓ asl trigger thresholds
# ✓ compliance generation
# generate test load
python data/synthetic_generator.py
python demo/ingest_data.py
# verify performance
# - <100ms response time
# - 1000+ events/second
# - accurate risk scoring
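the two numeric targets above (<100ms response time, 1000+ events/second) can be checked with a small timing harness. this sketch times an in-process callable sequentially, so it measures handler cost rather than end-to-end http latency; the function names and the choice of the 95th percentile as the latency figure are assumptions, since the document does not say which statistic the 100ms target refers to.

```python
import time
from typing import Any, Callable, Dict, List


def benchmark(
    handler: Callable[[Dict[str, Any]], Any], events: List[Dict[str, Any]]
) -> Dict[str, float]:
    """Call handler once per event and report p95 latency and throughput."""
    if not events:
        raise ValueError("events must not be empty")
    latencies: List[float] = []
    started = time.perf_counter()
    for event in events:
        t0 = time.perf_counter()
        handler(event)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    latencies.sort()
    p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))]
    throughput = len(events) / elapsed if elapsed > 0 else float("inf")
    return {"p95_ms": p95 * 1000.0, "events_per_second": throughput}


def meets_targets(result: Dict[str, float]) -> bool:
    """Compare a benchmark result with the targets listed above."""
    return result["p95_ms"] < 100.0 and result["events_per_second"] >= 1000.0


if __name__ == "__main__":
    # Stand-in handler; replace with a real scoring function to get useful numbers.
    outcome = benchmark(lambda event: len(event.get("prompt", "")), [{"prompt": "hi"}] * 5000)
    print(outcome, meets_targets(outcome))
```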
port conflicts
# free up ports
lsof -ti:8000 | xargs kill -9
lsof -ti:8501 | xargs kill -9
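before killing processes, it can help to confirm that something is actually listening on the port. a small cross-platform check using the standard `socket` module; the function name is invented, and it only tests whether a tcp connection to the local port succeeds.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (8000, 8501):
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```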
database locked
# reset database (removes the duckdb write-ahead log; any uncheckpointed writes are lost)
rm -f governance.db.wal
python demo/reset_database.py
slow performance
- enable database indexing
- increase worker processes
- add redis caching
export LOG_LEVEL=DEBUG
python -m uvicorn backend.app:app --log-level debug
we welcome contributions to make ai governance better:
- fork the repository
- create a feature branch
- commit with clear messages
- test thoroughly
- submit a pull request
focus areas:
- risk detection improvements
- dashboard visualizations
- compliance frameworks
- performance optimization
mit license - see LICENSE
built to operationalize anthropic's responsible scaling policy into verifiable enterprise controls. special thanks to the ai safety community for guidance on risk thresholds and evaluation methodologies.
making ai safety operational, one policy at a time
report issue • documentation • discussions