Production deployment of the PAC-AI protocol with CrewAI agents on AWS.
Multi-agent healthcare, education, recommendation, finance, and hiring scenarios that demonstrate EU AI Act compliance (Annex III 4(a) and 5(b), Articles 5(1)(f)/(g), 13, 14, and 26) through auditable context envelopes, W3C PROV provenance graphs, and cryptographic integrity verification — all persisted on DynamoDB + S3.
TL;DR: This is the production-grade version of the jhcontext compliance scenarios — real CrewAI agents, AWS infrastructure (Chalice Lambda + DynamoDB + S3), and persistent storage. For a lightweight in-memory proof-of-concept with no infrastructure, see jhcontext-usecases.
           ┌─────────────────────────────┐
           │    Agent (local/Lambda)     │
           │ CrewAI Flows + ContextMixin │
           └──────────────┬──────────────┘
                          │ HTTPS
        ┌─────────────────┼─────────────────┐
        ▼                 ▼                 ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ jhcontext-api│  │ jhcontext-mcp│  │  S3 Bucket   │
│  (Chalice)   │  │  (Chalice)   │  │  artifacts   │
│    Lambda    │  │    Lambda    │  └──────────────┘
└───────┬──────┘  └───────┬──────┘
        │                 │
┌───────┴─────────────────┴───────┐
│            DynamoDB             │
│  envelopes · artifacts · prov   │
│  decisions (4 tables)           │
└─────────────────────────────────┘
Three independent modules, three separate deployments. The agent runs locally and calls the deployed API over HTTPS — keeping Lambda cold start under 2 seconds.
See Architecture for full repository structure and dependency separation.
Each scenario demonstrates a different EU AI Act compliance pattern:
| Scenario | Article | Risk | Agents | Key Proof |
|---|---|---|---|---|
| Healthcare | Art. 14 — Human Oversight | HIGH | 5 (sensor → situation → decision → oversight → audit) | Temporal proof that physician reviewed docs AFTER AI recommendation |
| Education — Fair Grading | Art. 13 — Non-Discrimination | HIGH | 4 (ingestion → grading ╳ equity → audit) | Workflow isolation + negative proof (identity absent from grading) |
| Education — Rubric-Grounded Grading | Annex III §3 — Three-scenario audit | HIGH | 6 (ingestion → scoring → feedback → equity → TA review → audit) | (A) negative proof + isolation, (B) rubric-criterion binding, (C) temporal oversight |
| Education — Oral Feedback (supplementary) | Annex III §3 (multimodal) | HIGH | 6 (audio-ingestion → scoring → feedback → equity → TA review → audit) | Same A/B/C pattern over audio; per-sentence binding to (start_ms, end_ms) audited via verify_multimodal_binding |
| Recommendation | LOW-risk | LOW | 3 (profile → search → personalize) | Full provenance with Raw-Forward policy |
| Finance | Annex III 5(b) — Composite | HIGH | 7 (data → risk → decision → oversight ╳ fair lending → audit) | All 4 patterns: negative proof + temporal oversight + workflow isolation + PII detachment |
| Hiring | Annex III §4(a) + Arts. 5(1)(f)/(g), 13, 14, 26 | HIGH | 6 (sourcing → parsing → screening → interview → ranking → decision-support) + recruiter | Quadripartite Semantic-Forward at every handoff (every task outputs a FlatEnvelope); 7 HR-specific verifiers + cohort 4/5 disparate-impact test |
| Benefits A3I — toeslagenaffaire anchor | GDPR Arts. 13-15 + EU AI Act Arts. 14, 86 | HIGH | 3 (intake → semantic extractor → decision) | Two pipelines side-by-side (Raw-Forward + Semantic-Forward) + four citizen SPARQL queries (integrity, semantic_claims, reasoning_chain, counterfactual) demonstrating what Semantic-Forward enables that Raw-Forward does not. Offline deterministic runner at agent/scenarios/benefits_a3i/simulate.py (no LLM key needed). |
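The temporal-oversight proof listed for the Healthcare scenario reduces to an ordering check over PROV activity timestamps. The following is a minimal sketch, assuming each activity carries `prov:startedAtTime` and `prov:endedAtTime` as ISO 8601 strings; the `Activity` dataclass and the `oversight_follows_recommendation` helper are hypothetical names chosen for illustration and are not part of the jhcontext SDK.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Activity:
    """Hypothetical stand-in for a prov:Activity and its two timestamps."""
    activity_id: str
    started_at: str  # prov:startedAtTime, ISO 8601
    ended_at: str    # prov:endedAtTime, ISO 8601


def oversight_follows_recommendation(recommendation: Activity, review: Activity) -> bool:
    """True only if the human review started at or after the AI recommendation ended."""
    rec_end = datetime.fromisoformat(recommendation.ended_at)
    review_start = datetime.fromisoformat(review.started_at)
    return review_start >= rec_end


# Illustrative timestamps only
ai = Activity("act:decision", "2025-01-01T10:00:00+00:00", "2025-01-01T10:00:05+00:00")
human = Activity("act:oversight", "2025-01-01T10:04:00+00:00", "2025-01-01T10:09:00+00:00")
print(oversight_follows_recommendation(ai, human))
```

An auditor would run a check of this shape against timestamps read from the stored PROV graph rather than against values supplied by the agents themselves.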
Three additional scenarios exercise PAC-AI under offline/deferred-sync semantics. Envelopes are enqueued into a local SQLite queue during connectivity outages and drained when the uplink returns — with predecessor-hash chain verification, tamper detection, and late-arrival flagging at drain time.
| Scenario | Risk | Agents | Connectivity profile |
|---|---|---|---|
| Rural Cardiac Triage | HIGH (Annex III §5) | 3 (physio-signal → triage → resource-allocation) + teleconsult oversight | Offline during AI pipeline → online for specialist review 10 min later |
| Chronic-Disease Remote Monitoring | HIGH | 4 (sensor-agg → trend → alert → care-plan) + nurse oversight | Offline per daily handoff → opportunistic sync → next-day nurse review |
| CHW Mental-Health Screening | HIGH | 3 (PHQ-9 interview → risk-classifier → referral) + district-specialist oversight | Offline during CHW home visit → online on return to clinic |
See Offline healthcare scenarios for the code mapping, connectivity timelines, and the full list of outputs per run.
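The drain-time checks described above (predecessor-hash chain verification and tamper detection) can be illustrated with a small hash chain. This is a sketch under stated assumptions: the field names (`payload`, `prev_hash`, `hash`), the use of SHA-256 over canonical JSON, and the helper names are hypothetical and do not reflect the actual envelope schema or the repository's offline layer.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed marker for "no predecessor"


def digest(payload: dict, prev_hash: str) -> str:
    """SHA-256 over canonical JSON of the payload plus the predecessor hash."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def seal(payload: dict, prev_hash: str = GENESIS) -> dict:
    """Build a queue entry whose hash commits to its content and its predecessor."""
    return {"payload": payload, "prev_hash": prev_hash,
            "hash": digest(payload, prev_hash)}


def verify_chain(entries: list) -> list:
    """Return one status per entry: 'ok', 'tampered', or 'chain_broken'."""
    statuses = []
    expected_prev = GENESIS
    for entry in entries:
        if digest(entry["payload"], entry["prev_hash"]) != entry["hash"]:
            statuses.append("tampered")      # content no longer matches its hash
        elif entry["prev_hash"] != expected_prev:
            statuses.append("chain_broken")  # predecessor missing or reordered
        else:
            statuses.append("ok")
        expected_prev = entry["hash"]
    return statuses


first = seal({"task": "physio-signal"})
second = seal({"task": "triage"}, first["hash"])
third = seal({"task": "resource-allocation"}, second["hash"])
print(verify_chain([first, second, third]))

second["payload"]["task"] = "edited"  # modify content after sealing
print(verify_chain([first, second, third]))
```

Dropping an entry from the middle of the list is reported as `chain_broken` on its successor, while editing a sealed payload is reported as `tampered` on the edited entry itself.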
Crews are modeled explicitly in the W3C PROV graph using prov:actedOnBehalfOf. The
PROV graph itself serves as the coordination layer — no external pipeline ID needed.
In any flow, call _register_crew() after _init_context():
from crewai.flow.flow import Flow, start

# ContextMixin and RiskLevel are assumed to be imported from this project
# and the jhcontext SDK.


class MyFlow(Flow, ContextMixin):
    @start()
    def init(self):
        self._init_context(
            scope="healthcare",
            producer="did:hospital:system",
            risk_level=RiskLevel.HIGH,
        )
        # Agents in the crew get prov:actedOnBehalfOf the crew agent
        self._register_crew(
            crew_id="crew:clinical-pipeline",
            label="Clinical Pipeline Crew",
            agent_ids=[
                "did:hospital:sensor-agent",
                "did:hospital:situation-agent",
                "did:hospital:decision-agent",
            ],
        )
        # Oversight agent stays outside the crew (explicit boundary)

This produces PROV triples like:
jh:crew-clinical-pipeline a prov:Agent, prov:SoftwareAgent ;
    rdfs:label "Clinical Pipeline Crew" ;
    jh:agentType "crew" .
<did:hospital:sensor-agent> prov:actedOnBehalfOf jh:crew-clinical-pipeline .

Query all activities from a crew via SPARQL:
SELECT ?activity ?label WHERE {
  ?agent prov:actedOnBehalfOf jh:crew-clinical-pipeline .
  ?activity prov:wasAssociatedWith ?agent .
  ?activity rdfs:label ?label .
}

- Python 3.10+
- AWS account with credentials configured (aws configure)
- jhcontext SDK published to PyPI (or installed from ../jhcontext-sdk)
cd jhcontext-crewai/api
pip install -r requirements.txt
python setup_tables.py

This creates 4 DynamoDB tables (PAY_PER_REQUEST billing) and 1 S3 bucket:

- jhcontext-envelopes (PK: context_id, GSI: ScopeIndex)
- jhcontext-artifacts (PK: artifact_id, GSI: ContextIndex)
- jhcontext-prov-graphs (PK: context_id)
- jhcontext-decisions (PK: decision_id, GSI: ContextIndex)
- jhcontext-artifacts-dev (S3 bucket for large artifact content)
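For readers unfamiliar with on-demand DynamoDB tables, the sketch below builds keyword arguments of the shape accepted by boto3's `create_table` for the four tables listed above. It is not the content of `setup_tables.py`: the helper names are hypothetical, and the GSI key attributes (`scope`, `context_id`) and string attribute types are assumptions, since only the index names are given here.

```python
from typing import Optional, Tuple


def table_spec(name: str, pk: str, gsi: Optional[Tuple[str, str]] = None) -> dict:
    """Build create_table keyword arguments for an on-demand (PAY_PER_REQUEST) table."""
    attributes = {pk}
    spec = {
        "TableName": name,
        "KeySchema": [{"AttributeName": pk, "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",
    }
    if gsi is not None:
        index_name, index_key = gsi
        attributes.add(index_key)
        spec["GlobalSecondaryIndexes"] = [{
            "IndexName": index_name,
            "KeySchema": [{"AttributeName": index_key, "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }]
    # Every key attribute must be declared; string type is assumed here.
    spec["AttributeDefinitions"] = [
        {"AttributeName": attr, "AttributeType": "S"} for attr in sorted(attributes)
    ]
    return spec


# GSI key attributes below are assumptions made for illustration.
SPECS = [
    table_spec("jhcontext-envelopes", "context_id", ("ScopeIndex", "scope")),
    table_spec("jhcontext-artifacts", "artifact_id", ("ContextIndex", "context_id")),
    table_spec("jhcontext-prov-graphs", "context_id"),
    table_spec("jhcontext-decisions", "decision_id", ("ContextIndex", "context_id")),
]


def create_all(specs=SPECS) -> None:
    """Create the tables; requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the specs can be inspected without AWS access
    client = boto3.client("dynamodb")
    for spec in specs:
        client.create_table(**spec)
```

Keeping the table definitions as plain data makes them easy to review or diff before anything is created in an AWS account.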
cd jhcontext-crewai/api
./deploy.sh

Note the API endpoint URL printed at the end.

cd jhcontext-crewai/mcp
./deploy.sh

cd jhcontext-crewai
pip install -r agent/requirements.txt

Set the API URL:

export JHCONTEXT_API_URL=https://{api-id}.execute-api.us-east-1.amazonaws.com/api

python -m agent.run --scenario healthcare
python -m agent.run --scenario education-fair
python -m agent.run --scenario education-rubric
python -m agent.run --scenario education-oral # supplementary multimodal variant
python -m agent.run --scenario recommendation
python -m agent.run --scenario finance
python -m agent.run --scenario all

The hiring crew runs the full six-task multi-agent pipeline with FlatEnvelope round-tripping at every handoff. With HIRING_USE_MOCK_LLM=1 it reproduces deterministically without an ANTHROPIC_API_KEY:
HIRING_USE_MOCK_LLM=1 python -m agent.scenarios.hiring.run_procurement
HIRING_USE_MOCK_LLM=1 python -m agent.scenarios.hiring.run_inflight
HIRING_USE_MOCK_LLM=1 python -m agent.scenarios.hiring.run_cohort
HIRING_USE_MOCK_LLM=1 python -m agent.scenarios.hiring.run_all
python -m agent.scenarios.hiring.render_forwarding_diff # before/after sizes per handoff

See agent/crews/hiring/README.md for the six functional agents, the FlatEnvelope → Envelope → ForwardingEnforcer → FlatEnvelope round trip per task, and the three audit checkpoints (procurement, in-flight, cohort).
python -m agent.run --local --scenario healthcare
python -m agent.run --local --scenario all

The --local flag auto-starts a local SQLite-backed server on :8400, runs the scenario, and shuts the server down afterwards. No second terminal is needed. See Local Development for details.
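The start, run, shut-down lifecycle described above can be pictured as a context manager that owns a background server. The sketch below uses Python's standard `http.server` as a stand-in; the `local_server` helper, the `HealthHandler` class, and the `/health` response are hypothetical, and the project's actual local backend may be organised differently.

```python
import http.client
import json
import threading
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Stand-in handler; a real backend would serve the envelope routes."""

    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep scenario output free of access logs


@contextmanager
def local_server(port: int = 8400):
    """Start the server in a daemon thread, yield (host, port), then stop it."""
    server = ThreadingHTTPServer(("127.0.0.1", port), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield server.server_address  # actual bound address, useful when port=0
    finally:
        server.shutdown()      # stop serve_forever()
        server.server_close()  # release the socket
        thread.join()


def check_health(port: int = 0) -> dict:
    """Run one request against a freshly started server (port 0 = any free port)."""
    with local_server(port) as (host, bound_port):
        conn = http.client.HTTPConnection(host, bound_port, timeout=5)
        try:
            conn.request("GET", "/health")
            return json.loads(conn.getresponse().read())
        finally:
            conn.close()


if __name__ == "__main__":
    print(check_health())
```

Wrapping the server in a context manager guarantees that the port is released even when the scenario raises, which is what makes a single-terminal workflow practical.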
python -m agent.run --validate # validate latest run
python -m agent.run --validate v01 # validate specific run

See Validation for interpreting results, audit checks, and UserML semantic payloads.
The offline healthcare scenarios run via a separate driver that skips the Chalice API and enqueues envelopes into a local SQLite queue during scripted connectivity outages, then drains them against the scripted timeline with chain / tamper / late-arrival verification:
export ANTHROPIC_API_KEY=sk-ant-...
python -m agent.offline_simulate triage # Rural cardiac triage
python -m agent.offline_simulate chronic # Chronic-disease remote monitoring
python -m agent.offline_simulate chw # CHW mental-health screening
python -m agent.offline_simulate all

Outputs under output/runs/vNN/:

| File | Description |
|---|---|
| `<scenario>_envelopes.json` | Per-task envelope snapshots (JSON-LD) |
| `<scenario>_prov.ttl` | W3C PROV graph (Turtle) |
| `<scenario>_audit.json` | Programmatic + narrative audit report |
| `<scenario>_queue.sqlite` | Local offline queue persisted across runs |
| `<scenario>_sync_log.json` | Drain report (queued / drained / tampered / chain_broken / late) |
| `<scenario>_upstream_received.json` | What the mock upstream actually received at drain time |
| `healthcare_offline_summary.json` | Combined summary across the three scenarios |
The simulation driver is in agent/offline_simulate.py; the offline protocol layer (drop-in replacement for ContextMixin) lives in agent/protocol/ — offline_queue.py, sync_manager.py, offline_context_mixin.py, mock_upstream.py. Full detail in Offline healthcare scenarios.
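As a rough picture of the enqueue-then-drain cycle, the sketch below persists envelopes in SQLite and produces a drain report with `queued`, `drained`, and `late` counters, mirroring some of the field names listed for `<scenario>_sync_log.json`. The table layout, the `OfflineQueue` class, and the `max_delay_s` lateness threshold are assumptions made for illustration; they are not taken from `offline_queue.py` or `sync_manager.py`.

```python
import json
import sqlite3


class OfflineQueue:
    """Hypothetical FIFO queue persisted in SQLite (pass a file path to survive restarts)."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
            "envelope TEXT NOT NULL, "
            "enqueued_at REAL NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, envelope: dict, now: float) -> None:
        """Store an envelope while the uplink is down."""
        self.db.execute(
            "INSERT INTO queue (envelope, enqueued_at) VALUES (?, ?)",
            (json.dumps(envelope, sort_keys=True), now),
        )
        self.db.commit()

    def drain(self, send, now: float, max_delay_s: float = 600.0) -> dict:
        """Send queued envelopes in insertion order, delete them, and flag late arrivals."""
        rows = self.db.execute(
            "SELECT seq, envelope, enqueued_at FROM queue ORDER BY seq"
        ).fetchall()
        report = {"queued": len(rows), "drained": 0, "late": 0}
        for seq, raw, enqueued_at in rows:
            late = (now - enqueued_at) > max_delay_s
            send(json.loads(raw), late)
            self.db.execute("DELETE FROM queue WHERE seq = ?", (seq,))
            report["drained"] += 1
            report["late"] += int(late)
        self.db.commit()
        return report


received = []
queue = OfflineQueue()
queue.enqueue({"context_id": "ctx-1"}, now=0.0)    # queued early in the outage
queue.enqueue({"context_id": "ctx-2"}, now=900.0)  # queued shortly before the uplink returns
print(queue.drain(lambda env, late: received.append((env["context_id"], late)), now=1000.0))
```

Timestamps are passed in explicitly rather than read from the clock, which keeps a scripted connectivity timeline reproducible from run to run.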
.venv/bin/python -m pytest tests/test_offline_layer.py tests/test_offline_flow_e2e.py -v

The offline protocol layer ships with 6 tests covering clean drain, tamper detection, chain-break detection, late-arrival flagging, and a full mixin→queue→sync end-to-end chain that runs without an Anthropic API key.
| Topic | Description |
|---|---|
| Architecture | System diagram, repository structure, dependency separation |
| API Reference | All API routes with curl examples |
| Forwarding Policy | Semantic-Forward vs Raw-Forward, monotonic enforcement |
| Understanding Run Output | How to read envelopes, PROV graphs, audits, metrics, and validation results |
| Local Development | Running without AWS (SQLite backend) |
| Security | API authentication roadmap (API key → IAM → Cognito → mTLS) |
| Validation | Protocol validation, audit checks, UserML, PROV, metrics |
| Test Suite | Unit tests: storage backend, local mode, ontology validation |
| Crew | Article | Description |
|---|---|---|
| Healthcare | Art. 14 | 5 agents, 3 crews, Semantic-Forward, temporal oversight proof |
| Education — Fair Grading | Art. 13 | 4 agents, 3 isolated flows, workflow isolation + negative proof |
| Education — Rubric-Grounded Grading | Annex III §3 | 6 agents, 4 flows, three-scenario audit (negative proof + rubric grounding + temporal oversight) |
| Recommendation | LOW-risk | 3 agents, 1 crew, Raw-Forward, full provenance |
| Finance | Annex III 5(b) | 7 agents, 4 crews, composite compliance (all 4 patterns) |
| Hiring | Annex III §4(a) + Arts. 5(1)(f)/(g), 13, 14, 26 | 6 agents, 1 crew, Quadripartite Semantic-Forward; 7 HR-specific verifiers + cohort 4/5; mock-LLM offline mode |
| Offline healthcare scenarios | Annex III §5 | Rural triage + chronic monitoring + CHW mental-health, offline-first with scripted connectivity |
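The cohort 4/5 check listed for the Hiring crew is commonly understood as the four-fifths rule: each group's selection rate should be at least 80% of the highest group's rate. A minimal sketch follows, assuming per-group applicant and selection counts; the function names and sample figures are illustrative only and do not represent the repository's verifier.

```python
def selection_rates(counts: dict) -> dict:
    """Map each group to selected / applicants."""
    return {group: selected / applicants
            for group, (selected, applicants) in counts.items()}


def four_fifths_check(counts: dict, threshold: float = 0.8) -> dict:
    """Return each group's impact ratio against the best rate and whether it passes."""
    rates = selection_rates(counts)
    best = max(rates.values())
    if best == 0:
        # Nobody was selected in any group, so no disparity can be measured.
        return {group: {"impact_ratio": 1.0, "passes": True} for group in rates}
    return {
        group: {"impact_ratio": rate / best, "passes": rate / best >= threshold}
        for group, rate in rates.items()
    }


# Illustrative cohort: (selected, applicants) per group
cohort = {"group_a": (48, 80), "group_b": (12, 30)}
for group, result in four_fifths_check(cohort).items():
    print(group, round(result["impact_ratio"], 2), result["passes"])
```

In this made-up cohort, group_b is selected at two thirds of group_a's rate, which falls below the 0.8 threshold and would be flagged for review.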
Reference figures from the PAC-AI protocol:
Apache 2.0

