Epistemic Confidence Layer (ECL)

CI Docs License Python

TLS for Knowledge. A model-agnostic trust protocol that turns fluent AI into calibrated, auditable systems.

Goal: When an AI says "80% confident," it's correct ~80% of the time (ECE ≤ 0.10).

Why ECL

  • Hallucinations happen. ECL decomposes model outputs into atomic claims, checks cross-model agreement, scores evidence, recency, stability, and language integrity, then returns a calibrated Epistemic Confidence Score (ECS); see the sketch after this list.
  • Model-agnostic. Works across GPT/Claude/Gemini/local LLMs; includes a stub mode for offline dev.
  • Auditable provenance. W3C PROV-style graph with hashes and timestamps.
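
The scoring signals above surface per claim in the /verify response. The sketch below shows one way they could be combined into an ECS; the field names mirror the response schema, but the weights and the plain weighted average are illustrative, not ECL's calibrated scorer.

from dataclasses import dataclass

@dataclass
class ClaimSignals:
    # Per-claim signals, named after the fields in the /verify response.
    agreement_score: float     # cross-model agreement, 0..1
    diversity_score: float     # diversity of the agreeing models, 0..1
    recency: float             # freshness of supporting evidence, 0..1
    stability: float           # consistency across re-runs, 0..1
    language_integrity: float  # hedging / overclaiming check, 0..1

# Illustrative weights only; ECL's actual scorer is calibrated against outcomes.
WEIGHTS = {
    "agreement_score": 0.4,
    "diversity_score": 0.1,
    "recency": 0.1,
    "stability": 0.2,
    "language_integrity": 0.2,
}

def ecs(signals: ClaimSignals) -> float:
    """Weighted average of the per-claim signals (a stand-in for the real, calibrated ECS)."""
    return sum(getattr(signals, name) * w for name, w in WEIGHTS.items()) / sum(WEIGHTS.values())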

Quickstart (60 seconds)

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt -r requirements-dev.txt
uvicorn src.ecl.server.app:app --reload

▶️ Try the API: Open http://127.0.0.1:8000/docs for the interactive Swagger UI.

Then test with curl:

curl -X POST http://127.0.0.1:8000/verify \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Is Knysna in the Western Cape of South Africa?", "models":["stub:gpt","stub:claude"]}'
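
The same request from Python, using requests against the stub server started in the quickstart:

import requests

resp = requests.post(
    "http://127.0.0.1:8000/verify",
    json={
        "prompt": "Is Knysna in the Western Cape of South Africa?",
        "models": ["stub:gpt", "stub:claude"],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())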

Features

  • Semantic equivalence via embeddings (Sentence-Transformers by default; OpenAI optional); see the sketch after this list.
  • Isotonic and Platt calibration baselines
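
A minimal sketch of the default embedding path for semantic equivalence, using the same all-MiniLM-L6-v2 model configured below; the 0.8 cosine threshold is illustrative, not ECL's actual cutoff.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # matches the default ECL_ST_MODEL

def semantically_equivalent(a: str, b: str, threshold: float = 0.8) -> bool:
    # Cosine similarity between sentence embeddings; the threshold is illustrative.
    emb = model.encode([a, b], convert_to_tensor=True)
    return float(util.cos_sim(emb[0], emb[1])) >= threshold

print(semantically_equivalent(
    "Knysna is in the Western Cape of South Africa.",
    "Knysna lies in South Africa's Western Cape province.",
))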

Real Providers Setup

For production use with actual model providers:

# Copy the environment template
cp .env.example .env

# Then add your API keys and embedding settings to .env:
ECL_OPENAI_API_KEY=your_openai_key
ECL_ANTHROPIC_API_KEY=your_anthropic_key
ECL_EMBED_BACKEND=sentence-transformers  # or "openai"
ECL_ST_MODEL=all-MiniLM-L6-v2
ECL_OPENAI_EMBED_MODEL=text-embedding-3-small
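
At startup these variables are read from the environment; a hedged sketch of what that can look like (the ProviderSettings class here is illustrative, not ECL's actual config module):

import os
from dataclasses import dataclass, field

@dataclass
class ProviderSettings:
    # Defaults mirror the values suggested in .env.example above.
    openai_api_key: str | None = field(default_factory=lambda: os.getenv("ECL_OPENAI_API_KEY"))
    anthropic_api_key: str | None = field(default_factory=lambda: os.getenv("ECL_ANTHROPIC_API_KEY"))
    embed_backend: str = field(default_factory=lambda: os.getenv("ECL_EMBED_BACKEND", "sentence-transformers"))
    st_model: str = field(default_factory=lambda: os.getenv("ECL_ST_MODEL", "all-MiniLM-L6-v2"))
    openai_embed_model: str = field(default_factory=lambda: os.getenv("ECL_OPENAI_EMBED_MODEL", "text-embedding-3-small"))

settings = ProviderSettings()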

Local Models: ECL supports Ollama for offline development:

# Install Ollama, then:
ECL_LOCAL_MODEL=ollama:llama3.1:8b
ECL_OLLAMA_BASE_URL=http://localhost:11434
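
To sanity-check the local model outside ECL, you can hit Ollama's standard generate endpoint directly (the model name and prompt here are just an example):

import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b", "prompt": "Is Knysna in the Western Cape of South Africa?", "stream": False},
    timeout=120,
)
print(resp.json()["response"])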

Example response from the stubbed /verify call in the quickstart:

{
  "claims": [
    {"id":"c_1","text":"Knysna is in the Western Cape of South Africa.","hash":"...","provenance":{"source":"extraction:heuristic"}}
  ],
  "consensus": [
    {"claim_id":"c_1","agreement_score":0.95,"diversity_score":0.60,"evidence":[],"recency":1.0,"stability":0.9,"language_integrity":0.95,"ecs":0.88}
  ]
}
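
Downstream code can gate on the per-claim ECS; a small illustrative filter over this response shape (the 0.7 threshold is arbitrary):

def confident_claims(verify_response: dict, min_ecs: float = 0.7) -> list[str]:
    # Join claims to their consensus entries and keep those at or above the ECS threshold.
    texts = {c["id"]: c["text"] for c in verify_response["claims"]}
    return [
        texts[entry["claim_id"]]
        for entry in verify_response["consensus"]
        if entry["ecs"] >= min_ecs
    ]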

Architecture

Prompt → Claim Extraction → Cross-Model Comparison → Contradiction Detection
      → Calibration (ECE/Brier) → Guardrailed Synthesis → API/Dashboard
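
A skeletal, runnable sketch of how these stages fit together; every stage is a trivial stub, contradiction detection is omitted for brevity, and the names do not reflect the actual module layout under src/ecl/.

def extract_claims(answers: dict[str, str]) -> list[str]:
    # Claim Extraction (stub): treat each distinct answer as one atomic claim.
    return sorted({a.strip() for a in answers.values()})

def cross_model_agreement(claim: str, answers: dict[str, str]) -> float:
    # Cross-Model Comparison (stub): fraction of models that produced this exact claim.
    return sum(claim == a.strip() for a in answers.values()) / len(answers)

def calibrate(raw: float) -> float:
    # Calibration placeholder; ECL fits isotonic/Platt maps and tracks ECE/Brier here.
    return min(max(raw, 0.0), 1.0)

def verify(prompt: str, answers: dict[str, str]) -> list[dict]:
    # Guardrailed Synthesis would gate the final answer on these per-claim scores.
    return [
        {"claim": c, "ecs": calibrate(cross_model_agreement(c, answers))}
        for c in extract_claims(answers)
    ]

print(verify("Is Knysna in the Western Cape?", {"stub:gpt": "Yes.", "stub:claude": "Yes."}))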

Calibration Target

  • ECE ≤ 0.10 on the project's evaluation harness (see benchmarks/); a minimal reference sketch follows below.
  • CI fails if calibration regresses on the toy suite.
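
The binned ECE estimator behind this target is standard; here is a minimal NumPy version with ten equal-width bins (the harness in benchmarks/ may use different settings).

import numpy as np

def expected_calibration_error(confidences: np.ndarray, correct: np.ndarray, n_bins: int = 10) -> float:
    """Weighted mean of |accuracy - confidence| over equal-width confidence bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return float(ece)

# A model that says "80% confident" and is right 80% of the time scores ECE ~ 0.
conf = np.full(100, 0.8)
hits = np.array([1] * 80 + [0] * 20)
print(expected_calibration_error(conf, hits))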

Development

pip install -r requirements.txt -r requirements-dev.txt
pre-commit install
ruff check .
mypy src
pytest -q

Shortcuts:

make demo      # run server & minimal example
make evaluate  # run toy benchmark & plot reliability

Governance & Security

Roadmap

See ROADMAP.md.

Bots

Automated assistants keep quality high:

  • ECL-Verify Bot runs tests/eval on PRs and posts ECE results.

  • ECL-Triage Bot labels issues and welcomes new contributors. See docs/bots.md.

  • Nightly Reliability: CI appends the ECE trend and uploads artifacts. See Operations → Nightly Reliability.

  • Release Notes: auto-drafted from Conventional Commits (see Operations).

  • Security: OSSF Scorecard runs weekly; see the repo Security tab.

Community

  • Ask questions in Discussions.
  • Issues are welcome; start with the good first issue label.
