robert-mccray/cryptographic-rag-ledger

Cryptographic RAG Provenance Ledger

As enterprises deploy Retrieval-Augmented Generation (RAG) pipelines, a primary attack vector shifts from the network to the data supply chain. If an attacker injects a poisoned document into the vector database (for example, one with altered routing numbers or an embedded prompt injection), the model retrieves it as trusted context and can serve malicious output to every user whose query matches it.

This project introduces a Zero-Trust Data Provenance Pipeline. It intercepts raw documents before they are vectorized, computes a deterministic SHA-256 digest of each one, and cross-references that digest against an immutable, compliance-approved ledger.
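The check described above can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code: the names `sha256_hex`, `verify_before_vectorizing`, and `UnapprovedDocumentError` are hypothetical, and the ledger is modeled as a plain in-memory set of approved digests rather than an immutable store.

```python
import hashlib


class UnapprovedDocumentError(Exception):
    """Raised when a document's digest is not on the approved ledger."""


def sha256_hex(raw_bytes: bytes) -> str:
    """Return the SHA-256 digest of the raw document bytes as hex."""
    return hashlib.sha256(raw_bytes).hexdigest()


def verify_before_vectorizing(raw_bytes: bytes, approved_digests: frozenset) -> str:
    """Return the digest if it is approved; otherwise refuse the document.

    Assumption: the ledger is a set of hex digests. A real ledger would
    be an append-only, access-controlled store.
    """
    digest = sha256_hex(raw_bytes)
    if digest not in approved_digests:
        raise UnapprovedDocumentError(f"digest {digest} is not on the ledger")
    return digest


# Hypothetical ledger holding one approved document.
approved = frozenset({sha256_hex(b"Wire transfers require dual approval.")})

# The approved bytes pass; any other bytes raise UnapprovedDocumentError.
verify_before_vectorizing(b"Wire transfers require dual approval.", approved)
```

Hashing the raw bytes, rather than extracted text, means the check runs before any parser touches the file, so a parser quirk cannot mask a change.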

📈 Business Impact & Enterprise Security

  • Prevents AI Data Poisoning: Changing a single character in a 500-page PDF produces a completely different SHA-256 hash, which no longer matches the ledger and triggers an immediate Hard Block at the vectorization layer.
  • Audit-Ready Data Lineage: Provides cryptographically verifiable evidence that every document embedded in the RAG database was explicitly approved by compliance, supporting SOC 2 and HIPAA audit requirements.
  • Secures the AI Supply Chain: Blocks shadow IT or compromised CI/CD pipelines from silently injecting unapproved context into the enterprise LLM.
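The single-character claim in the first bullet is easy to check. The sketch below hashes two strings that differ in one character and counts how many of the 256 output bits differ. The sample strings and the helper `differing_bits` are made up for illustration. For a hash like SHA-256, about half the bits are expected to flip, though the exact count depends on the input.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where the SHA-256 digests of a and b differ."""
    digest_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(digest_a ^ digest_b).count("1")


# Illustrative inputs that differ only in the final character.
original = b"Routing number: 123456789"
tampered = b"Routing number: 123456780"

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(tampered).hexdigest())
print(differing_bits(original, tampered), "of 256 bits differ")
```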

🚀 Quick Start (Local Simulation)

1. Create the mock data files:

mkdir mock_data
echo -n "" > mock_data/clean_financial_policy.pdf
echo -n "malicious_injection" > mock_data/poisoned_financial_policy.pdf

2. Execute the Data Pipeline:

python src/pipeline.py
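The contents of src/pipeline.py are not shown here, so the following is only a guess at what the simulation checks. It recreates the two mock files in a temporary directory, treats the digest of the clean file as the sole ledger entry, and reports a decision per file. It assumes the shell honors `echo -n`, so the clean file is zero bytes, and that the ledger approves exactly that file. The names `file_digest` and `simulate_quick_start` and the ALLOW/BLOCK labels are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

# SHA-256 of zero bytes, which is what `echo -n ""` writes to the clean file.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def simulate_quick_start() -> dict:
    """Recreate the mock files and check each against a one-entry ledger."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        mock_dir = Path(tmp) / "mock_data"
        mock_dir.mkdir()
        (mock_dir / "clean_financial_policy.pdf").write_bytes(b"")
        (mock_dir / "poisoned_financial_policy.pdf").write_bytes(b"malicious_injection")

        # Assumption: the ledger approves only the clean (empty) file.
        approved = {EMPTY_SHA256}
        for path in sorted(mock_dir.iterdir()):
            decision = "ALLOW" if file_digest(path) in approved else "BLOCK"
            results[path.name] = decision
    return results


print(simulate_quick_start())
```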

About

A DevSecOps data pipeline that uses SHA-256 cryptographic hashing to verify document provenance and prevent RAG data poisoning before vectorization.
