AI-Powered PII Detection and Document Redaction
A production-ready document redaction application that automatically detects and masks personally identifiable information (PII) using a combination of pattern matching and AI-powered detection.
The Smart Redaction Tool helps users protect sensitive information in documents by:
- Uploading documents - Supports PDF, DOCX, TXT, and image files
- Detecting PII - Uses regex patterns and LLM-based detection to find sensitive data
- Reviewing redactions - Interactive preview with manual redaction capabilities
- Exporting - Download redacted documents as PDF, DOCX, or TXT
The tool detects the following PII types:
- Social Security Numbers (SSN)
- Credit Card Numbers
- Bank Account Numbers
- Personal Names
- Physical Addresses
- Phone Numbers
- Email Addresses
- Dates of Birth
The application is built with:
- Framework: Next.js 16 (App Router)
- Language: TypeScript
- Styling: Tailwind CSS 4 + Shadcn UI
- Fonts: Inter (body), Instrument Serif (headings), JetBrains Mono (code)
- Icons: Phosphor Icons
- Package Manager: Bun
- AI/OCR: Case.dev SDK
- File Storage: Vercel Blob
- Usage Tracking: localStorage (client-side session isolation)
Clone the repository and install dependencies:
git clone <repository-url>
cd redaction-tool-demo
bun install
Copy the example environment file:
cp .env.example .env.local
Configure your environment variables:
# Case.dev SDK (required for LLM detection and OCR)
CASEDEV_API_KEY=sk_case_...
CASEDEV_API_URL=https://api.case.dev
# Vercel Blob (required for PDF/DOCX/Image processing)
BLOB_READ_WRITE_TOKEN=vercel_blob_rw_...
# Demo Limits
DEMO_SESSION_HOURS=24
DEMO_SESSION_PRICE_LIMIT=5
Start the development server:
bun dev
Open http://localhost:3000 to access the application.
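As an illustration of how the two demo-limit variables above might be consumed, here is a minimal sketch. The names `DemoLimits`, `parsePositive`, and `loadDemoLimits` are hypothetical and not taken from this repository; the fallback values mirror the documented defaults of 24 hours and $5.

```typescript
// Hypothetical helper, not part of the repository: reads the demo limits
// from an environment-like object and falls back to the documented defaults.
interface DemoLimits {
  sessionHours: number;
  priceLimitUsd: number;
}

function parsePositive(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  // Reject non-numeric, infinite, zero, or negative values.
  return Number.isFinite(value) && value > 0 ? value : fallback;
}

function loadDemoLimits(env: Record<string, string | undefined>): DemoLimits {
  return {
    sessionHours: parsePositive(env["DEMO_SESSION_HOURS"], 24),
    priceLimitUsd: parsePositive(env["DEMO_SESSION_PRICE_LIMIT"], 5),
  };
}
```

In server-side code this could be called as `loadDemoLimits(process.env)`. Taking the environment as a parameter keeps the function easy to test.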
The application uses the Case.dev API for LLM-based PII detection and OCR processing. Usage is metered based on the following rates:
| LLM Metric | Cost |
|---|---|
| Input Tokens | $3.00 per 1 million tokens |
| Output Tokens | $15.00 per 1 million tokens |

| OCR Metric | Cost |
|---|---|
| Per Page | $0.02 per page |
| Operation | Typical Cost |
|---|---|
| 1-page PDF scan | ~$0.02 |
| PII detection (1000 words) | ~$0.01 |
| Full document processing | ~$0.05 - $0.15 |
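The rates above combine into a simple linear formula: token counts are billed per million and OCR is billed per page. The sketch below shows that arithmetic. The `UsageSample` shape and the `estimateCostUsd` name are assumptions made for illustration, not the application's actual accounting code.

```typescript
// Rates copied from the pricing tables above.
const INPUT_USD_PER_MILLION_TOKENS = 3.0;
const OUTPUT_USD_PER_MILLION_TOKENS = 15.0;
const OCR_USD_PER_PAGE = 0.02;

// Hypothetical usage record for one operation.
interface UsageSample {
  inputTokens: number;
  outputTokens: number;
  ocrPages: number;
}

function estimateCostUsd(usage: UsageSample): number {
  const llmCost =
    (usage.inputTokens / 1e6) * INPUT_USD_PER_MILLION_TOKENS +
    (usage.outputTokens / 1e6) * OUTPUT_USD_PER_MILLION_TOKENS;
  const ocrCost = usage.ocrPages * OCR_USD_PER_PAGE;
  return llmCost + ocrCost;
}
```

For example, a one-page scan with no LLM calls comes to $0.02, which matches the first row of the typical-cost table.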
The application enforces usage limits for demo/trial access:
| Limit | Value |
|---|---|
| Session Duration | 24 hours |
| Cost Limit | $5.00 USD |
- Usage is tracked across all API operations (LLM detection + OCR)
- A progress banner shows current usage once it reaches 50%, 75%, and 90% of the cost limit
- When the $5 limit is reached, API operations are blocked
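The limit rules above can be sketched as a pure function over the amount spent so far. This is an illustrative model only: `UsageStatus`, `getUsageStatus`, and `isSessionExpired` are hypothetical names, and how the application actually stores and checks usage is not shown in this document.

```typescript
// Hypothetical model of the demo limits described above.
interface UsageStatus {
  percentUsed: number;            // capped at 100
  bannerThreshold: number | null; // highest of 50/75/90 reached, else null
  blocked: boolean;               // true once the cost limit is reached
}

const BANNER_THRESHOLDS = [90, 75, 50]; // checked from highest to lowest

function getUsageStatus(spentUsd: number, limitUsd: number = 5): UsageStatus {
  const percentUsed = Math.min(100, (spentUsd / limitUsd) * 100);
  let bannerThreshold: number | null = null;
  for (const threshold of BANNER_THRESHOLDS) {
    if (percentUsed >= threshold) {
      bannerThreshold = threshold;
      break;
    }
  }
  return { percentUsed, bannerThreshold, blocked: spentUsd >= limitUsd };
}

// A session is considered expired once the configured number of hours has elapsed.
function isSessionExpired(startedAtMs: number, nowMs: number, sessionHours: number = 24): boolean {
  return nowMs - startedAtMs >= sessionHours * 60 * 60 * 1000;
}
```

Keeping these checks free of storage access means the same logic could run against a localStorage-backed value in the browser or against a plain number in a test.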
To continue using the application beyond demo limits, create a free account at console.case.dev. Registered users receive:
- Unlimited document processing
- Advanced AI detection features
- Full export capabilities
- Team collaboration tools
├── app/
│ ├── api/
│ │ ├── detect-pii/ # PII detection endpoint
│ │ ├── extract/ # Document text extraction (OCR)
│ │ └── export-pdf/ # PDF export generation
│ ├── dashboard/ # Main redaction workflow
│ └── page.tsx # Landing page
├── components/
│ ├── demo/ # Usage tracking UI
│ ├── redaction/ # Core redaction components
│ └── ui/ # Shadcn UI primitives
├── lib/
│ ├── contexts/ # React contexts (usage tracking)
│ ├── export/ # Document export utilities
│ ├── redaction/ # Detection logic
│ └── usage/ # Usage tracking module
├── types/ # TypeScript definitions
└── skills/ # AI agent documentation
The application uses a multi-pass detection approach:
- Regex Patterns - Fast, high-precision pattern matching for structured data (SSN, credit cards, etc.)
- LLM Detection - AI-powered analysis for contextual PII (names, addresses)
- Retrospective Pass - Second-pass review to catch missed items based on initial findings
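The passes above can be sketched as a regex scan followed by a merge with results from a later pass. The patterns below are simplified examples rather than the application's actual patterns, and the LLM pass is represented only by its output (an array of matches), since the Case.dev SDK calls are not documented here. All type and function names are hypothetical.

```typescript
interface PiiMatch {
  type: string;
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
  text: string;
  source: "regex" | "llm";
}

// Simplified illustrative patterns; real-world patterns need more validation.
const PATTERNS: Array<{ type: string; regex: RegExp }> = [
  { type: "ssn", regex: /\b\d{3}-\d{2}-\d{4}\b/g },
  { type: "email", regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
];

// Pass 1: collect every match of every pattern.
function detectWithRegex(text: string): PiiMatch[] {
  const matches: PiiMatch[] = [];
  for (const { type, regex } of PATTERNS) {
    regex.lastIndex = 0; // global regexes keep state between calls
    let m: RegExpExecArray | null;
    while ((m = regex.exec(text)) !== null) {
      matches.push({ type, start: m.index, end: m.index + m[0].length, text: m[0], source: "regex" });
    }
  }
  return matches;
}

// Merge a later pass into an earlier one: earlier matches win on overlap.
function mergeMatches(primary: PiiMatch[], secondary: PiiMatch[]): PiiMatch[] {
  const merged = [...primary];
  for (const candidate of secondary) {
    const overlaps = merged.some((m) => candidate.start < m.end && m.start < candidate.end);
    if (!overlaps) merged.push(candidate);
  }
  return merged.sort((a, b) => a.start - b.start);
}
```

Letting the regex results take precedence on overlap reflects the idea that structured patterns are high-precision. A different policy, such as preferring the longer span, would be equally easy to swap in.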
Redacted documents can be exported in three formats:
- PDF - Full document with redactions rendered as black boxes
- DOCX - Microsoft Word format with redacted text replaced
- TXT - Plain text with redactions applied
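For the plain-text case, applying redactions amounts to replacing each detected character range with a mask. The following is a minimal sketch under assumptions: the `RedactionSpan` shape, the `applyRedactions` name, and the `█` mask character are illustrative choices, not the exporter's confirmed behavior.

```typescript
interface RedactionSpan {
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
}

// Replace every character inside a span with the mask, tolerating
// overlapping, unsorted, or out-of-range spans.
function applyRedactions(text: string, spans: RedactionSpan[], mask: string = "█"): string {
  const sorted = [...spans].sort((a, b) => a.start - b.start);
  let output = "";
  let cursor = 0;
  for (const span of sorted) {
    const start = Math.max(span.start, cursor);
    const end = Math.min(span.end, text.length);
    if (end <= start) continue; // empty, or already covered by an earlier span
    output += text.slice(cursor, start) + mask.repeat(end - start);
    cursor = end;
  }
  return output + text.slice(cursor);
}
```

Masking one character per original character preserves text length and layout. Substituting a fixed label such as `[REDACTED]` instead would hide the length of the original value.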
This project is licensed under the Apache 2.0 License.
Built with Case.dev - The Legal AI Platform