DeepExtrema/Donna

Donna - Autonomous Invoice Fraud Detection & Verification

Donna is an AI-powered system that automatically monitors Gmail inboxes for invoice and billing emails, detects potential fraud, and verifies suspicious invoices by making intelligent phone calls to the companies that issued them.

🎯 What Donna Does

Donna protects individuals and businesses from invoice fraud by:

  1. Monitoring Gmail - Automatically scans incoming emails for invoices, bills, and receipts
  2. Fraud Detection - Uses AI to analyze domain legitimacy, company information, and billing patterns
  3. Online Verification - Searches Google to verify company details (phone, address, website)
  4. Intelligent Calling - Makes automated phone calls via ElevenLabs AI to verify suspicious invoices
  5. Comprehensive Logging - Records all decisions and verification attempts for audit trails

🚀 Key Features

Email Processing

  • Gmail Integration - OAuth-based access to user's Gmail inbox
  • Intelligent Filtering - Identifies invoice, bill, and receipt emails using AI classification
  • Attachment Parsing - Extracts data from PDF invoices and attachments
  • Real-time Monitoring - Gmail push notifications via Pub/Sub for instant processing

Fraud Detection

  • Domain Analysis - Checks for suspicious domains, typosquatting, and homograph attacks
  • Company Verification - Validates against whitelisted company database
  • Google Search Integration - Finds and verifies company information online
  • Confidence Scoring - Assigns confidence levels to verification results

Automated Verification

  • AI Voice Agent - ElevenLabs conversational AI makes verification calls
  • Dynamic Context - Injects user and invoice details into call scripts
  • Twilio Integration - Reliable phone call delivery and recording
  • Call Transcripts - Maintains records of all verification conversations

Dashboard & Monitoring

  • Next.js Web App - Modern React-based user interface
  • Real-time Updates - Live fraud detection results
  • Audit Logs - Complete history of all verification decisions
  • Company Profiles - Visual display of verified billers with logos

🏗️ Architecture

┌──────────────────────────────────────────────────────────────┐
│                         User's Gmail                         │
│                      (Invoices & Bills)                      │
└─────────────────────────────┬────────────────────────────────┘
                              │
                    ┌─────────▼────────────┐
                    │     Gmail Watch      │
                    │ (Push Notifications) │
                    └─────────┬────────────┘
                              │
┌─────────────────────────────▼────────────────────────────────┐
│                       FastAPI Backend                        │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  Email Processing                                      │  │
│  │  - Invoice Extraction (Gemini AI)                      │  │
│  │  - Attachment Parsing (PDF, images)                    │  │
│  │  - Biller Profile Extraction                           │  │
│  └────────────────────────────────────────────────────────┘  │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  Fraud Detection Engine                                │  │
│  │  - Domain Legitimacy Checker                           │  │
│  │  - Company Database Verification                       │  │
│  │  - Google Search Integration                           │  │
│  │  - ML-based Email Classification                       │  │
│  └────────────────────────────────────────────────────────┘  │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  Verification Agent                                    │  │
│  │  - ElevenLabs AI Agent                                 │  │
│  │  - Twilio Call Orchestration                           │  │
│  │  - Dynamic Variable Injection                          │  │
│  └────────────────────────────────────────────────────────┘  │
└─────────────────────────────┬────────────────────────────────┘
                              │
┌─────────────────────────────▼────────────────────────────────┐
│                      Supabase Database                       │
│  - User Profiles                                             │
│  - Company Whitelist                                         │
│  - Fraud Detection Logs                                      │
│  - OAuth Tokens                                              │
│  - Gmail Watch Subscriptions                                 │
└─────────────────────────────┬────────────────────────────────┘
                              │
┌─────────────────────────────▼────────────────────────────────┐
│                       Next.js Frontend                       │
│  - User Dashboard                                            │
│  - Company Profiles View                                     │
│  - Fraud Alert Monitoring                                    │
│  - OAuth Authentication                                      │
└──────────────────────────────────────────────────────────────┘

🛠️ Technology Stack

Backend (FastAPI)

  • FastAPI - Modern Python web framework
  • Pydantic - Data validation and settings management
  • Supabase - PostgreSQL database and authentication
  • Google APIs - Gmail, Google Custom Search, Gemini AI
  • ElevenLabs - Conversational AI for phone calls
  • Twilio - Phone call infrastructure
  • scikit-learn - Machine learning for email classification

Frontend (Next.js)

  • Next.js 15 - React framework with App Router
  • TypeScript - Type-safe development
  • Tailwind CSS - Utility-first styling
  • Radix UI - Accessible component primitives
  • Supabase SSR - Server-side rendering with Supabase
  • Recharts - Data visualization

Infrastructure

  • Supabase - Database, Auth, and Real-time subscriptions
  • Google Cloud - Gmail API, Pub/Sub, Search API
  • ElevenLabs - AI voice agent platform
  • Twilio - Telephony infrastructure

📋 Prerequisites

  • Python 3.10+
  • Node.js 18+
  • Supabase account
  • Google Cloud Platform account (with Gmail API and Custom Search enabled)
  • ElevenLabs account (for AI calling)
  • Twilio account (for phone infrastructure)

🔧 Installation & Setup

1. Clone the Repository

git clone https://github.com/DeepExtrema/Donna.git
cd Donna

2. Backend Setup

cd api
pip install -r requirements.txt

Create .env file in the api directory:

# Supabase
SUPABASE_URL=your_supabase_url
SUPABASE_SERVICE_KEY=your_supabase_service_key

# API Authentication
API_TOKEN=your_api_token

# Google OAuth
GOOGLE_CLIENT_ID=your_google_client_id
GOOGLE_CLIENT_SECRET=your_google_client_secret

# Google Custom Search
GOOGLE_SEARCH_API_KEY=your_google_search_api_key
GOOGLE_SEARCH_ENGINE_ID=your_search_engine_id

# Gemini AI
GEMINI_API_KEY=your_gemini_api_key

# ElevenLabs
ELEVENLABS_API_KEY=your_elevenlabs_api_key

# Twilio
TWILIO_ACCOUNT_SID=your_twilio_account_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token
TWILIO_PHONE_NUMBER=your_twilio_phone_number

Optional environment variables (used by specific services with defaults):

# ElevenLabs Agent Configuration (used by conversational router)
ELEVENLABS_AGENT_ID=agent_2601k6rm4bjae2z9amfm5w1y6aps  # Default agent ID
ELEVENLABS_PHONE_NUMBER_ID=phnum_4801k6sa89eqfpnsfjsxbr40phen  # Default phone ID
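Before starting the backend, it can help to confirm that every required variable is actually set. The following is a hypothetical pre-flight check, not part of the repository: the variable names come from the .env template above, and treating values that still start with "your_" as unset is an assumption based on the placeholders used there.

```python
import os
from typing import Mapping

# Required backend variables, as listed in the .env template above.
REQUIRED_ENV_VARS = (
    "SUPABASE_URL",
    "SUPABASE_SERVICE_KEY",
    "API_TOKEN",
    "GOOGLE_CLIENT_ID",
    "GOOGLE_CLIENT_SECRET",
    "GOOGLE_SEARCH_API_KEY",
    "GOOGLE_SEARCH_ENGINE_ID",
    "GEMINI_API_KEY",
    "ELEVENLABS_API_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return required variables that are unset, empty, or still placeholders."""
    missing = []
    for name in REQUIRED_ENV_VARS:
        value = env.get(name, "").strip()
        # Hypothetical rule: values like "your_api_token" count as unset.
        if not value or value.startswith("your_"):
            missing.append(name)
    return missing
```

Calling missing_env_vars() with no argument checks the current process environment; an empty list means every required variable has a real value. The optional ELEVENLABS_* variables are deliberately not checked, since the backend falls back to defaults for them.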

3. Frontend Setup

cd webapp
npm install

Create .env.local file in the webapp directory:

NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_supabase_anon_key
NEXT_PUBLIC_API_URL=http://localhost:8000

4. Database Setup

Run the necessary Supabase migrations to create tables:

  • profiles - User profiles with company information
  • companies - Whitelisted company database
  • email_fraud_logs - Fraud detection audit logs
  • gmail_watch_subscriptions - Gmail push notification subscriptions

🚀 Running the Application

Start Backend

cd api
uvicorn main:app --reload --host 0.0.0.0 --port 8000

The API will be available at http://localhost:8000

API Documentation: http://localhost:8000/docs

Start Frontend

cd webapp
npm run dev

The web app will be available at http://localhost:3000

📚 API Endpoints

Health & Authentication

  • GET /health - Health check
  • GET / - Root endpoint

OAuth & Gmail

  • POST /oauth/store - Store OAuth tokens for a user
  • POST /oauth/webhook/supabase - Supabase OAuth webhook handler
  • POST /emails/fetch - Fetch user's invoice emails
  • POST /gmail/watch/setup - Set up Gmail push notifications for a user
  • POST /pubsub/gmail/push - Gmail push notification webhook from Google Pub/Sub

Fraud Detection

  • POST /fraud/analyze - Analyze a single email for fraud
  • POST /fraud/analyze-batch - Batch analyze multiple emails
  • POST /fraud/verify-online - Verify company online via Google Search
  • POST /fraud/analyze-domain - Analyze domain legitimacy

Phone Verification

  • POST /call/conversational - Initiate AI verification call
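As a rough illustration of calling these endpoints, the sketch below builds (without sending) a POST /fraud/analyze request using only the standard library. Two things are assumptions: that API_TOKEN is sent as an "Authorization: Bearer" header, and the request body fields (sender, subject, body), which are invented for illustration. The actual request schema is defined in app/models/schemas.py and the authentication scheme in app/auth/authentication.py.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # default backend address from the setup steps


def build_fraud_request(
    token: str, email_payload: dict, base_url: str = API_URL
) -> urllib.request.Request:
    """Build a POST /fraud/analyze request (assumes bearer-token auth)."""
    body = json.dumps(email_payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/fraud/analyze",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical payload; check app/models/schemas.py for the real fields.
request = build_fraud_request(
    "your_api_token",
    {
        "sender": "billing@example.com",
        "subject": "Invoice #12345",
        "body": "Amount due: $150.50",
    },
)

# Against a running backend, the request could be sent with:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```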

🎯 How It Works

1. Email Monitoring

# User authenticates with Gmail OAuth
# Backend subscribes to Gmail push notifications
# New invoice emails trigger instant processing
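Gmail push notifications arrive through Google Cloud Pub/Sub as a JSON envelope whose message.data field is base64-encoded JSON holding only the mailbox address (emailAddress) and a historyId. The new messages themselves still have to be fetched afterwards, typically with the Gmail API's users.history.list call starting from a previously stored history ID. A minimal sketch of decoding that envelope, roughly as the POST /pubsub/gmail/push handler might do it (the function and class names are hypothetical):

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class GmailNotification:
    email_address: str
    history_id: int


def _b64decode_any(data: str) -> bytes:
    """Decode standard or URL-safe base64, with or without '=' padding."""
    normalized = data.replace("-", "+").replace("_", "/")
    normalized += "=" * (-len(normalized) % 4)
    return base64.b64decode(normalized)


def parse_gmail_push(envelope: dict) -> GmailNotification:
    """Extract the mailbox and history ID from a Pub/Sub push request body."""
    payload = json.loads(_b64decode_any(envelope["message"]["data"]))
    return GmailNotification(
        email_address=payload["emailAddress"],
        # historyId may arrive as a number or a string; normalize to int.
        history_id=int(payload["historyId"]),
    )
```

Returning a success status (for example 200 or 204) promptly and doing the history lookup afterwards keeps Pub/Sub from retrying delivery of the same notification.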

2. Fraud Detection Pipeline

Incoming Email
    ↓
AI Classification (Bill/Receipt/Other)
    ↓
Domain Legitimacy Check
    ↓
Company Database Verification
    ↓
[Not Found] → Google Search
    ↓
[Phone Found + Medium Confidence] → AI Phone Call
    ↓
Decision: LEGIT / FRAUD / CALL / PENDING
    ↓
Log to Database + Notify User
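The stage ordering above can be expressed as a small decision function. This is a hypothetical sketch, not the logic in the repository: the Evidence fields are invented names for the outputs of each stage, a suspicious domain is assumed to short-circuit straight to FRAUD, and the 0.8 and 0.5 thresholds are borrowed from the Confidence Scoring section.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Thresholds borrowed from the Confidence Scoring section.
HIGH_CONFIDENCE = 0.8
MEDIUM_CONFIDENCE = 0.5


class Decision(str, Enum):
    LEGIT = "legit"
    FRAUD = "fraud"
    CALL = "call"
    PENDING = "pending"


@dataclass
class Evidence:
    """Hypothetical summary of what each pipeline stage found."""
    is_invoice_like: bool            # classified as bill or receipt
    domain_suspicious: bool          # domain legitimacy check failed
    in_company_db: bool              # sender matches a whitelisted company
    search_confidence: Optional[float] = None  # Google Search match, if run
    phone_found: bool = False        # search produced a callable number


def decide(e: Evidence) -> Optional[Decision]:
    """Apply the pipeline stages in order; None means 'not an invoice'."""
    if not e.is_invoice_like:
        return None
    if e.domain_suspicious:
        return Decision.FRAUD
    if e.in_company_db:
        return Decision.LEGIT
    confidence = e.search_confidence or 0.0
    if confidence >= HIGH_CONFIDENCE:
        return Decision.LEGIT
    if e.phone_found and confidence >= MEDIUM_CONFIDENCE:
        return Decision.CALL
    return Decision.PENDING  # not enough to decide; needs human review
```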

3. AI Verification Call

When a suspicious invoice is detected:

  1. Google Search finds the company's phone number
  2. ElevenLabs Agent is configured with:
    • Company name and contact info
    • User's details (from profiles table)
    • Invoice information (amount, date, etc.)
  3. Call is initiated via Twilio
  4. Conversation is recorded and transcribed
  5. Result is logged for audit
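The dynamic context in step 2 amounts to flattening the profile, invoice, and company records into string variables that the agent's script can reference. The sketch below is hypothetical: the variable names, record fields, and the use of string.Template to preview the opening line are assumptions, and how the variables are actually passed to ElevenLabs is defined in app/services/eleven_agent.py.

```python
from string import Template

# Hypothetical opening line, mirroring the example call script.
OPENING_LINE = Template(
    "Hi, this is Donna calling on behalf of $user_name from $user_company. "
    "I'm helping them verify an invoice email they received from your "
    "company at $sender_email. Is this the right department?"
)


def build_call_variables(profile: dict, invoice: dict, company: dict) -> dict[str, str]:
    """Merge user, invoice and company details into flat string variables."""
    return {
        "user_name": profile["full_name"],
        "user_company": profile.get("company_name", ""),
        "company_name": company["name"],
        "company_phone": company["phone"],
        "sender_email": invoice["sender_email"],
        "invoice_number": str(invoice.get("number", "unknown")),
        "invoice_amount": f"${invoice['amount']:.2f}",
        "invoice_date": invoice.get("date", "unknown"),
    }


variables = build_call_variables(
    profile={"full_name": "John Smith", "company_name": "Acme Corp"},
    invoice={"sender_email": "billing@company.com", "number": 12345,
             "amount": 150.5, "date": "October 5th"},
    company={"name": "Company Inc.", "phone": "+1 555 0100"},
)
print(OPENING_LINE.substitute(variables))
```

Keeping every value a plain string avoids formatting surprises (such as 150.5 being read aloud instead of $150.50) once the variables reach the voice agent.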

Example Call Script

Donna: "Hi, this is Donna calling on behalf of John Smith from Acme Corp. 
        I'm helping them verify an invoice email they received from your 
        company at billing@company.com. Is this the right department?"

Agent: "Yes, this is billing."

Donna: "Great! John received invoice #12345 for $150.50 dated October 5th. 
        Can you confirm this invoice was sent by your company?"

[Verification continues...]

🔐 Security & Privacy

Data Protection

  • OAuth 2.0 - Secure Gmail access with user consent
  • Token Encryption - Refresh tokens stored securely in Supabase
  • PII Minimization - Only necessary data is stored
  • Audit Logging - Complete trail of all verification activities

Compliance

  • GDPR Compliant - User data handling and retention policies
  • Call Recording Consent - Disclosure at start of every call
  • Data Retention - Configurable retention periods
  • No Payment Data - No credit card or banking information stored

API Security

  • API Token Authentication - Required for all protected endpoints
  • CORS Protection - Restricted origins
  • Rate Limiting - Protection against abuse
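For the API token check, a minimal standalone sketch follows, assuming the token arrives as an "Authorization: Bearer" header (the scheme Donna actually uses is defined in app/auth/authentication.py). hmac.compare_digest compares in constant time, so response timing does not reveal how much of a guessed token was correct. In FastAPI, a function like this would typically be wrapped in a dependency, for example alongside fastapi.security.HTTPBearer.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header in constant time."""
    if not authorization_header or not expected_token:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```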

📊 Fraud Detection Logic

Verification Status Types

| Status | Meaning | Action |
|---|---|---|
| legit | Company verified in database or high-confidence online match | ✅ Safe to pay |
| fraud | Suspicious domain or failed verification | ⛔ Block payment |
| call | Phone verification initiated | 📞 Waiting for call result |
| pending | Insufficient data for decision | ⏳ Human review needed |

Confidence Scoring

  • ≥ 0.8 - High confidence (phone + address + email match)
  • 0.5 ≤ score < 0.8 - Medium confidence (phone found, triggers call)
  • < 0.5 - Low confidence (insufficient data, marked pending)
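These bands could be produced from which contact details matched the invoice sender. The weights in this sketch are hypothetical, chosen only so that a phone match is required to reach the medium band and phone + address + email together reach the high band; the real scoring lives in the backend's verification code.

```python
# Hypothetical weights: only phone + address + email together reach 0.8+.
FIELD_WEIGHTS = {"phone": 0.5, "address": 0.2, "email": 0.2}


def confidence_score(matches: dict[str, bool]) -> float:
    """Sum the weights of the contact fields that matched the sender."""
    total = sum(weight for field, weight in FIELD_WEIGHTS.items() if matches.get(field))
    return round(total, 2)  # rounding hides float noise such as 0.8999...


def confidence_band(score: float) -> str:
    """Map a score onto the three confidence bands."""
    if score >= 0.8:
        return "high"    # treated as legit
    if score >= 0.5:
        return "medium"  # phone found: triggers a verification call
    return "low"         # insufficient data: marked pending
```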

🧪 Testing

Test Fraud Detection

cd api
python test_fraud_pipeline.py

Test ElevenLabs Integration

python test_integration.py "Shopify"

Test Real Phone Call

python test_real_call.py

Test Company Verification

python test_company_verification.py

📁 Project Structure

Donna/
├── api/                         # FastAPI Backend
│   ├── app/
│   │   ├── routers/             # API routes
│   │   │   ├── emails.py        # Email fetching endpoints
│   │   │   ├── fraud.py         # Fraud detection endpoints
│   │   │   ├── oauth.py         # OAuth handlers
│   │   │   ├── gmail_watch.py   # Gmail push subscriptions
│   │   │   └── pubsub.py        # Pub/Sub webhooks
│   │   ├── services/            # Business logic
│   │   │   ├── gmail_service.py
│   │   │   ├── invoice_extractor.py
│   │   │   ├── eleven_agent.py  # ElevenLabs AI calling
│   │   │   ├── google_search_service.py
│   │   │   ├── fraud_logger.py
│   │   │   └── biller_extraction.py
│   │   ├── database/            # Database clients
│   │   │   ├── supabase_client.py
│   │   │   ├── companies.py
│   │   │   └── gmail_watch.py
│   │   ├── auth/                # Authentication
│   │   │   └── authentication.py
│   │   ├── models/              # Pydantic models
│   │   │   └── schemas.py
│   │   └── config.py            # Configuration
│   ├── ml/                      # Machine learning
│   │   ├── email_classifier.py  # Email type classification
│   │   └── domain_checker.py    # Domain legitimacy
│   ├── main.py                  # FastAPI app entry point
│   ├── requirements.txt         # Python dependencies
│   └── test_*.py                # Test scripts
├── webapp/                      # Next.js Frontend
│   ├── src/
│   │   ├── app/
│   │   │   ├── dashboard/       # Main dashboard
│   │   │   ├── api/             # API routes
│   │   │   └── utils/           # Utilities
│   │   ├── components/          # React components
│   │   │   └── ui/              # UI primitives
│   │   └── lib/                 # Libraries
│   ├── package.json
│   └── next.config.ts
└── README.md                    # This file

🔍 Key Components

Email Classifier (ml/email_classifier.py)

Uses scikit-learn to classify emails as:

  • Invoice/Bill
  • Receipt
  • Other
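To illustrate the kind of model involved, here is a dependency-free multinomial naive Bayes classifier with add-one smoothing. This is a swapped-in stand-in, not the repository's classifier: email_classifier.py uses scikit-learn, where CountVectorizer plus MultinomialNB plays the equivalent role, and the tokenizer and six-email training set below are made-up toy data.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9$#]+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


class NaiveBayesEmailClassifier:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesEmailClassifier":
        self.class_counts = Counter(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum of smoothed log P(word | label)
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data for illustration only.
clf = NaiveBayesEmailClassifier().fit(
    [
        "Invoice #1042 amount due $300 payment due by Friday",
        "Your bill is ready, balance due $45.10 pay now",
        "Receipt for your payment of $20, thank you for your order",
        "Payment received, here is your receipt for order #881",
        "Team lunch on Thursday, please RSVP",
        "Weekly newsletter: product updates and tips",
    ],
    ["invoice", "invoice", "receipt", "receipt", "other", "other"],
)
print(clf.predict("Invoice #2001: amount due $99"))
```

Working in log space keeps the product of many small word probabilities from underflowing to zero on long emails.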

Domain Checker (ml/domain_checker.py)

Sophisticated domain analysis including:

  • Typosquatting detection
  • Homograph attack detection
  • Domain reputation checking
  • Company database matching
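Typosquatting and homograph checks can be approximated with an edit distance against known vendor domains plus a scan for non-ASCII or punycode labels. A minimal sketch under stated assumptions: the KNOWN_DOMAINS set is a hypothetical stand-in for the companies table, the distance threshold of 2 is arbitrary, and NFKC normalization catches only some look-alikes (full-width letters, for instance, but not Cyrillic ones, which the non-ASCII check flags instead).

```python
import unicodedata

# Hypothetical whitelist; in Donna this would come from the companies table.
KNOWN_DOMAINS = {"paypal.com", "shopify.com", "amazon.com"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def check_domain(domain: str, max_distance: int = 2) -> list[str]:
    """Return human-readable warnings for a sender domain."""
    domain = domain.lower().rstrip(".")
    warnings = []
    if domain.startswith("xn--") or ".xn--" in domain:
        warnings.append("punycode (internationalized) domain")
    if any(ord(ch) > 127 for ch in domain):
        warnings.append("non-ASCII characters (possible homograph)")
    if domain in KNOWN_DOMAINS:
        return warnings
    # NFKC folds some look-alike forms (e.g. full-width letters) to ASCII.
    folded = unicodedata.normalize("NFKC", domain)
    for known in sorted(KNOWN_DOMAINS):
        if folded == known:
            warnings.append(f"visually mimics {known}")
        elif 0 < edit_distance(folded, known) <= max_distance:
            warnings.append(f"possible typosquat of {known}")
    return warnings
```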

ElevenLabs Agent (app/services/eleven_agent.py)

Manages AI verification calls with:

  • Dynamic variable injection
  • User context from profiles
  • Invoice details from emails
  • Call recording and transcription

Google Search Service (app/services/google_search_service.py)

Searches for company information:

  • Phone numbers
  • Addresses
  • Email addresses
  • Website URLs
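The Google Custom Search JSON API is queried at https://www.googleapis.com/customsearch/v1 with key, cx (search engine ID), and q parameters, and returns matches in an items list whose entries carry title, link, and snippet fields. The sketch below builds such a request URL and pulls phone-like strings out of a response; the query wording and the North-America-only phone regex are assumptions for illustration, not necessarily what google_search_service.py does.

```python
import re
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

# North American formats only, e.g. (555) 010-0199, 555-010-0199, +1 555 010 0199.
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def build_search_url(api_key: str, engine_id: str, company: str) -> str:
    """Custom Search JSON API request URL for a company contact query."""
    query = f'"{company}" contact phone billing'
    return f"{SEARCH_ENDPOINT}?{urlencode({'key': api_key, 'cx': engine_id, 'q': query})}"


def extract_phones(search_response: dict) -> list[str]:
    """Collect distinct phone-like strings from result titles and snippets."""
    found: list[str] = []
    for item in search_response.get("items", []):
        text = f"{item.get('title', '')} {item.get('snippet', '')}"
        for match in PHONE_RE.findall(text):
            if match not in found:
                found.append(match)
    return found
```

A number pulled from a snippet is only a lead: cross-checking it against the company's own website (the link field) before calling is what makes it useful for verification.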

🛣️ Roadmap

Near-term

  • Inbound verification (vendor calls back on verified number)
  • Call result parsing and analysis
  • Multi-language support for calls
  • Enhanced ML models for fraud detection
  • Webhook for call completion notifications

Medium-term

  • Risk-based routing (more checks for first-time vendors)
  • Admin UI for policy tuning
  • Call scheduling (business hours only)
  • Voice biometrics (privacy-vetted)
  • International phone number support

Long-term

  • Integration with payment systems
  • Automated payment approval/rejection
  • Vendor identity graph
  • Historical risk modeling
  • Mobile app

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is proprietary software. All rights reserved.

📞 Support

For issues or questions:

  1. Check the documentation in /api/INTEGRATION_GUIDE.md
  2. Review test scripts in /api/test_*.py
  3. Open an issue on GitHub

🙏 Acknowledgments

  • ElevenLabs - For powerful conversational AI
  • Google Cloud - For Gmail API and Search API
  • Supabase - For database and authentication infrastructure
  • Twilio - For reliable telephony infrastructure

Built with ❤️ to protect against invoice fraud
