🛡️ PEFA — Phishing Email Forensic Analyzer

A Python CLI tool that converts .eml files into cyber-infographic PNGs and interactive HTML reports. PEFA performs automated forensic analysis of phishing indicators and produces a composite threat score (0–100) backed by multiple detection engines and optional threat intelligence APIs.

Sample PEFA report — WELLS FARGO BANK phishing analysis


📂 More sample reports (17 total)

Report	PNG	HTML
ATTENTION DEAR	🖼️ PNG	🌐 HTML
Congratulations Dear	🖼️ PNG	🌐 HTML
Dear Friend	🖼️ PNG	🌐 HTML
Dear Winner	🖼️ PNG	🌐 HTML
File	🖼️ PNG	🌐 HTML
Greetings to you	🖼️ PNG	🌐 HTML
HAPPY NEW YEAR!	🖼️ PNG	🌐 HTML
Konto-Überprüefig (Swiss German)	🖼️ PNG	🌐 HTML
Online Bank Of Africa	🖼️ PNG	🌐 HTML
Please I Need Your Urgent Attention	🖼️ PNG	🌐 HTML
INSTRUCTION TO CREDIT YOUR ACCOUNT ($25M)	🖼️ PNG	🌐 HTML
THIS IS YOUR ATM VISA CARD	🖼️ PNG	🌐 HTML
Text or Call +1 225 463 0148	🖼️ PNG	🌐 HTML
URGENT RESPONSE	🖼️ PNG	🌐 HTML
Votre colis est prêt pour la livraison	🖼️ PNG	🌐 HTML
Your Funds Update!	🖼️ PNG	🌐 HTML
original_msg	🖼️ PNG	🌐 HTML

✨ Features

🎯 Threat Scoring — Weighted 0–100 composite score across 7 categories with 5 severity levels (Clean / Low / Medium / High / Critical)
🔗 Link Analysis — HREF mismatches, brand lookalikes, homoglyph domains, IP-based URLs, URL shorteners, suspicious TLDs, JavaScript/data URIs
👤 Sender Spoofing Detection — Display name spoofing, Return-Path/Reply-To mismatches, domain impersonation, homoglyph characters
⚡ Urgency Language Scanning — 24 social-engineering pressure patterns, generic greeting detection, keyword density scoring
📎 Attachment Threat Assessment — 40+ dangerous extensions, macro-enabled documents, double extensions, MIME mismatches, file hashing (MD5/SHA256)
🔐 Authentication Checks — SPF, DKIM, and DMARC validation from headers (with optional MXToolbox deep validation)
🛤️ Delivery Path Tracing — Full email hop trace with IP geolocation per relay
📅 Domain Age Lookup — WHOIS-based registration date and age risk assessment
🔤 Language Quality Analysis — Mixed-script detection, entropy analysis, zero-width characters, irregular spacing
🧬 IOC Extraction — Consolidated Indicators of Compromise (IPs, domains, URLs, emails, hashes) with optional enrichment
🤖 AI Assessment — Optional Google Gemini analysis with verdict, confidence score, attack classification, and recommended actions
📊 Interactive HTML Reports — Collapsible sections, scroll-spy navigation, copy-to-clipboard, animated threat gauge, tooltips
📁 Batch Processing — Analyze entire directories of .eml files with a single command
🌐 Web UI — Browser-based upload interface with live analysis (no Playwright needed client-side)
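
As an illustration of two of the attachment checks listed above (double extensions and MD5/SHA256 hashing), here is a minimal standard-library sketch. It is not PEFA's implementation: the function names and the short `DANGEROUS_EXTENSIONS` set are hypothetical stand-ins for the tool's own list of 40+ extensions.

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical subset for illustration; PEFA ships its own, longer list.
DANGEROUS_EXTENSIONS = {".exe", ".scr", ".js", ".vbs", ".bat"}


def has_double_extension(filename: str) -> bool:
    """True when a name carries 2+ extensions and the last one is dangerous,
    e.g. 'invoice.pdf.exe'."""
    suffixes = [s.lower() for s in PurePosixPath(filename).suffixes]
    return len(suffixes) >= 2 and suffixes[-1] in DANGEROUS_EXTENSIONS


def hash_attachment(payload: bytes) -> dict[str, str]:
    """Return hex digests suitable for IOC lookups."""
    return {
        "md5": hashlib.md5(payload).hexdigest(),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
```

This is a filename heuristic only, so a name such as `my.file.exe` is also flagged. Comparing the declared MIME type with the file content would be a separate check.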

📦 Installation

pip install pefa
playwright install chromium

Or install from source:

pip install .
playwright install chromium

Requires Python 3.10+ · PyPI page

🚀 Quick Start

# Analyze a single email → PNG infographic
pefa input.eml

# Also generate an interactive HTML report
pefa input.eml --html

# Include Gemini AI assessment
pefa input.eml --gemini

# Skip all external API calls (fully offline)
pefa input.eml --no-api

# Batch process a directory
pefa ./emails/ -o ./reports/

# Launch the web UI
pefa --web --port 8080

Or run as a module:

python3 -m pefa input.eml

📖 Usage Examples

Analyze a single email

pefa suspicious-email.eml

This produces suspicious-email.png in the current directory — a full-page infographic with threat score, sender analysis, link flags, authentication results, and the rendered email body.

Generate both PNG and interactive HTML

pefa suspicious-email.eml --html

Outputs two files: suspicious-email.png and suspicious-email.html. The HTML report includes collapsible sections, scroll-spy navigation, an animated threat gauge, copy-to-clipboard for IOCs, and print/download buttons.

Save output to a specific location

pefa suspicious-email.eml -o ./reports/case-42.png
pefa suspicious-email.eml -o ./reports/case-42.png --html

The -o flag sets the output path. When combined with --html, the HTML file is placed alongside the PNG.

Batch process a folder of emails

pefa ./inbox/ -o ./reports/

Analyzes every .eml file in ./inbox/ and writes reports to ./reports/. A single Playwright browser instance is reused across all files for faster processing. If -o is omitted, reports go to ./inbox/reports/.

Include AI-powered assessment

export GEMINI_API_KEY="your-key-here"
pefa suspicious-email.eml --gemini

Adds a Gemini AI section to the report with a verdict (phishing/legitimate/suspicious), confidence score, attack type classification, and recommended actions. The AI assessment can also influence the overall threat score (+25 or +50 points).

To use a different Gemini model:

pefa suspicious-email.eml --gemini --gemini-model gemini-2.5-pro

Run fully offline (no API calls)

pefa suspicious-email.eml --no-api

Skips all external lookups (IP geolocation, WHOIS, urlscan, VirusTotal, AbuseIPDB, AlienVault, MXToolbox). The analysis still runs SPF/DKIM/DMARC checks from headers, link analysis, urgency detection, and attachment scanning — all locally.
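
To show what a header-only authentication check can look like, the sketch below reads SPF, DKIM, and DMARC verdicts from `Authentication-Results` headers using only the Python standard library. The function name `auth_results` is hypothetical and PEFA's parser may work differently. The approach trusts whatever the receiving mail server recorded in the headers.

```python
import re
from email import message_from_string


def auth_results(raw_email: str) -> dict[str, str]:
    """Collect the first spf/dkim/dmarc verdict found in Authentication-Results."""
    msg = message_from_string(raw_email)
    results: dict[str, str] = {}
    for header in msg.get_all("Authentication-Results", []):
        for mech, verdict in re.findall(r"\b(spf|dkim|dmarc)=(\w+)", header, flags=re.I):
            # Keep the first verdict seen for each mechanism.
            results.setdefault(mech.lower(), verdict.lower())
    return results


sample = (
    "Authentication-Results: mx.example.com; spf=fail smtp.mailfrom=bad.example;\n"
    " dkim=none; dmarc=fail header.from=bank.example\n"
    "From: support@bank.example\n"
    "\n"
    "body\n"
)
print(auth_results(sample))
```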

Customize image dimensions

pefa suspicious-email.eml --width 1400 --scale 2

--width sets the viewport width in pixels (default: 1000). --scale sets the device scale factor (default: 1.5) — higher values produce sharper images at larger file sizes.

Launch the web UI

pefa --web

Opens a browser-based drag-and-drop interface at http://localhost:8080. Upload .eml files and view interactive HTML reports directly — no Playwright needed on the client side.

pefa --web --port 9090
pefa --web --no-api
pefa --web --gemini

The web UI respects the --no-api and --gemini flags.

Combine multiple flags

# Full analysis with AI, HTML output, and high-res image
pefa suspicious-email.eml --html --gemini --width 1200 --scale 2

# Batch process offline with HTML reports
pefa ./inbox/ -o ./reports/ --html --no-api

# Run the sample emails included in the repo
pefa samples/ -o examples/ --html
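
When PEFA is driven from a Python script, the same flag combinations can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of PEFA. It only builds an argument list from flags documented in this README and leaves execution to `subprocess.run`.

```python
def build_pefa_command(input_path, output=None, html=False, gemini=False, no_api=False):
    """Build an argv list for the pefa CLI using flags documented above."""
    cmd = ["pefa", str(input_path)]
    if output is not None:
        cmd += ["-o", str(output)]
    if html:
        cmd.append("--html")
    if gemini:
        cmd.append("--gemini")
    if no_api:
        cmd.append("--no-api")
    return cmd


# To execute (requires pefa to be installed and on PATH):
#   import subprocess
#   subprocess.run(build_pefa_command("./inbox/", output="./reports/", html=True), check=True)
```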

Use with threat intel API keys

Set any combination of API keys to enrich reports with external intelligence:

export GEMINI_API_KEY="..."       # AI assessment
export URLSCAN_API_KEY="..."      # Domain reputation
export VT_API_KEY="..."           # VirusTotal IOC reputation
export ABUSEIPDB_API_KEY="..."    # IP abuse reports
export OTX_API_KEY="..."          # AlienVault OTX threat intel
export MXTOOLBOX_API_KEY="..."    # Deep email auth validation

pefa suspicious-email.eml --html --gemini

Each integration activates independently, so you don't need all keys. Integrations whose key is missing are silently skipped.

⚙️ CLI Reference

usage: pefa [-h] [--web] [--port PORT] [-o OUTPUT] [--width WIDTH]
            [--scale SCALE] [--html] [--gemini]
            [--gemini-model MODEL] [--no-api]
            [input]

positional arguments:
  input                 .eml file or directory of .eml files

options:
  -o, --output          Output path for generated reports
  --web                 Start browser-based web UI
  --port                Web server port (default: 8080)
  --width               Viewport width in pixels (default: 1000)
  --scale               Device scale factor (default: 1.5)
  --html                Emit interactive HTML report alongside PNG
  --gemini              Include Gemini AI assessment
  --gemini-model        Gemini model to use (default: gemini-2.5-flash)
  --no-api              Skip all external API lookups

🎯 Threat Scoring

PEFA calculates a composite threat score from 0 to 100 using weighted categories:

Category	Max Points	What It Measures
🔐 Authentication	20	SPF, DKIM, DMARC failures
👤 Sender	20	Spoofing, homoglyphs, header mismatches
🔗 Links	25	HREF mismatches, brand lookalikes, IP URLs, shorteners
⚡ Urgency	15	Pressure language patterns, generic greetings
📎 Attachments	10	Dangerous extensions, macros, double extensions
🔤 Language	5	Mixed scripts, entropy anomalies, quality issues
📅 Domain Age	10	Newly registered or young domains

Passing all authentication checks and having an established domain (3+ years) both apply negative points, which lower the score. Gemini AI verdicts can add up to +50 additional points.
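
The category caps above sum to 105, and the Gemini bonus can add more, so some final clamp to the 0–100 range is implied. The sketch below shows one plausible way to combine the pieces. It assumes each category is capped at its maximum, negative points pass through unchanged, and the total is clamped at the end. The function and dictionary names are hypothetical, and PEFA's actual logic lives in scoring.py.

```python
# Caps copied from the table above; the key names are illustrative.
CATEGORY_CAPS = {
    "authentication": 20,
    "sender": 20,
    "links": 25,
    "urgency": 15,
    "attachments": 10,
    "language": 5,
    "domain_age": 10,
}


def composite_score(points: dict[str, float], ai_bonus: float = 0) -> int:
    """Cap each category, add the optional AI bonus, clamp to 0-100."""
    total = sum(min(points.get(name, 0), cap) for name, cap in CATEGORY_CAPS.items())
    return int(max(0, min(100, round(total + ai_bonus))))
```

Under these assumptions, 40 raw link points count as 25, and an email that only earns negative authentication points still scores 0 rather than going below zero.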

Threat Levels:

Level	Score
🔴 Critical	70–100
🟠 High	45–69
🟡 Medium	25–44
🟢 Low	10–24
⚪ Clean	0–9
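
The mapping from score to level in the table above can be expressed as a simple threshold lookup. This is a minimal sketch with a hypothetical function name. Only the thresholds come from the table.

```python
# (minimum score, level), highest first, taken from the table above.
LEVELS = [(70, "Critical"), (45, "High"), (25, "Medium"), (10, "Low"), (0, "Clean")]


def threat_level(score: int) -> str:
    """Return the severity level for a 0-100 threat score."""
    for floor, name in LEVELS:
        if score >= floor:
            return name
    return "Clean"  # scores below 0 are treated as Clean
```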

🔌 API Integrations

All API integrations are optional. PEFA works fully offline with --no-api. Each integration checks for its own environment variable and silently skips if unavailable. No API key is required to run a basic analysis — PEFA performs link analysis, urgency detection, sender spoofing checks, attachment scanning, authentication header parsing, and threat scoring entirely locally.

Overview

Service	Environment Variable	Free?	What It Adds to Reports
🤖 Google Gemini	`GEMINI_API_KEY`	Free tier available	AI verdict, attack classification, recommended actions
🔍 urlscan.io	`URLSCAN_API_KEY`	Free tier available	URL/domain reputation verdicts
📧 MXToolbox	`MXTOOLBOX_API_KEY`	Paid	Deep SPF/DKIM/DMARC validation against live DNS
🦠 VirusTotal	`VT_API_KEY`	Free tier available	IOC reputation (IPs, domains, URLs, file hashes)
🚨 AbuseIPDB	`ABUSEIPDB_API_KEY`	Free tier available	IP abuse confidence scores and report counts
👽 AlienVault OTX	`OTX_API_KEY`	Free	Threat intelligence pulse counts and reputation
🌍 ip-api.com	(none)	Free	IP geolocation for delivery path hops
📋 WHOIS	(none)	Free	Domain registration age and registrar info
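
To check which keyed integrations a given shell environment would enable, the sketch below applies the rules described in this README: each service looks for its own variable, --no-api disables everything, and Gemini additionally needs the --gemini flag. The function name and its parameters are hypothetical, and PEFA's own detection code may be structured differently.

```python
import os

# Service name -> environment variable, in the order of the table above.
KEYED_SERVICES = {
    "Google Gemini": "GEMINI_API_KEY",
    "urlscan.io": "URLSCAN_API_KEY",
    "MXToolbox": "MXTOOLBOX_API_KEY",
    "VirusTotal": "VT_API_KEY",
    "AbuseIPDB": "ABUSEIPDB_API_KEY",
    "AlienVault OTX": "OTX_API_KEY",
}


def active_integrations(env=None, gemini_flag=False, no_api=False):
    """List keyed services that would be active for the given environment."""
    env = os.environ if env is None else env
    if no_api:
        return []
    active = []
    for service, variable in KEYED_SERVICES.items():
        if not env.get(variable):
            continue  # missing or empty key: skip this service
        if service == "Google Gemini" and not gemini_flag:
            continue  # Gemini needs the explicit --gemini flag as well
        active.append(service)
    return active
```

ip-api.com and WHOIS are left out because they need no key.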

Getting API Keys

Google Gemini

Sign up at Google AI Studio to get a free API key. The free tier provides generous rate limits suitable for individual use.

export GEMINI_API_KEY="your-key-here"

Gemini provides an AI-powered phishing assessment that includes a verdict (phishing / suspicious / legitimate), confidence percentage, executive summary, technical analysis, attack type classification, and recommended actions. It can also boost the threat score by up to +50 points.

# Use default model (gemini-2.5-flash)
pefa email.eml --gemini

# Use a more capable model
pefa email.eml --gemini --gemini-model gemini-2.5-pro

Note: The --gemini flag is required to activate AI analysis even if GEMINI_API_KEY is set. This keeps AI calls explicit.

urlscan.io

Sign up at urlscan.io for a free account. Navigate to your profile to find your API key.

export URLSCAN_API_KEY="your-key-here"

When suspicious links are detected, PEFA queries urlscan.io for domain reputation data including overall verdict (malicious/suspicious/benign), page metadata, and redirect statistics. Results link directly to the urlscan.io result page for manual investigation.

MXToolbox

Sign up at MXToolbox for an API subscription.

export MXTOOLBOX_API_KEY="your-key-here"

Performs live DNS-based validation of SPF, DKIM, and DMARC records for the sender's domain. This goes beyond parsing email headers — it checks the actual DNS configuration. If MXToolbox results contradict the header claims (e.g., headers say DKIM pass but DNS shows a failure), PEFA flags the discrepancy as a warning.

VirusTotal

Sign up at VirusTotal for a free community account. Your API key is available on your profile page.

export VT_API_KEY="your-key-here"

Enriches extracted IOCs with multi-vendor detection results:

IPs (up to 5) — malicious/suspicious/harmless detection counts and reputation score
Domains (up to 5) — same detection breakdown plus reputation
URLs (up to 3) — vendor detection counts
File hashes (up to 5) — detection counts and meaningful filenames

Free tier: 4 requests/minute, 500 requests/day, 15.5K requests/month.
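
At 4 requests per minute, lookups need to be spaced at least 60 / 4 = 15 seconds apart to stay within the free tier. This README does not say whether PEFA throttles internally, so the class below is only a sketch for scripts that call rate-limited APIs themselves. The name `Throttle` and its injectable clock and sleep parameters are hypothetical.

```python
import time


class Throttle:
    """Space calls evenly so that at most `per_minute` happen per minute."""

    def __init__(self, per_minute: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed; return the delay applied."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay
```

Call `Throttle(4).wait()` before each request. If every indicator costs one request (an assumption), an email at the documented caps (5 + 5 + 3 + 5 = 18 indicators) would take roughly 17 × 15 s, a little over four minutes, to enrich.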

AbuseIPDB

Sign up at AbuseIPDB for a free account.

export ABUSEIPDB_API_KEY="your-key-here"

Checks IP addresses (up to 5) against AbuseIPDB's crowd-sourced abuse report database. Returns an abuse confidence score (0–100), total number of reports, whitelist status, country, and ISP. Queries cover reports from the last 90 days.

Free tier: 1,000 checks/day.

AlienVault OTX

Sign up at AlienVault OTX for a free account.

export OTX_API_KEY="your-key-here"

Queries the Open Threat Exchange for community-sourced threat intelligence. Returns pulse counts (how many threat intelligence reports reference the IOC) and reputation scores for:

IPs (up to 5)
Domains (up to 5)
URLs (up to 3)
File hashes (up to 5)

Setting Up All API Keys

For maximum enrichment, configure all keys in your shell profile (~/.bashrc, ~/.zshrc, etc.):

# Gemini AI (also requires the --gemini flag to activate)
export GEMINI_API_KEY="your-gemini-key"

# Threat intelligence (activate automatically when set)
export URLSCAN_API_KEY="your-urlscan-key"
export VT_API_KEY="your-virustotal-key"
export ABUSEIPDB_API_KEY="your-abuseipdb-key"
export OTX_API_KEY="your-alienvault-key"

# Email authentication
export MXTOOLBOX_API_KEY="your-mxtoolbox-key"

Then run with full enrichment:

pefa email.eml --html --gemini

API Usage Examples

# Fully offline — no API calls at all
pefa email.eml --no-api

# Basic analysis with free APIs only (ip-api.com + WHOIS)
# No env vars needed
pefa email.eml

# Add AI assessment only
export GEMINI_API_KEY="..."
pefa email.eml --gemini

# IOC enrichment with VirusTotal + AbuseIPDB
export VT_API_KEY="..."
export ABUSEIPDB_API_KEY="..."
pefa email.eml --html

# Full enrichment: all APIs + AI + HTML report
export GEMINI_API_KEY="..."
export VT_API_KEY="..."
export ABUSEIPDB_API_KEY="..."
export OTX_API_KEY="..."
export URLSCAN_API_KEY="..."
export MXTOOLBOX_API_KEY="..."
pefa email.eml --html --gemini

# Batch process with full enrichment
pefa ./emails/ -o ./reports/ --html --gemini

How APIs Affect the Report

Without any API keys, PEFA still performs:

Header-based SPF/DKIM/DMARC checks
Link analysis (mismatches, brand impersonation, homoglyphs, suspicious TLDs)
Sender spoofing detection
Urgency language scanning
Attachment threat assessment
Language quality analysis
Threat scoring (0–100)
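
As an example of a purely local check, the sketch below flags HREF mismatches: anchors whose visible text looks like a URL but whose `href` points at a different host. It uses only the standard library. The class and function names are hypothetical, and PEFA's LinkAnalyzer covers many more cases (lookalikes, homoglyphs, shorteners, suspicious TLDs).

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class _AnchorCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def _host(value: str) -> str:
    if "://" not in value:
        value = "http://" + value
    return (urlparse(value).hostname or "").lower()


def href_mismatches(html: str):
    """Return (visible_text, href) pairs where the displayed URL's host differs."""
    collector = _AnchorCollector()
    collector.feed(html)
    collector.close()
    flagged = []
    for href, text in collector.anchors:
        if not re.match(r"^(https?://|www\.)\S+$", text, flags=re.I):
            continue  # visible text is not a URL, nothing to compare
        if _host(text) != _host(href):
            flagged.append((text, href))
    return flagged
```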

Adding API keys progressively enriches the report:

APIs Configured	Additional Report Sections
(none)	Base analysis with all local checks
+ `GEMINI_API_KEY`	AI Assessment panel with verdict, confidence, attack classification
+ `VT_API_KEY`	IOC table with VirusTotal detection counts per indicator
+ `ABUSEIPDB_API_KEY`	IP abuse confidence scores in IOC table
+ `OTX_API_KEY`	Threat intelligence pulse counts in IOC table
+ `URLSCAN_API_KEY`	URL reputation verdicts in link analysis section
+ `MXTOOLBOX_API_KEY`	Deep DNS validation results in authentication section

🏗️ Architecture

.eml file → parser.py → pipeline.run_analysis() → PageRenderer.build() → Playwright → .png/.html

pefa/
├── cli.py                  # CLI argument parsing and entry point
├── parser.py               # .eml parsing and header extraction
├── pipeline.py             # Analysis orchestrator
├── scoring.py              # Weighted threat score calculation
├── highlighting.py         # Email body highlighting (urgency keywords, suspicious links)
├── constants.py            # TLDs, shorteners, extensions, regex patterns, homoglyphs
├── deps.py                 # Centralized optional dependency imports
├── analyzers/
│   ├── links.py            # LinkAnalyzer — URL and domain analysis
│   ├── sender.py           # SenderAnalyzer — spoofing and impersonation
│   ├── urgency.py          # UrgencyAnalyzer — pressure language patterns
│   ├── attachments.py      # AttachmentAnalyzer — file threat assessment
│   ├── language.py         # LanguageAnalyzer — text quality and encoding
│   └── ioc_consolidator.py # IOC extraction and enrichment
├── api/
│   ├── ip_lookup.py        # IP geolocation (ip-api.com)
│   ├── gemini.py           # Google Gemini AI assessment
│   ├── urlscan.py          # urlscan.io domain reputation
│   ├── mxtoolbox.py        # SPF/DKIM/DMARC validation
│   ├── whois_client.py     # Domain WHOIS lookup
│   ├── virustotal.py       # VirusTotal IOC lookup
│   ├── abuseipdb.py        # AbuseIPDB IP reputation
│   └── alienvault.py       # AlienVault OTX intelligence
├── renderers/
│   ├── page.py             # Full HTML page assembly
│   └── widgets/            # 13 analysis section widgets
└── templates/
    ├── css/                # Dark theme, interactive styling
    └── js/                 # Section navigation, animations, interactivity

📤 Output

🖼️ PNG mode (default) produces a single infographic image containing all analysis sections: threat gauge, sender analysis, authentication status, link flags, urgency patterns, attachments, domain age, delivery path, IP geolocation, and the rendered email body in a sandboxed frame.

📊 HTML mode (--html) additionally produces an interactive report with collapsible sections, scroll-spy navigation, animated gauges, copy-to-clipboard for IOCs, and download/print buttons.

🌐 Web UI (--web) serves a browser-based interface for uploading .eml files and viewing analysis results interactively without needing Playwright installed on the client.

🧪 Sample Emails

The samples/ directory contains example phishing emails (419 scams, social engineering, impersonation) for testing. Pre-generated reports are available in examples/.

pefa samples/

📄 License

See pyproject.toml for package metadata.

CHA0S-CORP/PEFA