michaelshapkin/ghostmail-collector

GhostMail Collector 👻📧

A curated, auto-updated open-source list of disposable email domains used in spam, bots, and temporary email services.


🚀 What is this?

GhostMail Collector is an automated tool that compiles and maintains a list of disposable email domains. It fetches, validates, and deduplicates domains from open-source repositories, producing lists suitable for:

  • Email validation
  • Anti-spam filters
  • User registration checks
  • Integration into SaaS, backends, or APIs

The list is updated daily via GitHub Actions and stored in two files:

  • 📄 data/raw_domains.txt: All collected domains (~180K).
  • 📄 data/disposable_emails.txt: Domains with valid MX records (~32K).

📋 Output Files

  • data/raw_domains.txt: Complete, deduplicated list of disposable email domains from all sources (~180,000 domains).
  • data/disposable_emails.txt: Filtered list of domains with valid MX records, ideal for strict email validation (~32,000 domains).
  • data/collector_log_*.txt: Logs detailing fetch results, MX checks, and excluded domains.

Both .txt files are plain text, one domain per line, ready for integration.
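
Because the files are one domain per line, a lookup needs very little code. The following is a minimal sketch, not part of the repository: the helper names `load_domains` and `is_disposable` are hypothetical, and matching parent domains (so that `mail.example.com` matches a listed `example.com`) is an assumption about how you may want to use the list.

```python
from pathlib import Path


def load_domains(path):
    """Read a one-domain-per-line file into a set of lowercase domains."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def is_disposable(email, domains):
    """Return True if the email's domain, or a parent domain, is in the set."""
    if "@" not in email:
        return False
    domain = email.strip().lower().rpartition("@")[2]
    labels = domain.split(".")
    # Check the full domain first, then each parent, stopping before the bare TLD.
    candidates = (".".join(labels[i:]) for i in range(max(len(labels) - 1, 1)))
    return any(candidate in domains for candidate in candidates)


# Illustrative in-memory set; in practice, something like
# load_domains("data/disposable_emails.txt") would supply it.
sample = {"mailinator.com"}
print(is_disposable("bob@mailinator.com", sample))      # True
print(is_disposable("bob@mx.mailinator.com", sample))   # True
print(is_disposable("alice@example.org", sample))       # False
```

Holding the domains in a set keeps each lookup constant-time, which matters at ~180,000 entries.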


⚙️ How it works

  1. A Python script fetches domains from multiple open-source repositories.
  2. Domains are cleaned, deduplicated, and validated for format.
  3. MX records are checked, and only domains that can receive mail are kept in the filtered list.
  4. GitHub Actions runs the script daily at 04:00 UTC.
  5. Results are committed to data/raw_domains.txt and data/disposable_emails.txt.
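
Step 2 above (cleaning, deduplication, and format validation) could look roughly like the sketch below. It is an illustration under assumptions, not the code in `src/collector.py`: the function name `clean_domains`, the comment-stripping rule, and the `DOMAIN_RE` pattern are all hypothetical.

```python
import re

# Hypothetical format check: dot-separated labels of letters, digits, and
# hyphens, ending in an alphabetic TLD. Punycode TLDs (xn--...) are rejected
# by this simplified pattern.
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def clean_domains(lines):
    """Normalize raw lines, drop comments and invalid entries, deduplicate."""
    seen = set()
    for line in lines:
        # Strip "#" comments, surrounding whitespace, and a trailing dot.
        domain = line.split("#", 1)[0].strip().lower().rstrip(".")
        if domain and DOMAIN_RE.match(domain):
            seen.add(domain)
    return sorted(seen)


raw = ["Example.COM", "example.com ", "# a comment", "bad_domain", "sub.mail.io."]
print(clean_domains(raw))  # ['example.com', 'sub.mail.io']
```

Sorting the result keeps the output file stable between runs, so daily commits only show real additions and removals.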

📡 Sources

The collector aggregates domains from the following trusted sources:


🛠️ How to run locally

  1. Clone the repository:

    git clone https://github.com/michaelshapkin/ghostmail-collector.git
    cd ghostmail-collector

  2. Install dependencies:

    pip install -r requirements.txt

  3. Run the collector:

    python src/collector.py

  4. Check outputs in data/:

    • raw_domains.txt
    • disposable_emails.txt
    • collector_log_*.txt

📊 Stats

  • Total domains: ~180,000 (deduplicated across all sources)
  • MX-validated domains: ~32,000 (domains confirmed with active MX records via strict check)
  • Update frequency: Daily at 04:00 UTC (via GitHub Actions)
  • Processing time: ~1 hour and 23 minutes (fetching + MX checks using 10 workers)
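
The processing time is dominated by DNS lookups, which is why the checks are spread across 10 workers. A minimal sketch of that pattern follows. It assumes a thread pool and a pluggable `has_mx` callable; both `filter_by_mx` and `dnspython_has_mx` are hypothetical names, and the use of the third-party `dnspython` package is an assumption rather than a confirmed detail of the collector.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_by_mx(domains, has_mx, workers=10):
    """Keep the domains for which has_mx(domain) returns True."""
    domains = list(domains)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with domains.
        results = list(pool.map(has_mx, domains))
    return [domain for domain, ok in zip(domains, results) if ok]


def dnspython_has_mx(domain):
    """Example resolver; assumes dnspython is installed (pip install dnspython)."""
    import dns.exception  # imported lazily so the sketch loads without dnspython
    import dns.resolver

    try:
        # resolve() raises if the domain does not exist or has no MX records.
        dns.resolver.resolve(domain, "MX", lifetime=5.0)
        return True
    except dns.exception.DNSException:
        return False


# Offline demonstration with a stub lookup table instead of real DNS queries.
stub = {"a.example": True, "b.example": False, "c.example": True}
print(filter_by_mx(stub, stub.get))  # ['a.example', 'c.example']
```

Threads suit this workload because each lookup spends most of its time waiting on the network; passing `dnspython_has_mx` instead of the stub would perform real queries.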
