ICANN CZDS (Centralized Zone Data Service) zone file collector and newly registered domain detection service.
- Parallel zone file download from ICANN CZDS (configurable concurrency)
- Gzip compressed zone file parsing (streaming, memory-efficient)
- MongoDB domain upsert with bulk write operations
- Newly registered domain detection (`first_seen` tracking)
- Sync statistics (`zone_sync_stats` collection)
- Sync gap detection for false positive prevention
- Scheduled automatic sync (APScheduler)
- Memory usage monitoring
- Python 3.11+
- MongoDB 5.0+
- ICANN CZDS account
```
pip install -r requirements.txt
```

Create a `.env` file:

```
MONGODB_URL=mongodb://user:pass@localhost:27017/
DATABASE_NAME=icann_tlds_db
ICANN_USERNAME=your_email@example.com
ICANN_PASSWORD=your_password
SCHEDULE_HOURS=0,12
ZONE_FILES_DIR=./zonefiles
MAX_CONCURRENT_DOWNLOADS=10
UPSERT_BATCH_SIZE=5000
```

Run the development server:

```
uvicorn app.main:app --reload --port 8002
```

Or run with Docker:

```
docker build -t zone-collector .
docker run -p 8002:8000 --env-file .env zone-collector
```

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/sync` | POST | Start manual sync |
| `/sync/status` | GET | Current sync status |
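As a usage sketch, the manual-sync endpoints above can be driven by a small polling loop. The `fetch` callable below stands in for a real HTTP client (e.g. `httpx`), and the `"state"` field with a `"running"` value is an assumed response shape, not the service's documented schema:

```python
import time


def wait_for_sync(fetch, poll_seconds=5, timeout=600):
    """Poll GET /sync/status until the sync finishes or `timeout` elapses.

    `fetch` is any callable that takes a path and returns the parsed JSON
    status dict; the "state" key used here is an assumed response field.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch("/sync/status")
        if status.get("state") != "running":
            return status  # finished (or failed) -- return final status
        time.sleep(poll_seconds)
    raise TimeoutError("sync did not finish in time")
```

The same loop works unchanged whether the sync was started via `POST /sync` or by the scheduler.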
| Endpoint | Method | Description |
|---|---|---|
| `/tlds` | GET | List available TLDs |
| `/tlds/{tld}/stats` | GET | TLD statistics |
| `/tlds/{tld}/domains` | GET | TLD domains (paginated) |
| `/zone-links` | GET | Available zone file links |
| Endpoint | Method | Description |
|---|---|---|
| `/newly-registered` | GET | Newly registered domains |
| `/newly-registered/stats` | GET | Sync statistics |
| Parameter | Type | Default | Description |
|---|---|---|---|
| `days_back` | int | 7 | Days to look back (1-365) |
| `tld` | str | null | TLD filter |
| `page` | int | 1 | Page number |
| `page_size` | int | 100 | Records per page |
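Since newly registered domains are detected via `first_seen` tracking, these parameters map naturally onto a MongoDB query against that field. The sketch below illustrates that mapping (filter plus skip/limit pagination); it is not the service's actual route code, and the per-document `tld` field is an assumption:

```python
from datetime import datetime, timedelta, timezone


def build_newly_registered_query(days_back=7, tld=None, page=1, page_size=100):
    """Translate /newly-registered parameters into a MongoDB-style
    filter dict plus (skip, limit) pagination values."""
    if not 1 <= days_back <= 365:
        raise ValueError("days_back must be in 1-365")
    cutoff = datetime.now(timezone.utc) - timedelta(days=days_back)
    query = {"first_seen": {"$gte": cutoff}}  # only recently first-seen domains
    if tld is not None:
        query["tld"] = tld  # assumed per-document TLD field
    skip = (page - 1) * page_size
    return query, skip, page_size
```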
TLDs are processed in parallel using `asyncio.Semaphore`:

```python
max_concurrent_downloads = 10  # Configurable
```

MongoDB writes use bulk operations:

```python
upsert_batch_size = 5000  # Configurable
ordered = False           # Continues on error
```

Large zone files (1M+ domains) are processed in 50K chunks to prevent OOM:

```python
for tld, domains, is_last in parse_zone_file_chunked(file):
    # Process 50K domains at a time
    await mongodb.upsert_domains(tld, domains)
    del domains  # Free memory
```

This allows processing files like `vip.txt.gz` (1.5M domains) without OOM.
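A minimal sketch of what a `parse_zone_file_chunked`-style generator can look like: stream a gzipped zone file line by line, collect unique owner names, and yield fixed-size chunks. The 3-tuple shape matches the loop above, but the details are illustrative. Real zone files need fuller record handling (`$ORIGIN`/`$TTL` directives, relative names, record types), and the dedup set here still grows with the number of unique names:

```python
import gzip


def parse_zone_file_chunked(path, tld="com", chunk_size=50_000):
    """Yield (tld, domains, is_last) tuples from a gzipped zone file,
    buffering at most `chunk_size` domain names per chunk."""
    chunk = []
    seen = set()  # dedup across records (NS, A, ...) for the same owner name
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith(";") or line.startswith("$"):
                continue  # skip blanks, comments, and $ORIGIN/$TTL directives
            name = line.split()[0].rstrip(".").lower()  # owner name field
            if name in seen:
                continue
            seen.add(name)
            chunk.append(name)
            if len(chunk) >= chunk_size:
                yield tld, chunk, False
                chunk = []
    yield tld, chunk, True  # final (possibly empty) chunk
```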
Domain document:

```javascript
{
  "domain": "example",
  "fqdn": "example.com",
  "first_seen": ISODate(),
  "last_seen": ISODate(),
  "dns_records": { "ns": [...], "a": [...] },
  "metadata": { "source": "icann_czds" }
}
```

Sync statistics document (`zone_sync_stats`):

```javascript
{
  "tld": "com",
  "inserted": 1500,
  "updated": 500,
  "sync_time": ISODate()
}
```

TLD summary document:

```javascript
{
  "tld": "com",
  "last_sync": ISODate(),
  "domain_count": 150000000,
  "sync_count": 42
}
```

```
zone-collector/
├── app/
│   ├── main.py              # FastAPI app, memory monitoring
│   ├── config.py            # Settings (Pydantic)
│   ├── scheduler.py         # APScheduler
│   ├── api/routes.py        # API endpoints
│   ├── database/mongodb.py  # MongoDB operations
│   └── services/
│       ├── czds_client.py   # ICANN API client
│       ├── zone_parser.py   # Streaming parser
│       └── sync_service.py  # Parallel sync orchestration
└── requirements.txt
```
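`app/config.py` is described as Pydantic-based settings. As a dependency-free illustration of the same idea, here is a stdlib-only stand-in that reads the `.env`-style variables listed above; the parsing rules (e.g. splitting `SCHEDULE_HOURS` on commas) are assumptions, not the project's actual code:

```python
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    """Stdlib stand-in for the Pydantic settings in app/config.py."""
    mongodb_url: str = "mongodb://localhost:27017/"
    database_name: str = "icann_tlds_db"
    icann_username: str = ""
    icann_password: str = ""
    schedule_hours: list = field(default_factory=lambda: [0, 12])
    zone_files_dir: str = "./zonefiles"
    max_concurrent_downloads: int = 10
    upsert_batch_size: int = 5000

    @classmethod
    def from_env(cls, env=os.environ):
        """Build Settings from environment variables, falling back to defaults."""
        return cls(
            mongodb_url=env.get("MONGODB_URL", cls.mongodb_url),
            database_name=env.get("DATABASE_NAME", cls.database_name),
            icann_username=env.get("ICANN_USERNAME", ""),
            icann_password=env.get("ICANN_PASSWORD", ""),
            # "0,12" -> [0, 12]: hours of day at which the scheduler runs
            schedule_hours=[int(h) for h in env.get("SCHEDULE_HOURS", "0,12").split(",")],
            zone_files_dir=env.get("ZONE_FILES_DIR", "./zonefiles"),
            max_concurrent_downloads=int(env.get("MAX_CONCURRENT_DOWNLOADS", "10")),
            upsert_batch_size=int(env.get("UPSERT_BATCH_SIZE", "5000")),
        )
```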
