Automated system for scraping and correlating contractor SEC registration data.
sec_scraper/
├── sec_search.ahk # AHK automation script (Windows)
├── sec_contractor_parser.py # Python parser for SEC results
├── generate_sec_json.py # Database → JSON generator
├── sec_results/ # Raw SEC search results (text files)
│ ├── JB_FELIPE_CONSTRUCTION.txt
│ ├── ST_TIMOTHY_CONSTRUCTION_CORPORATION.txt
│ └── ... (100+ files)
└── README.md # This file
# Double-click or run:
sec_search.ahkProcesses 100 contractors, saves results to sec_results/
cd /home/joebert/open-data-visualization
python3 sec_scraper/sec_contractor_parser.pyParses text files, updates PostgreSQL database.
python3 sec_scraper/generate_sec_json.pyExports database to static/sec_contractors_database.json for frontend.
Purpose: Automate SEC website scraping
Features:
- Processes 100 contractors per run
- Skips existing results
- Uses Escape + Shift+Tab for efficient navigation
- Minimal logic - just gets raw data
Requirements: Windows + AutoHotkey
Purpose: Parse SEC results and update database
Features:
- Handles encoding issues (ISO-8859-1, UTF-8)
- Extracts company details via regex
- Updates
contractorstable - JV-aware correlation with flood projects
- Strict fuzzy matching (≥90% for strict, <90% for fuzzy)
Requirements: Python 3, asyncpg, aiohttp, chardet
Purpose: Generate static JSON from database
Features:
- Single source of truth: PostgreSQL
- Maintains frontend compatibility
- Auto-generates summary statistics
Requirements: Python 3, asyncpg
CREATE TABLE contractors (
id SERIAL PRIMARY KEY,
contractor_name TEXT NOT NULL,
sec_number VARCHAR(255),
date_registered DATE,
status VARCHAR(50),
address TEXT,
secondary_licenses TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
project_count INTEGER DEFAULT 0,
UNIQUE(contractor_name, sec_number)
);CREATE TABLE project_contractors (
id SERIAL PRIMARY KEY,
project_id TEXT NOT NULL,
contractor_name TEXT NOT NULL,
contractor_role VARCHAR(50), -- 'main', 'jv_partner1', 'jv_partner2'
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(project_id, contractor_name, contractor_role)
);- 15 contractors with SEC data
- 100 contractors ready to process in AHK script
- 2,491 unique contractors identified from flood projects
- 10,627 project-contractor relationships (JV-aware)
See main SEC.md for complete documentation.