A single-page, browser-based exploratory data analysis experience built with vanilla HTML/CSS/JS plus CDN libraries (Bootstrap, Apache ECharts, Tabulator, Leaflet, Leaflet MarkerCluster). Datasets are pre-partitioned by district so the UI can load data incrementally and cache it in IndexedDB.
- Inline web app (
index.html) served from the repository root (ideal for GitHub Pages). - Node.js ETL script (
etl.js) that downloads the latest source CSVs, filters row values to valid districts, and writes per-district partitions underdata/report/along with index + BI settings. - IndexedDB caching, progress feedback, chart/table/map toggles, and download helpers for full or filtered rows.
- Map-specific controls (clustering, district/county filter, color by field) that activate automatically when a dataset supports map output.
- GitHub Actions workflow (
.github/workflows/nightly-etl.yml) that re-runs the ETL nightly and pushes refreshed data back to the repo.
analysis-eda-tool/
├── index.html # Entire UI (inline JS/CSS via CDN assets)
├── etl.js # Node ETL that regenerates data/report outputs
├── data/
│ ├── raw/ # Latest raw CSV downloads (overwritten each ETL run)
│ └── report/ # Partitioned CSVs + index/settings JSON consumed by the UI
├── .github/workflows/nightly-etl.yml # Nightly automation that runs ETL + commits results
├── ai-instructions.md # Working doc describing architecture and expectations
├── README.md # You are here
├── package.json / lock # ETL dependencies (csv-parse, csv-stringify, node-fetch, etc.)
├── node_modules/ # Local dependency install (ignored when deploying static site)
└── ... # Misc (css/, js/, settings.json, etc. not required for the app)
npm install(installs ETL dependencies only).node etl.js- Downloads fresh source CSVs into
data/raw/. - Filters/partitions rows written to
data/report/<dataset>/. - Generates
<dataset>.index.jsonand<dataset>-bi-settings.json, including map metadata where applicable.
- Downloads fresh source CSVs into
- Open
index.htmldirectly in the browser (or serve via a simple static server) to work offline.
- The dataset dropdown seeds from
DATASETSdefined inindex.html. Add or remove entries there, matching folder names underdata/report/. - Cached partitions are stored by key
dataset|districtin IndexedDB. Use the “Refresh Cache” button to clear the current dataset when the underlying files change. - Chart legend auto-hides when a split produces more than 13 series to keep the layout readable.
- Map view is only enabled when the dataset’s BI settings include map configuration. Selecting “Map” hides the general chart controls and shows the map options panel (clustering, district, county, color).
- Map clustering defaults to on. Choosing a specific district or county temporarily disables clustering; returning to “All Districts” re-enables it.
- Table view uses Tabulator with horizontal scroll and maintains consistent column widths. Downloads respect filters by reading the active rows.
- Commit
index.htmland the generateddata/report/directory alongside this README. - For GitHub Pages, set the Pages source to the repository root on the desired branch. No build step is required because the app is fully static.
- The nightly workflow pushes updated
data/content each day shortly after 12:10 AM US Eastern (05:15 UTC). Ensure Actions have permission to push to your branch.
- Workflow file:
.github/workflows/nightly-etl.yml. - Actions job checks out the repo, installs Node, runs
node etl.js, and commits any changes insidedata/. - You can trigger it manually via the “Run workflow” button in GitHub UI if you need a mid-day refresh.
- Update
ai-instructions.mdwith any technical decisions that future contributors or AI assistants should know. - Keep ETL output under version control so GitHub Pages stays in sync.
- Open issues or PRs for new dataset integrations, UI improvements, or automation tweaks.
For detailed build guidance and design decisions, see ai-instructions.md.