This repository now includes a dual-stack data analysis setup:
- Python workspace for pandas-based analysis
- TypeScript workspace for Node-based data tooling
- Docker + Postgres for local database-backed experiments
The source data files in data/raw (for example data/raw/SCH/*.csv and data/raw/LEA/*.csv) are used by the smoke-test and seeding scripts.
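The seeding scripts derive a Postgres table name from each raw CSV path. The exact naming scheme lives in seed_pg.py; as a minimal sketch, assuming a hypothetical "subdirectory plus file stem" convention:

```python
from pathlib import Path

def table_name_for(csv_path: str) -> str:
    # Hypothetical scheme (the real one is in seed_pg.py): "<subdir>_<stem>",
    # lowercased, with non-alphanumeric characters replaced by underscores.
    p = Path(csv_path)
    stem = f"{p.parent.name}_{p.stem}".lower()
    return "".join(c if c.isalnum() else "_" for c in stem)

print(table_name_for("data/raw/SCH/Advanced Placement.csv"))  # sch_advanced_placement
```

The --dry-run flag described below previews the actual mappings without writing to the database.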
.
├── data/
│   └── raw/
│       ├── LEA/
│       └── SCH/
├── python/
│   ├── pyproject.toml
│   ├── pipeline.py
│   ├── analysis/
│   │   ├── etl/
│   │   ├── metrics/
│   │   ├── stats/
│   │   ├── models/
│   │   ├── viz/
│   │   ├── reporting/
│   │   └── dashboard/
│   ├── notebooks/
│   ├── seed_pg.py
│   ├── reset_pg_seed_state.py
│   └── smoke_test.py
├── typescript/
│   ├── package.json
│   ├── tsconfig.json
│   └── src/
│       └── smoke-test.ts
├── docker/
│   └── postgres/
│       └── init/
│           └── 001_create_schema.sql
├── Dockerfile
├── docker-compose.yml
└── .env.example
# install uv once if needed: https://docs.astral.sh/uv/getting-started/installation/
uv sync --project python
uv run --project python python python/smoke_test.py
npm --prefix typescript install
npm --prefix typescript run smoke
cp .env.example .env
docker compose up -d pg
Useful checks:
docker compose ps
docker compose logs -f pg
Preview table mappings (no DB writes):
uv run --project python python python/seed_pg.py --dry-run --limit 5
Load all CSVs into schema raw:
uv run --project python python python/seed_pg.py
Useful options:
# only load files that match a substring
uv run --project python python python/seed_pg.py --only "SCH/Advanced" --only "LEA"
# load into a custom schema
uv run --project python python python/seed_pg.py --schema crdc_raw
# append instead of replacing each table
uv run --project python python python/seed_pg.py --append
Reset seed state (drops raw + metadata schemas and recreates metadata table):
uv run --project python python python/reset_pg_seed_state.py
Run the full pipeline (curation, metrics, richer exploration aggregates, inferential stats, predictive models, static + interactive viz, exports, report):
uv run --project python python python/pipeline.py
Run selected stages only:
uv run --project python python python/pipeline.py --stages curate metrics
uv run --project python python python/pipeline.py --stages explore stats models report --skip-interactive
Generated artifacts:
- Curated analytical tables in Postgres schema analysis.*
- CSV exports in outputs/tables/
- Static charts in outputs/figures/static/
- Interactive charts in outputs/figures/interactive/
- Narrative report at docs/analysis.md
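One way the --stages and --skip-interactive flags shown above might be parsed is sketched below; the stage names come from the commands in this section, but the default list (including a viz stage) is an assumption about pipeline.py's internals:

```python
import argparse

# Stage names taken from the commands above; the full default ordering
# (including "viz") is an assumption, not read from pipeline.py.
STAGES = ["curate", "metrics", "explore", "stats", "models", "viz", "report"]

def parse_pipeline_args(argv=None):
    ap = argparse.ArgumentParser()
    ap.add_argument("--stages", nargs="+", choices=STAGES, default=STAGES)
    ap.add_argument("--skip-interactive", action="store_true")
    return ap.parse_args(argv)
```

With this shape, omitting --stages runs everything, and --skip-interactive simply gates the interactive-viz step.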
Additional exploration tables generated by the explore stage:
- analysis.explore_state_profiles (state readiness profiles)
- analysis.explore_quadrants_state / analysis.explore_quadrants_school (opportunity vs discipline quadrants)
- analysis.explore_archetypes_summary / analysis.explore_archetypes_school (clustered school archetypes)
- analysis.explore_lea_peer_benchmarks (LEA peer deltas by size band and state)
After running the pipeline, start Streamlit:
uv run --project python streamlit run python/analysis/dashboard/app.py
Dashboard includes:
- Core relationships (support, discipline, digital, STEM)
- School archetype distribution and profile summaries
- Peer benchmarking views (top/bottom LEAs vs similar-size peers)
Notebook path:
python/notebooks/deep_analysis_notebook.ipynb
It consumes exported tables from outputs/tables/ for exploratory drill-down.
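A minimal sketch of how the notebook can pull in those exports, assuming only that they are plain CSVs under outputs/tables/ (the helper name is illustrative, not from the notebook):

```python
from pathlib import Path
import pandas as pd

def load_exports(tables_dir: str = "outputs/tables") -> dict[str, pd.DataFrame]:
    # One DataFrame per exported CSV, keyed by file stem.
    return {p.stem: pd.read_csv(p) for p in sorted(Path(tables_dir).glob("*.csv"))}
```

Run the pipeline first so the directory is populated; otherwise the dict is empty.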
uv run --project python pytest python/tests -q
Connection defaults:
- Host: localhost
- Port: 5432
- Database: crdc
- User: crdc
- Password: crdc
Override values in .env as needed.
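For scripts that need a connection string, the defaults above can be assembled into a libpq-style URL, letting environment values (e.g. loaded from .env) override them. The key names (PGHOST, etc.) are assumptions here; match them to whatever your .env actually defines:

```python
import os

# Key names are assumptions; align them with your .env.
DEFAULTS = {
    "PGHOST": "localhost",
    "PGPORT": "5432",
    "PGDATABASE": "crdc",
    "PGUSER": "crdc",
    "PGPASSWORD": "crdc",
}

def postgres_dsn(env=os.environ) -> str:
    # Environment values win; documented defaults fill the gaps.
    cfg = {k: env.get(k, v) for k, v in DEFAULTS.items()}
    return (f"postgresql://{cfg['PGUSER']}:{cfg['PGPASSWORD']}"
            f"@{cfg['PGHOST']}:{cfg['PGPORT']}/{cfg['PGDATABASE']}")

print(postgres_dsn({}))  # postgresql://crdc:crdc@localhost:5432/crdc
```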
The compose file also defines an analysis service (profile dev) built from Dockerfile.
docker compose --profile dev up -d analysis
This mounts the full repository at /workspace and keeps the container alive for interactive work.