tulsawebdevs/data-jam-feb-crdc

CRDC 2021-22 Data Analysis Bootstrap

This repository provides a dual-stack setup for analyzing the 2021-22 Civil Rights Data Collection (CRDC):

  • Python workspace for pandas-based analysis
  • TypeScript workspace for Node-based data tooling
  • Docker + Postgres for local database-backed experiments

The source data files in data/raw (for example data/raw/SCH/*.csv and data/raw/LEA/*.csv) are used by the smoke-test and seeding scripts.
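The raw files are split into two survey levels, school (SCH) and district (LEA). A tiny sketch of how a script might tell the two families apart, assuming the data/raw/&lt;LEVEL&gt;/ layout described here (the helper name is hypothetical):

```python
from pathlib import PurePosixPath

def survey_level(path: str) -> str:
    """Return the survey level (SCH or LEA) for a raw CSV path.

    Assumes the data/raw/<LEVEL>/<file>.csv layout used in this repo.
    """
    parts = PurePosixPath(path).parts
    # parts looks like ("data", "raw", "SCH", "Advanced Placement.csv")
    return parts[2]

levels = {survey_level(p) for p in [
    "data/raw/SCH/Advanced Placement.csv",
    "data/raw/LEA/LEA Characteristics.csv",
]}
print(sorted(levels))  # → ['LEA', 'SCH']
```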

Repository Layout

.
├── data/
│   └── raw/
│       ├── LEA/
│       └── SCH/
├── python/
│   ├── pyproject.toml
│   ├── pipeline.py
│   ├── analysis/
│   │   ├── etl/
│   │   ├── metrics/
│   │   ├── stats/
│   │   ├── models/
│   │   ├── viz/
│   │   ├── reporting/
│   │   └── dashboard/
│   ├── notebooks/
│   ├── seed_pg.py
│   ├── reset_pg_seed_state.py
│   └── smoke_test.py
├── typescript/
│   ├── package.json
│   ├── tsconfig.json
│   └── src/
│       └── smoke-test.ts
├── docker/
│   └── postgres/
│       └── init/
│           └── 001_create_schema.sql
├── Dockerfile
├── docker-compose.yml
└── .env.example

1) Python Smoke Test

# install uv once if needed: https://docs.astral.sh/uv/getting-started/installation/
uv sync --project python
uv run --project python python python/smoke_test.py
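The real python/smoke_test.py is not reproduced here; as a rough idea, a CSV smoke test might just confirm each file parses and report its shape. A minimal stdlib sketch (sample column names are illustrative):

```python
import csv
import io

def smoke_check(csv_text: str) -> tuple[int, int]:
    """Return (row_count, column_count), failing fast on an empty file."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)           # raises StopIteration if the file is empty
    rows = list(reader)
    assert header, "CSV has no columns"
    return len(rows), len(header)

sample = "LEAID,SCHID,TOT_ENR\n0100005,00870,431\n0100005,00871,289\n"
print(smoke_check(sample))  # → (2, 3)
```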

2) TypeScript Smoke Test

npm --prefix typescript install
npm --prefix typescript run smoke

3) Start Postgres with Docker

cp .env.example .env
docker compose up -d pg

Useful checks:

docker compose ps
docker compose logs -f pg

4) Seed Postgres from data/raw

Preview table mappings (no DB writes):

uv run --project python python python/seed_pg.py --dry-run --limit 5

Load all CSVs into schema raw:

uv run --project python python python/seed_pg.py

Useful options:

# only load files that match a substring
uv run --project python python python/seed_pg.py --only "SCH/Advanced" --only "LEA"

# load into a custom schema
uv run --project python python python/seed_pg.py --schema crdc_raw

# append instead of replacing each table
uv run --project python python python/seed_pg.py --append
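For intuition, a seeding script typically derives each table name from its CSV path. The mapping below is a hypothetical illustration only; run seed_pg.py --dry-run to see the names the script actually uses:

```python
import re

def table_name(csv_path: str) -> str:
    """Derive a Postgres-friendly table name from a raw CSV path.

    Illustrative mapping, e.g. "data/raw/SCH/Advanced Placement.csv"
    -> "sch_advanced_placement"; not necessarily what seed_pg.py does.
    """
    stem = csv_path.removeprefix("data/raw/").removesuffix(".csv")
    # collapse any run of non-alphanumeric characters into one underscore
    return re.sub(r"[^a-z0-9]+", "_", stem.lower()).strip("_")

print(table_name("data/raw/SCH/Advanced Placement.csv"))  # → sch_advanced_placement
```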

Reset the seed state (drops the raw and metadata schemas, then recreates the metadata table):

uv run --project python python python/reset_pg_seed_state.py

5) Run the Deep Analysis Pipeline

Run the full pipeline (curation, metrics, richer exploration aggregates, inferential stats, predictive models, static + interactive viz, exports, report):

uv run --project python python python/pipeline.py

Run selected stages only:

uv run --project python python python/pipeline.py --stages curate metrics
uv run --project python python python/pipeline.py --stages explore stats models report --skip-interactive
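The --stages flag suggests a simple stage registry inside pipeline.py. A hedged sketch of that dispatch pattern (stage bodies are placeholders, not the real implementations):

```python
import argparse

# Two placeholder stages; the real pipeline also has explore, stats,
# models, and report stages.
STAGES = {
    "curate": lambda: print("curating raw tables"),
    "metrics": lambda: print("computing metrics"),
}

def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--stages", nargs="+", choices=STAGES, default=list(STAGES))
    args = parser.parse_args(argv)
    for name in args.stages:      # run requested stages in the order given
        STAGES[name]()
    return args.stages

print(main(["--stages", "metrics"]))  # runs only the metrics stage
```

Keeping stages in one dict means --stages validation, defaults, and dispatch all come from the same source of truth.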

Generated artifacts:

  • Curated analytical tables in Postgres schema analysis.*
  • CSV exports in outputs/tables/
  • Static charts in outputs/figures/static/
  • Interactive charts in outputs/figures/interactive/
  • Narrative report at docs/analysis.md

Additional exploration tables generated by the explore stage:

  • analysis.explore_state_profiles (state readiness profiles)
  • analysis.explore_quadrants_state / analysis.explore_quadrants_school (opportunity vs discipline quadrants)
  • analysis.explore_archetypes_summary / analysis.explore_archetypes_school (clustered school archetypes)
  • analysis.explore_lea_peer_benchmarks (LEA peer deltas by size-band and state)

6) Launch the Interactive Dashboard

After running the pipeline, start Streamlit:

uv run --project python streamlit run python/analysis/dashboard/app.py

The dashboard includes:

  • Core relationships (support, discipline, digital, STEM)
  • School archetype distribution and profile summaries
  • Peer benchmarking views (top/bottom LEAs vs similar-size peers)

7) Open the Analysis Notebook

Notebook path:

  • python/notebooks/deep_analysis_notebook.ipynb

It consumes exported tables from outputs/tables/ for exploratory drill-down.

8) Run Python Tests

uv run --project python pytest python/tests -q
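The tests under python/tests are not shown here; as a flavor of the pytest style, any function named test_* is collected automatically. A hypothetical example (the pct helper is invented for illustration):

```python
def pct(numerator: float, denominator: float) -> float:
    """Share as a percentage, treating a zero denominator as 0.0."""
    return 100.0 * numerator / denominator if denominator else 0.0

def test_pct_basic():
    assert pct(1, 4) == 25.0

def test_pct_handles_zero_denominator():
    assert pct(5, 0) == 0.0

# pytest collects these by name; call them directly for a quick check
test_pct_basic()
test_pct_handles_zero_denominator()
```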

Postgres connection defaults:

  • Host: localhost
  • Port: 5432
  • Database: crdc
  • User: crdc
  • Password: crdc

Override values in .env as needed.
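One way client code might assemble a connection URL from environment variables, falling back to the defaults above. The PG* variable names are an assumption (libpq conventions); check .env.example for the names this repo actually uses:

```python
import os

def pg_url(env=os.environ) -> str:
    """Build a postgresql:// URL from env vars, defaulting to the values above."""
    return (
        f"postgresql://{env.get('PGUSER', 'crdc')}:{env.get('PGPASSWORD', 'crdc')}"
        f"@{env.get('PGHOST', 'localhost')}:{env.get('PGPORT', '5432')}"
        f"/{env.get('PGDATABASE', 'crdc')}"
    )

print(pg_url({}))  # → postgresql://crdc:crdc@localhost:5432/crdc
```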

Optional: Launch a Dev Container with both runtimes

The compose file also defines an analysis service (under the dev profile) built from the repository Dockerfile.

docker compose --profile dev up -d analysis

This mounts the full repository at /workspace and keeps the container alive for interactive work.
