This repository now includes a dual-stack data analysis setup:
- Python workspace for pandas-based analysis
- TypeScript workspace for Node-based data tooling
- Docker + Postgres for local database-backed experiments
The source data files in data/raw (for example data/raw/SCH/*.csv and data/raw/LEA/*.csv) are used by the smoke-test and seeding scripts.
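The seeding scripts derive a Postgres table name from each raw CSV path. The exact naming scheme lives in seed_pg.py; as a minimal sketch, assuming a hypothetical "subdirectory plus file stem" convention:

```python
from pathlib import Path

def table_name_for(csv_path: str) -> str:
    # Hypothetical scheme (the real one is in seed_pg.py): "<subdir>_<stem>",
    # lowercased, with non-alphanumeric characters replaced by underscores.
    p = Path(csv_path)
    stem = f"{p.parent.name}_{p.stem}".lower()
    return "".join(c if c.isalnum() else "_" for c in stem)

print(table_name_for("data/raw/SCH/Advanced Placement.csv"))  # sch_advanced_placement
```

The --dry-run flag described below previews the actual mappings without writing to the database.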
.
├── data/
│   └── raw/
│       ├── LEA/
│       └── SCH/
├── python/
│   ├── pyproject.toml
│   ├── pipeline.py
│   ├── analysis/
│   │   ├── etl/
│   │   ├── metrics/
│   │   ├── stats/
│   │   ├── models/
│   │   ├── viz/
│   │   ├── reporting/
│   │   └── dashboard/
│   ├── notebooks/
│   ├── seed_pg.py
│   ├── reset_pg_seed_state.py
│   └── smoke_test.py
├── typescript/
│   ├── package.json
│   ├── tsconfig.json
│   └── src/
│       └── smoke-test.ts
├── docker/
│   └── postgres/
│       └── init/
│           └── 001_create_schema.sql
├── Dockerfile
├── docker-compose.yml
└── .env.example
# install uv once if needed: https://docs.astral.sh/uv/getting-started/installation/
uv sync --project python
uv run --project python python python/smoke_test.py
npm --prefix typescript install
npm --prefix typescript run smoke
cp .env.example .env
docker compose up -d pg
Useful checks:
docker compose ps
docker compose logs -f pg
Preview table mappings (no DB writes):
uv run --project python python python/seed_pg.py --dry-run --limit 5
Load all CSVs into schema raw:
uv run --project python python python/seed_pg.py
Useful options:
# only load files that match a substring
uv run --project python python python/seed_pg.py --only "SCH/Advanced" --only "LEA"
# load into a custom schema
uv run --project python python python/seed_pg.py --schema crdc_raw
# append instead of replacing each table
uv run --project python python python/seed_pg.py --append
Reset seed state (drops raw + metadata schemas and recreates metadata table):
uv run --project python python python/reset_pg_seed_state.py
Run the full pipeline (curation, metrics, richer exploration aggregates, inferential stats, predictive models, static + interactive viz, exports, report):
uv run --project python python python/pipeline.py
Run selected stages only:
uv run --project python python python/pipeline.py --stages curate metrics
uv run --project python python python/pipeline.py --stages explore stats models report --skip-interactive
Generated artifacts:
- Curated analytical tables in Postgres schema analysis.*
- CSV exports in outputs/tables/
- Static charts in outputs/figures/static/
- Interactive charts in outputs/figures/interactive/
- Narrative report at docs/analysis.md
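One way the --stages and --skip-interactive flags shown above might be parsed is sketched below; the stage names come from the commands in this section, but the default list (including a viz stage) is an assumption about pipeline.py's internals:

```python
import argparse

# Stage names taken from the commands above; the full default ordering
# (including "viz") is an assumption, not read from pipeline.py.
STAGES = ["curate", "metrics", "explore", "stats", "models", "viz", "report"]

def parse_pipeline_args(argv=None):
    ap = argparse.ArgumentParser()
    ap.add_argument("--stages", nargs="+", choices=STAGES, default=STAGES)
    ap.add_argument("--skip-interactive", action="store_true")
    return ap.parse_args(argv)
```

With this shape, omitting --stages runs everything, and --skip-interactive simply gates the interactive-viz step.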
Additional exploration tables generated by the explore stage:
- analysis.explore_state_profiles (state readiness profiles)
- analysis.explore_quadrants_state / analysis.explore_quadrants_school (opportunity vs discipline quadrants)
- analysis.explore_archetypes_summary / analysis.explore_archetypes_school (clustered school archetypes)
- analysis.explore_lea_peer_benchmarks (LEA peer deltas by size band and state)
After running the pipeline, start Streamlit:
uv run --project python streamlit run python/analysis/dashboard/app.py
Dashboard includes:
- Core relationships (support, discipline, digital, STEM)
- School archetype distribution and profile summaries
- Peer benchmarking views (top/bottom LEAs vs similar-size peers)
Notebook path:
python/notebooks/deep_analysis_notebook.ipynb
It consumes exported tables from outputs/tables/ for exploratory drill-down.
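A minimal sketch of how the notebook can pull in those exports, assuming only that they are plain CSVs under outputs/tables/ (the helper name is illustrative, not from the notebook):

```python
from pathlib import Path
import pandas as pd

def load_exports(tables_dir: str = "outputs/tables") -> dict[str, pd.DataFrame]:
    # One DataFrame per exported CSV, keyed by file stem.
    return {p.stem: pd.read_csv(p) for p in sorted(Path(tables_dir).glob("*.csv"))}
```

Run the pipeline first so the directory is populated; otherwise the dict is empty.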
uv run --project python pytest python/tests -q
Connection defaults:
- Host: localhost
- Port: 5432
- Database: crdc
- User: crdc
- Password: crdc
Override values in .env as needed.
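For scripts that need a connection string, the defaults above can be assembled into a libpq-style URL, letting environment values (e.g. loaded from .env) override them. The key names (PGHOST, etc.) are assumptions here; match them to whatever your .env actually defines:

```python
import os

# Key names are assumptions; align them with your .env.
DEFAULTS = {
    "PGHOST": "localhost",
    "PGPORT": "5432",
    "PGDATABASE": "crdc",
    "PGUSER": "crdc",
    "PGPASSWORD": "crdc",
}

def postgres_dsn(env=os.environ) -> str:
    # Environment values win; documented defaults fill the gaps.
    cfg = {k: env.get(k, v) for k, v in DEFAULTS.items()}
    return (f"postgresql://{cfg['PGUSER']}:{cfg['PGPASSWORD']}"
            f"@{cfg['PGHOST']}:{cfg['PGPORT']}/{cfg['PGDATABASE']}")

print(postgres_dsn({}))  # postgresql://crdc:crdc@localhost:5432/crdc
```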
The compose file also defines an analysis service (profile dev) built from Dockerfile.
docker compose --profile dev up -d analysis
This mounts the full repository at /workspace and keeps the container alive for interactive work.