Precompute target variable values to decouple microsimulation from calibration

## Problem

`SparseMatrixBuilder.build_matrix()` currently creates a fresh `Microsimulation` for each of the 51 states, sets `state_fips`, then calculates every target variable. This means:

- **Matrix build takes ~20 minutes** on an M4 Max (the calibration training itself is fast after that)
- Microsimulation and calibration are tightly coupled: changing calibration targets requires re-running expensive simulations
- Adding new target variables requires the full microsimulation stack at calibration time
- States cannot be processed in parallel (`Microsimulation` instances are created serially)

## Proposed solution

Separate the pipeline into two steps:

### 1. Precompute step (once, parallelizable)
For each state × household, compute all target variables and save to a single file:

```python
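import numpy as np
from policyengine_us import Microsimulation  # assumed import path

# Assumed to be defined by the caller: stratified_cps (dataset), all_states
# (the 51 state FIPS codes), n_hh (household count), target_variables (names).
# Output shape: (n_states, n_households, n_variables) or a flat DataFrame.
# Variables: state_income_tax, snap, health_insurance_premiums, household_count, person_count, etc.
precomputed = {}
for state_fips in all_states:
    # Fresh simulation per state, with every household assigned to that state.
    sim = Microsimulation(dataset=stratified_cps)
    sim.set_input("state_fips", 2024, np.full(n_hh, state_fips))
    for var in target_variables:
        # One household-level array per (state, variable) pair.
        precomputed[(state_fips, var)] = sim.calculate(var, 2024, map_to="household").values
# Save as HDF5 or parquet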
```

This is trivially parallelizable (51 independent simulations). State is also a somewhat arbitrary chunking dimension: all we need is for every unique combination of geographic constraints to be evaluated.
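As a rough sketch of the "save to a single file" part, the per-(state, variable) dict could be packed into one `(n_states, n_households, n_variables)` array and written out with its labels. The `pack` helper and the toy data are hypothetical, and NumPy's `.npz` format stands in here for HDF5 or parquet only to keep the example dependency-free.

```python
import io
import numpy as np

def pack(precomputed, states, variables):
    """Stack {(state, var): (n_hh,) array} into (n_states, n_hh, n_vars)."""
    return np.stack(
        [np.stack([precomputed[(s, v)] for v in variables], axis=-1) for s in states]
    )

# Toy stand-in for the simulation output: 2 states, 3 households, 2 variables.
states = [1, 2]
variables = ["snap", "household_count"]
precomputed = {
    (s, v): np.arange(3, dtype=float) + 10 * s + i
    for s in states
    for i, v in enumerate(variables)
}
cube = pack(precomputed, states, variables)

# Write values plus axis labels to one file (an in-memory buffer here).
buf = io.BytesIO()
np.savez_compressed(
    buf, values=cube, state_fips=np.array(states), variables=np.array(variables)
)
buf.seek(0)
loaded = np.load(buf)
```

Storing the state and variable labels next to the values means the matrix build step can look up slices by name without importing anything from the simulation side.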

### 2. Matrix build step (fast, pure NumPy)
Read the precomputed file, apply constraint masks, and build the sparse matrix. No `Microsimulation` import is needed. This step should take seconds, not minutes.
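A minimal sketch of what this step might look like, assuming the precomputed values are a `(n_states, n_households, n_variables)` array, each target is a hypothetical `(state_idx, var_idx, household_mask)` tuple, and matrix columns are laid out as `state_idx * n_households + household`. None of this reflects the actual `SparseMatrixBuilder` layout. It produces COO triplets in pure NumPy.

```python
import numpy as np

def build_coo(values, targets):
    """Return (rows, cols, data, shape) COO triplets for a targets x (state, household) matrix.

    values:  (n_states, n_hh, n_vars) precomputed array
    targets: non-empty list of (state_idx, var_idx, boolean household mask)
    """
    n_states, n_hh, _ = values.shape
    rows, cols, data = [], [], []
    for t, (s, v, mask) in enumerate(targets):
        # Zero out households that fail the target's constraint mask.
        contrib = np.where(mask, values[s, :, v], 0.0)
        hh = np.flatnonzero(contrib)  # keep only nonzero entries
        rows.append(np.full(hh.size, t))
        cols.append(s * n_hh + hh)
        data.append(contrib[hh])
    shape = (len(targets), n_states * n_hh)
    return np.concatenate(rows), np.concatenate(cols), np.concatenate(data), shape

# Toy input: 2 states, 3 households, 2 variables.
values = np.arange(12, dtype=float).reshape(2, 3, 2)
targets = [
    (0, 0, np.array([True, True, False])),
    (1, 1, np.array([False, True, True])),
]
rows, cols, data, shape = build_coo(values, targets)
```

If SciPy is available, the triplets can be handed to `scipy.sparse.coo_matrix((data, (rows, cols)), shape=shape)` to get an actual sparse matrix.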

### Benefits
- Matrix build drops from ~20 min to seconds
- Precomputed file can be cached on HuggingFace alongside `stratified_extended_cps.h5`
- Adding new calibration targets on already-precomputed variables means just adding rows to the `target_filter`, with no re-simulation
- Precompute step can run on GPU instances or be parallelized across workers
- Clear separation of concerns: microsimulation vs. optimization

## Related

- #486 (replace CD stacking with cloning): both simplify the calibration architecture
- #492 (state income tax targets): motivated this, since adding `state_income_tax` required a full matrix rebuild

Precompute target variable values to decouple microsimulation from calibration #499
