State calibration (policy_data.db) uses stale 2022-2023 targets for 2024 sim

## Summary

The L0/CD calibration system (`policy_data.db`) uses targets from 2022 IRS SOI actuals and 2023 CBO projections, but the simulation runs at `time_period=2024`. Meanwhile, the national ECPS calibration (`loss.py`) correctly reads 2024-indexed values from the CBO/Treasury YAML parameters. This means the state .h5 files are calibrated to the wrong aggregate totals.

## Evidence

Comparing the DB targets (used by state/CD calibration) vs the YAML parameter values (used by national calibration):

| Variable | DB Target | DB Source/Year | YAML (2024) | YAML Source | Delta |
|---|---|---|---|---|---|
| `income_tax` | **$2,051B** | IRS SOI actual, **2022** | **$2,426B** | CBO projection, 2024 | **-15%** |
| `social_security` | **$1,379B** | CBO projection, **2023** | **$1,454B** | CBO projection, 2024 | -5% |
| `snap` | **$107B** | CBO projection, **2023** | **$94B** | CBO projection, 2024 | +14% |
| `ssi` | **$60.1B** | CBO projection, **2023** | **$57B** | CBO projection, 2024 | +5% |
| `eitc` | **$64.4B** | Treasury, **2023** | **$67.3B** | Treasury, 2024 | -4% |
| `refundable_ctc` | **$33.1B** | IRS SOI actual, **2022** | not targeted | n/a | n/a |
| `unemployment_compensation` | **$35B** | CBO, **2023** | **$34.7B** | CBO, 2024 | +1% |

Delta is the DB target relative to the 2024 YAML value, i.e. (DB - YAML) / YAML.

Income tax is the largest discrepancy: the DB uses $2,051B (2022 SOI actual), while the 2024 CBO projection is $2,426B. The DB target is about 15% below the 2024 value; equivalently, the 2024 value is about 18% above the DB target.
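As a quick arithmetic check on the Delta column, the sketch below recomputes each gap as (DB - YAML) / YAML from the figures in the table. The `targets` dict and the `relative_delta` helper are illustrative names, not part of the repository, and the values are entered in billions of dollars. Measured this way, `income_tax` comes out near -15%; the same gap expressed relative to the DB target is about 18%.

```python
# Figures in billions of dollars, copied from the Evidence table.
# Each entry is (DB target, 2024 YAML value).
targets = {
    "income_tax": (2051.0, 2426.0),
    "social_security": (1379.0, 1454.0),
    "snap": (107.0, 94.0),
    "ssi": (60.1, 57.0),
    "eitc": (64.4, 67.3),
    "unemployment_compensation": (35.0, 34.7),
}


def relative_delta(db_value, yaml_value):
    """Signed gap of the DB target relative to the 2024 YAML value."""
    return (db_value - yaml_value) / yaml_value


deltas = {
    name: relative_delta(db_value, yaml_value)
    for name, (db_value, yaml_value) in targets.items()
}

for name, delta in deltas.items():
    print(f"{name:28s} {delta:+.1%}")
```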

## How to verify

```python
# DB targets (L0/CD calibration)
import sqlite3
conn = sqlite3.connect("policyengine_us_data/storage/calibration/policy_data.db")
cur = conn.execute("""
    SELECT variable, period, value FROM targets 
    WHERE variable IN ('income_tax','social_security','snap','ssi','eitc','refundable_ctc','unemployment_compensation')
      AND active = 1
      AND stratum_id NOT IN (SELECT stratum_id FROM stratum_constraints 
                             WHERE constraint_variable IN ('congressional_district_geoid','state_fips'))
    ORDER BY variable, period
""")
for row in cur:
    print(row)
conn.close()

# YAML parameters (national calibration) 
from policyengine_us import Microsimulation
sim = Microsimulation()
params = sim.tax_benefit_system.parameters
for var in ['income_tax','snap','social_security','ssi','unemployment_compensation']:
    print(var, params(2024).calibration.gov.cbo._children[var])
print('eitc', params(2024).calibration.gov.treasury.tax_expenditures.eitc)
```

## Impact on stacked state aggregates

We compared stacked-state totals (summing all 51 state .h5 files) against both target sets:

| Variable | Stacked States | vs DB Target | vs YAML (2024) |
|---|---|---|---|
| `income_tax` | $2,196B | 107% | **90%** |
| `social_security` | $1,282B | 93% | **88%** |
| `snap` | $97B | 91% | 103% |
| `eitc` | $62.7B | 97% | 93% |

The stacked states overshoot the DB income_tax target by 7% (because the DB target is too low), but undershoot the correct 2024 value by 10%.

## Root cause

- `fit_calibration_weights.py` runs at `time_period = 2024` (line 83)
- `SparseMatrixBuilder` calculates variables at `self.time_period` (2024) to build the loss matrix
- But the target values in `policy_data.db` were populated from 2022 SOI and 2023 CBO data and never updated
- The `period` column in the targets table is metadata only — not used to select the correct target year

In contrast, `loss.py` dynamically reads `sim.tax_benefit_system.parameters(time_period).calibration.gov.cbo._children[variable_name]`, which correctly resolves to the 2024 YAML value.
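Since `period` is not consulted when targets are selected, a guard that compares it against the simulation year would surface this kind of drift early. The following is a minimal sketch under stated assumptions: `find_stale_targets` is a hypothetical helper, the toy table only carries the columns used in the verification query above (`variable`, `period`, `value`, `active`), `period` is assumed to be stored as an integer year, and the values are in billions of dollars for readability. The real schema has more columns and may store these differently.

```python
import sqlite3

SIM_PERIOD = 2024  # time_period used by fit_calibration_weights.py, per this issue


def find_stale_targets(conn, sim_period):
    """Return active (variable, period, value) rows whose period != sim_period."""
    query = """
        SELECT variable, period, value
        FROM targets
        WHERE active = 1 AND period != ?
        ORDER BY variable
    """
    return conn.execute(query, (sim_period,)).fetchall()


# Toy in-memory stand-in for policy_data.db (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE targets (variable TEXT, period INTEGER, value REAL, active INTEGER)"
)
conn.executemany(
    "INSERT INTO targets VALUES (?, ?, ?, ?)",
    [
        ("income_tax", 2022, 2051.0, 1),
        ("snap", 2023, 107.0, 1),
        ("eitc", 2024, 67.3, 1),  # hypothetical row that is already current
    ],
)

stale = find_stale_targets(conn, SIM_PERIOD)
for row in stale:
    print(row)
conn.close()
```

A check like this could run at the start of calibration and fail loudly, or at least log a warning, when any active target's period differs from the simulation year.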

## Proposed fix

Update `policy_data.db` national targets to use 2024 values from the same YAML parameters that `loss.py` uses. This could be:

1. **Quick fix**: SQL UPDATE to set correct 2024 values for the ~7 affected national targets
2. **Structural fix**: Have the DB ETL read from the YAML parameters (like `loss.py` does) so they stay in sync automatically. The `loss.py` comment at lines 12-14 already anticipates unifying the two sources: *"A future PR should wire build_loss_matrix() to read from the database so this dict can be deleted."*

Option 2 is preferred since it prevents future drift. The ETL that populates `policy_data.db` should call `sim.tax_benefit_system.parameters(2024).calibration.gov.cbo._children[var]` for CBO programs and `parameters(2024).calibration.gov.treasury.tax_expenditures.eitc` for EITC.
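Both options reduce to the same update step and differ only in where the new value comes from. The sketch below is one possible shape, not the existing ETL: `refresh_national_targets` and `table_lookup` are hypothetical names, the table is a simplified in-memory stand-in, and values are in billions of dollars. In the structural fix, the `lookup` callable would wrap the `parameters(period).calibration...` calls named above; here it reads from a dict holding two 2024 figures from the Evidence table. A real update would also need to restrict itself to national strata, as the verification query does with `stratum_constraints`.

```python
import sqlite3


def refresh_national_targets(conn, lookup, period, variables):
    """Set value and period of each active target to lookup(variable, period)."""
    updated = 0
    for variable in variables:
        value = lookup(variable, period)
        cur = conn.execute(
            "UPDATE targets SET value = ?, period = ? "
            "WHERE variable = ? AND active = 1",
            (value, period, variable),
        )
        updated += cur.rowcount  # number of rows changed by this UPDATE
    conn.commit()
    return updated


# Stand-in for the YAML parameters: 2024 values from the Evidence table.
YAML_2024_BILLIONS = {"income_tax": 2426.0, "snap": 94.0}


def table_lookup(variable, period):
    """Hypothetical lookup; the real one would read the YAML parameters."""
    if period != 2024:
        raise ValueError(f"no values recorded for period {period}")
    return YAML_2024_BILLIONS[variable]


# Toy in-memory stand-in for policy_data.db (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE targets (variable TEXT, period INTEGER, value REAL, active INTEGER)"
)
conn.executemany(
    "INSERT INTO targets VALUES (?, ?, ?, ?)",
    [("income_tax", 2022, 2051.0, 1), ("snap", 2023, 107.0, 1)],
)

updated = refresh_national_targets(conn, table_lookup, 2024, ["income_tax", "snap"])
rows = conn.execute(
    "SELECT variable, period, value FROM targets ORDER BY variable"
).fetchall()
print(updated, rows)
conn.close()
```

Passing the lookup in as a callable keeps the update logic identical for the quick fix (hard-coded values) and the structural fix (values read from the parameter tree), so the first can be replaced by the second without touching the SQL.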

## Files involved

- **DB**: `policyengine_us_data/storage/calibration/policy_data.db` (targets table)
- **DB ETL**: `policyengine_us_data/db/` (populates targets)
- **L0 calibration**: `policyengine_us_data/datasets/cps/local_area_calibration/fit_calibration_weights.py`
- **Sparse matrix builder**: `policyengine_us_data/datasets/cps/local_area_calibration/sparse_matrix_builder.py`
- **National calibration (reference)**: `policyengine_us_data/utils/loss.py`
- **CBO YAML params**: installed at `policyengine_us/parameters/calibration/gov/cbo/*.yaml`
- **Treasury EITC YAML**: `policyengine_us/parameters/calibration/gov/treasury/tax_expenditures/eitc.yaml`
