Summary
The L0/CD calibration system (`policy_data.db`) uses targets from 2022 IRS SOI actuals and 2023 CBO projections, but the simulation runs at `time_period=2024`. Meanwhile, the national ECPS calibration (`loss.py`) correctly reads 2024-indexed values from the CBO/Treasury YAML parameters. This means the state `.h5` files are calibrated to the wrong aggregate totals.
Evidence
Comparing the DB targets (used by state/CD calibration) vs the YAML parameter values (used by national calibration):
| Variable | DB Target | DB Source/Year | YAML (2024) | YAML Source | Delta (DB vs. YAML) |
|---|---|---|---|---|---|
| `income_tax` | $2,051B | IRS SOI actual, 2022 | $2,426B | CBO projection, 2024 | -15% |
| `social_security` | $1,379B | CBO projection, 2023 | $1,454B | CBO projection, 2024 | -5% |
| `snap` | $107B | CBO projection, 2023 | $94B | CBO projection, 2024 | +14% |
| `ssi` | $60.1B | CBO projection, 2023 | $57B | CBO projection, 2024 | +5% |
| `eitc` | $64.4B | Treasury, 2023 | $67.3B | Treasury, 2024 | -4% |
| `refundable_ctc` | $33.1B | IRS SOI actual, 2022 | not targeted | n/a | n/a |
| `unemployment_compensation` | $35B | CBO, 2023 | $34.7B | CBO, 2024 | +1% |
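Income tax is the most significant discrepancy: the DB uses $2,051B (2022 SOI actual) while the correct 2024 CBO projection is $2,426B. The 2024 value is about 18% higher than the DB target (equivalently, the DB target sits about 15% below the 2024 value).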
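The deltas can be reproduced from the figures quoted in the table. The sketch below assumes the convention `(DB - YAML) / YAML`, which matches the snap, ssi, eitc, social_security, and unemployment_compensation rows; the dictionary and function names are illustrative and not part of the repository. Under this convention income_tax comes out near -15.5%, while the 18% figure corresponds to the 2024 value measured against the DB value.

```python
# Figures in billions of USD, copied from the comparison table.
db_targets = {
    "income_tax": 2051.0,
    "social_security": 1379.0,
    "snap": 107.0,
    "ssi": 60.1,
    "eitc": 64.4,
    "unemployment_compensation": 35.0,
}
yaml_2024 = {
    "income_tax": 2426.0,
    "social_security": 1454.0,
    "snap": 94.0,
    "ssi": 57.0,
    "eitc": 67.3,
    "unemployment_compensation": 34.7,
}


def relative_delta(db_value: float, yaml_value: float) -> float:
    """Return (DB - YAML) / YAML: how far the DB target sits from the 2024 value."""
    return (db_value - yaml_value) / yaml_value


deltas = {
    name: relative_delta(db_targets[name], yaml_2024[name]) for name in db_targets
}

for name, delta in deltas.items():
    print(f"{name:28s} {delta:+.1%}")

# The same income_tax gap expressed relative to the DB value instead.
income_tax_gap_vs_db = (yaml_2024["income_tax"] - db_targets["income_tax"]) / db_targets["income_tax"]
print(f"income_tax, 2024 value vs DB: {income_tax_gap_vs_db:+.1%}")
```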
How to verify
```python
# DB targets (L0/CD calibration)
import sqlite3

conn = sqlite3.connect("policyengine_us_data/storage/calibration/policy_data.db")
cur = conn.execute("""
    SELECT variable, period, value FROM targets
    WHERE variable IN ('income_tax','social_security','snap','ssi','eitc','refundable_ctc','unemployment_compensation')
      AND active = 1
      AND stratum_id NOT IN (SELECT stratum_id FROM stratum_constraints
                             WHERE constraint_variable IN ('congressional_district_geoid','state_fips'))
    ORDER BY variable, period
""")
for row in cur:
    print(row)

# YAML parameters (national calibration)
from policyengine_us import Microsimulation

sim = Microsimulation()
params = sim.tax_benefit_system.parameters
for var in ['income_tax','snap','social_security','ssi','unemployment_compensation']:
    print(var, params(2024).calibration.gov.cbo._children[var])
print('eitc', params(2024).calibration.gov.treasury.tax_expenditures.eitc)
```

Impact on stacked state aggregates
We compared stacked-state totals (summing all 51 state .h5 files) against both target sets:
| Variable | Stacked States | vs DB Target | vs YAML (2024) |
|---|---|---|---|
| `income_tax` | $2,196B | 107% | 90% |
| `social_security` | $1,282B | 93% | 88% |
| `snap` | $97B | 91% | 103% |
| `eitc` | $62.7B | 97% | 93% |
The stacked states overshoot the DB income_tax target by 7% (because the DB target is too low), but undershoot the correct 2024 value by 10%.
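The two ratio columns are the stacked-state total divided by each target. A minimal sketch using the rounded figures from this issue (the names are illustrative); because the inputs are rounded to the nearest billion, a recomputed ratio can differ from the table by up to a percentage point.

```python
# Billions of USD: (stacked-state total, DB target, 2024 YAML value).
aggregates = {
    "income_tax": (2196.0, 2051.0, 2426.0),
    "social_security": (1282.0, 1379.0, 1454.0),
    "snap": (97.0, 107.0, 94.0),
    "eitc": (62.7, 64.4, 67.3),
}

# Ratio of the stacked-state total to each target set.
ratios = {
    name: (stacked / db_target, stacked / yaml_target)
    for name, (stacked, db_target, yaml_target) in aggregates.items()
}

for name, (vs_db, vs_yaml) in ratios.items():
    print(f"{name:18s} vs DB: {vs_db:6.1%}   vs YAML 2024: {vs_yaml:6.1%}")
```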
Root cause
- `fit_calibration_weights.py` runs at `time_period = 2024` (line 83)
- `SparseMatrixBuilder` calculates variables at `self.time_period` (2024) to build the loss matrix
- But the target values in `policy_data.db` were populated from 2022 SOI and 2023 CBO data and never updated
- The `period` column in the targets table is metadata only; it is not used to select the correct target year
In contrast, `loss.py` dynamically reads `sim.tax_benefit_system.parameters(time_period).calibration.gov.cbo._children[variable_name]`, which correctly resolves to the 2024 YAML value.
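Because `period` is stored but never compared against the simulation year, a simple guard could surface the mismatch before calibration runs. The sketch below is an assumption-laden illustration, not code from the repository: it uses an in-memory SQLite table with only the `variable`, `period`, `value`, and `active` columns seen in the verification query, assumes `period` is an integer year and `value` is in dollars, and the function name `find_stale_targets` is hypothetical.

```python
import sqlite3


def find_stale_targets(
    conn: sqlite3.Connection, time_period: int
) -> list[tuple[str, int, float]]:
    """Return active targets whose period differs from the simulation year."""
    query = """
        SELECT variable, period, value
        FROM targets
        WHERE active = 1 AND period != ?
        ORDER BY variable
    """
    return conn.execute(query, (time_period,)).fetchall()


# Tiny in-memory stand-in for policy_data.db (simplified, assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE targets (variable TEXT, period INTEGER, value REAL, active INTEGER)"
)
conn.executemany(
    "INSERT INTO targets VALUES (?, ?, ?, ?)",
    [
        ("income_tax", 2022, 2051e9, 1),  # 2022 SOI actual
        ("snap", 2023, 107e9, 1),         # 2023 CBO projection
        ("eitc", 2024, 67.3e9, 1),        # already on the simulation year
    ],
)

stale = find_stale_targets(conn, time_period=2024)
for variable, period, value in stale:
    print(f"{variable}: target period {period} != 2024 (value ${value / 1e9:,.0f}B)")
```

A check like this could fail fast or log a warning at the start of the L0 fit; which behavior is appropriate depends on whether some targets are intentionally held at an earlier year.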
Proposed fix
Update `policy_data.db` national targets to use 2024 values from the same YAML parameters that `loss.py` uses. This could be:
1. Quick fix: SQL UPDATE to set correct 2024 values for the ~7 affected national targets
2. Structural fix: Have the DB ETL read from the YAML parameters (like `loss.py` does) so they stay in sync automatically. The `loss.py` comment at lines 12-14 already notes this: "A future PR should wire build_loss_matrix() to read from the database so this dict can be deleted."
Option 2 is preferred since it prevents future drift. The ETL that populates `policy_data.db` should call `sim.tax_benefit_system.parameters(2024).calibration.gov.cbo._children[var]` for CBO programs and `parameters(2024).calibration.gov.treasury.tax_expenditures.eitc` for EITC.
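Either option ends in the same write: overwrite the national target rows with values for the simulation year. A rough sketch of that step follows, under several assumptions: the table is reduced to four columns and contains only national rows, values are in dollars, and `sync_national_targets` is a hypothetical helper. In the real ETL the values would come from the YAML parameter lookups named above rather than a hard-coded dict, and the UPDATE would also need to exclude state and congressional-district strata, as the verification query does.

```python
import sqlite3
from typing import Mapping


def sync_national_targets(
    conn: sqlite3.Connection, values_by_variable: Mapping[str, float], period: int
) -> int:
    """Overwrite active targets with the supplied values; return rows updated."""
    updated = 0
    with conn:  # commits on success, rolls back if an exception is raised
        for variable, value in values_by_variable.items():
            cur = conn.execute(
                "UPDATE targets SET value = ?, period = ? "
                "WHERE variable = ? AND active = 1",
                (value, period, variable),
            )
            updated += cur.rowcount
    return updated


# In-memory stand-in for policy_data.db (simplified, assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE targets (variable TEXT, period INTEGER, value REAL, active INTEGER)"
)
conn.executemany(
    "INSERT INTO targets VALUES (?, ?, ?, ?)",
    [("income_tax", 2022, 2051e9, 1), ("snap", 2023, 107e9, 1)],
)

# 2024 values quoted in this issue; a real ETL would read them from the YAML parameters.
values_2024 = {"income_tax": 2426e9, "snap": 94e9}
updated = sync_national_targets(conn, values_2024, period=2024)
print(f"updated {updated} target rows")
```

Setting `period` alongside `value` keeps the metadata honest, so a later year mismatch can be detected from the table itself.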
Files involved
- DB: `policyengine_us_data/storage/calibration/policy_data.db` (targets table)
- DB ETL: `policyengine_us_data/db/` (populates targets)
- L0 calibration: `policyengine_us_data/datasets/cps/local_area_calibration/fit_calibration_weights.py`
- Sparse matrix builder: `policyengine_us_data/datasets/cps/local_area_calibration/sparse_matrix_builder.py`
- National calibration (reference): `policyengine_us_data/utils/loss.py`
- CBO YAML params: installed at `policyengine_us/parameters/calibration/gov/cbo/*.yaml`
- Treasury EITC YAML: `policyengine_us/parameters/calibration/gov/treasury/tax_expenditures/eitc.yaml`