
Conversation

@baogorek (Collaborator) commented Jul 14, 2025

Closes #197, Closes #356 (the latter references L1, but I think this is in the spirit of it).
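The mechanism behind the sparsity is an L0 penalty on the household weights, implemented with HardConcrete gates (see the commit "Add HardConcrete L0 regularization" below). A minimal sketch of such a gate in PyTorch, following the Louizos et al. (2018) parameterization; the constants and names here are illustrative, not the exact code in this PR:

```python
import math
import torch
from torch import nn


class HardConcreteGate(nn.Module):
    """One stochastic gate per household weight, with a differentiable
    expected-L0 penalty (Louizos et al., 2018). Illustrative sketch only."""

    def __init__(self, n_households, gamma=-0.1, zeta=1.1, beta=2 / 3):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(n_households))
        self.gamma, self.zeta, self.beta = gamma, zeta, beta

    def forward(self):
        if self.training:
            # Reparameterized sample from the HardConcrete distribution
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        else:
            s = torch.sigmoid(self.log_alpha)
        # Stretch to (gamma, zeta), then clip so many gates land exactly at 0 or 1
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def expected_l0(self):
        # Expected number of non-zero gates, i.e. surviving households
        return torch.sigmoid(
            self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        ).sum()
```

The reweighting loss would then look something like `calibration_loss(base_weights * gate()) + l0_lambda * gate.expected_l0()`, so the optimizer trades calibration error against the number of households kept.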

@baogorek marked this pull request as ready for review July 16, 2025 02:34
@juaristi22 (Collaborator) left a comment

Nice! Excited to see what this dataset can do!

@baogorek requested a review from juaristi22 July 16, 2025 13:25
@nikhilwoodruff (Contributor) commented Jul 16, 2025

Thanks!! But I'm of course a bit paranoid about this new dataset. Can you add to this PR:

  • Details of the JCT tax expenditure tests / a rough sanity check of the calibration scores
  • How the score of the OBBBA changes (net cost, winner/loser shares, total net cost among filers with >$1m AGI)
  • Number of household observations (non-zero weights) per decile, so we know whether the decile impacts are going to get super weird/noisy (see the sketch below)

All of the above compared to the regular ECPS before this PR.
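A sketch of what the per-decile count could look like, assuming the household table loads into a pandas DataFrame with `household_weight` and `household_net_income` columns (column names and the decile definition are illustrative; the actual impact deciles may be computed differently, e.g. weighted):

```python
import pandas as pd


def nonzero_households_per_decile(df: pd.DataFrame) -> pd.Series:
    """Count households with non-zero weight in each income decile."""
    df = df.copy()
    # Unweighted deciles of household net income, labelled 1..10
    df["decile"] = pd.qcut(df["household_net_income"], 10, labels=range(1, 11))
    active = df[df["household_weight"] > 0]
    return active.groupby("decile", observed=True).size()
```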

@PavelMakarchuk can you share 3 policy jsons/structural reforms that @baogorek can run, along with expected revenue numbers.

@baogorek (Collaborator, Author)

@nikhilwoodruff I copied over all the tests that the ECPS has to pass, and it passed all but one. The one it failed on was:

SSN card type "NONE" count: 0, target: 13000000, error: 100.00% 

So I suppose this sparse solution happened not to include anyone with NONE for ssn_card_type. How big of a deal do you think that is?

FYI, I made the threshold in the test very large so that it would pass and log the value, so eventually I need to fix that.
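For reference, the kind of check involved is just a relative-error comparison against a calibration target; a hedged sketch (the 13,000,000 target comes from the log above, while the tolerance and names are illustrative):

```python
def check_target(estimate: float, target: float, rel_tol: float) -> None:
    """Assert that a weighted estimate is within rel_tol of its target."""
    error = abs(estimate - target) / target
    print(f"estimate: {estimate:,.0f}, target: {target:,.0f}, error: {error:.2%}")
    assert error < rel_tol, f"error {error:.2%} exceeds tolerance {rel_tol:.0%}"


# e.g. the SSN card type "NONE" total from the log above:
# check_target(estimate=ssn_none_count, target=13_000_000, rel_tol=0.5)
```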

The calibration log looks pretty good!
calibration_log_sparse.csv

@MaxGhenis (Contributor)

Given the extent to which OBBBA affects the estimated 13 million undocumented immigrants, I think it's problematic to report that zero exist.

@nikhilwoodruff (Contributor)

OK @baogorek, could you increase the number until we don't hit this error?

@MaxGhenis (Contributor)

How many records are in this dataset?

@baogorek (Collaborator, Author)

@nikhilwoodruff, @MaxGhenis I'm passing all tests, including the ssn_card_type test with the original threshold (see line 150 onward of policyengine_us_data/tests/test_datasets/test_sparse_enhanced_cps.py). There are 5,971 non-zero weights in this run, as can be seen at the end of the build-datasets log. There is some non-determinism, and I've seen the loss go lower locally, so I could investigate that or just double the number of epochs for the final run.
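A sketch of the two pieces mentioned here: counting non-zero household weights in the built .h5 file, and pinning seeds against the non-determinism. The dataset layout shown is an assumption, not necessarily the exact structure of the sparse ECPS file:

```python
import h5py
import numpy as np
import torch


def count_nonzero_household_weights(h5_path: str, year: str = "2024") -> int:
    """Count households with non-zero weight in a built dataset file.

    Assumes weights live under a household_weight/<year> dataset;
    the real file layout may differ.
    """
    with h5py.File(h5_path, "r") as f:
        weights = f[f"household_weight/{year}"][...]
    return int(np.count_nonzero(weights))


def set_seeds(seed: int = 0) -> None:
    """Pin the RNGs so repeated reweighting runs give the same sparse solution."""
    np.random.seed(seed)
    torch.manual_seed(seed)
```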

@baogorek requested a review from MaxGhenis July 17, 2025 17:25
@MaxGhenis (Contributor) left a comment

Good to see that epochs and other hyperparameters solved it

@baogorek (Collaborator, Author) commented Jul 18, 2025

@nikhilwoodruff this is passing all of the tests that the ordinary ECPS is passing (the previous failure was due to not storing enums correctly in the h5). Is this enough to go ahead with this as a sparse MVP? If I need to test with OBBBA reforms, @PavelMakarchuk, I will need your help. There are slightly over 5k households in this sparse dataset, which is enough to estimate the ssn_card_type == "NONE" total within the original accuracy thresholds. I believe this model is solid.
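On the enum point: the usual pitfall is writing string/enum columns to HDF5 as object arrays rather than fixed-length byte strings. A minimal sketch of the safe round trip with h5py (illustrative, not the exact fix in this PR):

```python
import h5py
import numpy as np


def write_enum_column(f: h5py.File, name: str, values) -> None:
    # Encode string/enum values as fixed-length byte strings, which h5py
    # stores natively; passing an object array of Python strings without a
    # special string dtype raises a TypeError.
    f.create_dataset(name, data=np.asarray(values).astype("S"))


def read_enum_column(f: h5py.File, name: str) -> np.ndarray:
    # Decode the stored bytes back to unicode strings on read
    return f[name][...].astype(str)
```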

@nikhilwoodruff (Contributor)

OK LGTM. We have dataset versioning anyway. Let's merge

@nikhilwoodruff merged commit 8025184 into main Jul 18, 2025
7 checks passed
@nikhilwoodruff deleted the bogorek-l0 branch July 18, 2025 12:43
juaristi22 pushed a commit that referenced this pull request Jul 18, 2025
* initial commit of L0 branch

* Add HardConcrete L0 regularization

* l0 example completed

* removing commented code

* pre lint cleanup

* post-lint cleanup

* Refactor reweighting diagnostics

* removed _clean from names in the reweighting function

* modifying print function and test

* Convert diagnostics prints to logging

* removing unused variable

* setting high tolerance for ssn test just to pass

* linting

* fixed data set creation logic. Modified parameters

* docs. more epochs
nikhilwoodruff added a commit that referenced this pull request Jul 18, 2025
* Use normal runner in PR tests

* added the 3.11.12 pin

* cps.py

* adding diagnostics

* lint

* taking out bad targets

* fixing workflow arg passthrough

* deps and defaults

* wrong pipeline for manual test

* trying again to get the manual test to work

* reverting to older workflow code

* cleaning up enhanced_cps.py

* Update package version

* removing github download option. Switching to hugging face downloads

* changelog entry

* reverting the old code changes workflow

* Update package version

* start cleaning calibration targets

* add us package to dependencies

* update csv paths in tests too

* manual test

* pr

* updates

* trying to get the right workflow to run

* taking out the token

* ready for review

* Update package version

* adding diagnostics

* taking out bad targets

* fixing workflow arg passthrough

* wrong pipeline for manual test

* Update package version

* removing github download option. Switching to hugging face downloads

* reverting the old code changes workflow

* remove districting file

* remove duplications from merge with main

* add changelog_entry

* Add L0 Regularization, make a better small ECPS (#364)

* initial commit of L0 branch

* Add HardConcrete L0 regularization

* l0 example completed

* removing commented code

* pre lint cleanup

* post-lint cleanup

* Refactor reweighting diagnostics

* removed _clean from names in the reweighting function

* modifying print function and test

* Convert diagnostics prints to logging

* removing unused variable

* setting high tolerance for ssn test just to pass

* linting

* fixed data set creation logic. Modified parameters

* docs. more epochs

* Update package version

* Pin microdf

* adding diagnostics

* taking out bad targets

* Update package version

* start cleaning calibration targets

* trying to get the right workflow to run

* ready for review

* taking out bad targets

* restore changes lost when merging with main

* more cleanup

* even more cleanup

* fix file paths in new sparse ecps test

* lint

* fixing merge

---------

Co-authored-by: Nikhil Woodruff <35577657+nikhilwoodruff@users.noreply.github.com>
Co-authored-by: baogorek <baogorek@gmail.com>
Co-authored-by: MaxGhenis <MaxGhenis@users.noreply.github.com>
Co-authored-by: baogorek <baogorek@users.noreply.github.com>
Co-authored-by: nikhilwoodruff <nikhilwoodruff@users.noreply.github.com>