Merge v3 figures into master (referenced by Notion case study + landing#90) by ZhengyaoJiang · Pull Request #1 · WecoAI/fraud-detection-case-study

ZhengyaoJiang · 2026-06-12T13:09:22Z

The 13 v3 figures on this branch are hot-linked by the internal Notion case study ("How to Frame the Puzzle for AutoResearch", May 8) and now also mirrored into WecoAI/landing#90, via raw.githubusercontent.com URLs pinned to this branch ref. Merging makes the commits permanently reachable from master so the Notion images survive any future branch cleanup.
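As a rough illustration of what "pinned to this branch ref" means, the sketch below composes a raw.githubusercontent.com URL from an owner, repo, ref, and file path. The helper name `raw_url` and the example file path are hypothetical; the actual figure paths are not listed in this PR.

```python
from urllib.parse import quote


def raw_url(owner: str, repo: str, ref: str, path: str) -> str:
    """Compose a raw.githubusercontent.com URL for a file at a given ref.

    `ref` may be a branch name, a tag, or a commit SHA.
    """
    # quote() leaves "/" intact by default, so nested paths stay readable.
    return "https://raw.githubusercontent.com/{}/{}/{}/{}".format(
        owner, repo, quote(ref), quote(path)
    )


# Hypothetical file path, used for illustration only.
EXAMPLE_PATH = "figures/01_scope_trajectory.png"

branch_url = raw_url(
    "WecoAI", "fraud-detection-case-study", "case-study-v3-figures", EXAMPLE_PATH
)
master_url = raw_url("WecoAI", "fraud-detection-case-study", "master", EXAMPLE_PATH)
```

The two URLs differ only in the ref segment, which is the part that a branch cleanup would affect.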

Adds 13 PNGs, no changes to existing files.

🤖 Generated with Claude Code

7 charts from rerun-2026-04-23 (strict fit/transform API):
- 01 scope trajectory, 02 scope dots
- 03 guidance dots, 04 guidance variance
- 05 loose-vs-strict per cell, 06 Full+EDA trajectory comparison
- 07 twitter summary (1280x720)

- 08: Stacked bar showing 13 leakage instances (loose API) vs 0 (strict API), color-stacked by mechanism subtype (UID numeric agg=7, UID nunique agg=3, Label encoder=1, Frequency encoder=1, Graph embedding=1).
- 09: Side-by-side schematic comparing build_features(train_df, val_df) to FeatureBuilder.fit/transform, with red arrow on val->encoder leak path and green dashed arrow on fit->transform state passing.
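To make the leak path in figure 09 concrete, here is a minimal sketch of the two API shapes using a frequency encoder. It is not the repository's code: only the names `build_features` and `FeatureBuilder.fit/transform` come from the commit message. The `_loose` suffix, the `uid` key, and the use of plain lists of dicts in place of DataFrames are assumptions made to keep the example dependency-free.

```python
from collections import Counter


def build_features_loose(train_rows, val_rows, key="uid"):
    """Loose API: both splits are in scope, so nothing stops the encoder
    from counting over train + val (the val->encoder leak path)."""
    counts = Counter(r[key] for r in train_rows + val_rows)  # leak: val included
    train_feat = [counts[r[key]] for r in train_rows]
    val_feat = [counts[r[key]] for r in val_rows]
    return train_feat, val_feat


class FeatureBuilder:
    """Strict API: state is learned in fit() from training rows only and
    then passed to transform() (the fit->transform state path)."""

    def __init__(self, key="uid"):
        self.key = key
        self.counts_ = None

    def fit(self, train_rows):
        self.counts_ = Counter(r[self.key] for r in train_rows)
        return self

    def transform(self, rows):
        if self.counts_ is None:
            raise RuntimeError("call fit() before transform()")
        # Keys never seen in training get a count of 0.
        return [self.counts_.get(r[self.key], 0) for r in rows]


train = [{"uid": "a"}, {"uid": "a"}, {"uid": "b"}]
val = [{"uid": "b"}, {"uid": "c"}]

loose_train, loose_val = build_features_loose(train, val)
strict_val = FeatureBuilder().fit(train).transform(val)
```

With the loose function, the validation features depend on validation rows themselves. With the strict builder, `transform` never sees anything that `fit` did not learn from the training rows.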

1280x720 single-image hook combining the two API design schematics with the 13 vs 0 reward-hacking instance counts and a 5-color stacked breakdown of the leakage subtypes.

Single 1280x720 image showing all three decisions side by side:
- Panel 1 (Scope): dot plot of 3 conditions, AUC y-axis
- Panel 2 (Guidance): dot plot of 4 conditions, shared y-axis
- Panel 3 (Abstraction): stacked bar of 13 vs 0 reward-hacking instances

Each panel has a punchline takeaway in its title. Bottom legend shows the 5 leakage subtypes for the stacked bar.

Lead with the most surprising finding (codebase abstraction, 13->0) to match Twitter-thread strategy. Dropped "Decision N" prefix since the new order doesn't match the blog post numbering.

23 methods x 12 strict-API runs (4 conditions x 3 seeds). Methods grouped into A=prompted-techniques (10 Kaggle), B=EDA-derived (need column meanings), C=default ML choices.

Headline pattern visible at a glance:
- Methods adopted per cell: None=9-11, EDA=15-17, Tech=6-11, Full=9-12
- Group B EDA-derived features are nearly absent without the EDA prompt
- D1-anchored UID adoption tracks EDA presence, not technique-list presence

Two panels:
- Top: cumulative unique methods proposed over 200 steps, mean+/-std per condition. Shows EDA-only explores broadest, Tech-only narrowest.
- Bottom: per-method first-seen-step scatter, dots colored by condition. Shows when each prompted/EDA-derived/default method first appears.

Data: bad_seeds/method_timeline.csv (1827 PLAN tags) and method_first_seen.csv (187 first-occurrences). Scanner regex-tags PLAN text from each [STEP][PLAN] block in run.log.
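The scanner itself is not part of this PR, so the following is only a sketch of how regex-tagging PLAN text could yield both the first-seen step per method and the cumulative unique-method curve. The method names and patterns are hypothetical. It also assumes each plan fits on one line containing the literal marker `[STEP][PLAN]` and that the step index is the order of appearance. The real run.log layout may differ.

```python
import re

# Hypothetical method tags; the real scanner's 23 patterns are not shown in the PR.
METHOD_PATTERNS = {
    "frequency_encoding": re.compile(r"frequency[\s_-]*encod", re.I),
    "uid_aggregation": re.compile(r"\buid\b.*\bagg", re.I),
    "target_encoding": re.compile(r"target[\s_-]*encod", re.I),
}
PLAN_MARKER = "[STEP][PLAN]"  # marker named in the commit message


def scan_plans(log_text):
    """Return (first_seen, cumulative).

    first_seen maps method name -> 1-based step at which it first matched.
    cumulative[i] is the number of unique methods seen up to step i + 1.
    """
    first_seen = {}
    cumulative = []
    step = 0
    for line in log_text.splitlines():
        if PLAN_MARKER not in line:
            continue  # ignore non-plan log lines
        step += 1
        plan = line.split(PLAN_MARKER, 1)[1]
        for name, pattern in METHOD_PATTERNS.items():
            if name not in first_seen and pattern.search(plan):
                first_seen[name] = step
        cumulative.append(len(first_seen))
    return first_seen, cumulative


sample_log = "\n".join([
    "[STEP][PLAN] add frequency encoding of card columns",
    "training started",
    "[STEP][PLAN] tune learning rate",
    "[STEP][PLAN] UID mean agg plus Frequency-encoding",
])
first_seen, cumulative = scan_plans(sample_log)
```

Averaging `cumulative` across seeds within a condition would give the mean+/-std curve of the top panel, and `first_seen` corresponds to the points of the bottom scatter.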

ZhengyaoJiang added 7 commits May 8, 2026 10:23

Add v3 case study figures (strict + loose-vs-strict)

3144d75


Add Twitter T1 hook image

ac9c64c


Reorder Twitter overview: abstraction first, then scope, guidance

7f76c95


ZhengyaoJiang wants to merge 7 commits into master from case-study-v3-figures
