Merge v3 figures into master (referenced by Notion case study + landing#90) #1
Open
ZhengyaoJiang wants to merge 7 commits into
7 charts from rerun-2026-04-23 (strict fit/transform API):
- 01 scope trajectory, 02 scope dots
- 03 guidance dots, 04 guidance variance
- 05 loose-vs-strict per cell, 06 Full+EDA trajectory comparison
- 07 twitter summary (1280x720)
- 08: Stacked bar showing 13 leakage instances (loose API) vs 0 (strict API), color-stacked by mechanism subtype (UID numeric agg=7, UID nunique agg=3, Label encoder=1, Frequency encoder=1, Graph embedding=1).
- 09: Side-by-side schematic comparing build_features(train_df, val_df) to FeatureBuilder.fit/transform, with a red arrow on the val->encoder leak path and a green dashed arrow on the fit->transform state passing.
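The contrast drawn in figure 09 can be illustrated with a minimal sketch. Only the names build_features(train_df, val_df) and FeatureBuilder.fit/transform come from the commit message; the bodies, the `_loose` suffix, the use of plain lists instead of DataFrames, and the frequency-encoder example are assumptions made for illustration, not the project's actual code.

```python
from collections import Counter


def build_features_loose(train_uids, val_uids):
    """Loose API (hypothetical body): both splits are in scope at once."""
    # Nothing prevents the encoder from counting validation rows too.
    # This is the val->encoder leak path.
    counts = Counter(train_uids + val_uids)
    return ([counts[u] for u in train_uids],
            [counts[u] for u in val_uids])


class FeatureBuilder:
    """Strict API (hypothetical body): state is learned in fit() only."""

    def fit(self, train_uids):
        # Encoder state comes from training rows alone.
        self.counts_ = Counter(train_uids)
        return self

    def transform(self, uids):
        # Validation rows are encoded with the fitted state;
        # unseen UIDs fall back to 0.
        return [self.counts_.get(u, 0) for u in uids]


train = ["a", "a", "b"]
val = ["b", "c"]

_, loose_val = build_features_loose(train, val)
strict_val = FeatureBuilder().fit(train).transform(val)
```

In this toy case the loose version encodes the validation rows with counts that include the validation rows themselves, while the strict version never sees them during fitting.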
1280x720 single-image hook combining the two API design schematics with the 13 vs 0 reward-hacking instance counts and a 5-color stacked breakdown of the leakage subtypes.
Single 1280x720 image showing all three decisions side by side:
- Panel 1 (Scope): dot plot of 3 conditions, AUC y-axis
- Panel 2 (Guidance): dot plot of 4 conditions, shared y-axis
- Panel 3 (Abstraction): stacked bar of 13 vs 0 reward-hacking instances

Each panel has a punchline takeaway in its title. Bottom legend shows the 5 leakage subtypes for the stacked bar.
Lead with the most surprising finding (codebase abstraction, 13->0) to match Twitter-thread strategy. Dropped "Decision N" prefix since the new order doesn't match the blog post numbering.
23 methods x 12 strict-API runs (4 conditions x 3 seeds). Methods grouped into A=prompted-techniques (10 Kaggle), B=EDA-derived (need column meanings), C=default ML choices.

Headline pattern visible at a glance:
- Methods adopted per cell: None=9-11, EDA=15-17, Tech=6-11, Full=9-12
- Group B EDA-derived features are nearly absent without the EDA prompt
- D1-anchored UID adoption tracks EDA presence, not technique-list presence
Two panels:
- Top: cumulative unique methods proposed over 200 steps, mean+/-std per condition. Shows EDA-only explores broadest, Tech-only narrowest.
- Bottom: per-method first-seen-step scatter, dots colored by condition. Shows when each prompted/EDA-derived/default method first appears.

Data: bad_seeds/method_timeline.csv (1827 PLAN tags) and method_first_seen.csv (187 first-occurrences). Scanner regex-tags PLAN text from each [STEP][PLAN] block in run.log.
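The tagging and first-seen bookkeeping described above could look roughly like the sketch below. The pattern table, the function names, and the (step, plan_text) input shape are all hypothetical; the actual scanner, its regexes, and the run.log layout are not shown in this PR.

```python
import re

# Hypothetical method patterns; the real scanner's regexes are not in this PR.
METHOD_PATTERNS = {
    "target_encoding": re.compile(r"target[\s_-]?encod", re.I),
    "frequency_encoding": re.compile(r"frequency[\s_-]?encod", re.I),
    "uid_aggregation": re.compile(r"\buid\b.*\bagg", re.I),
}


def tag_plans(plans):
    """Tag each (step, plan_text) pair, assumed to arrive in step order.

    Returns (tags, first_seen): every (step, method) hit, and the first
    step at which each method appeared.
    """
    tags = []
    first_seen = {}
    for step, text in plans:
        for method, pattern in METHOD_PATTERNS.items():
            if pattern.search(text):
                tags.append((step, method))
                first_seen.setdefault(method, step)
    return tags, first_seen


def cumulative_unique(first_seen, n_steps):
    """Number of distinct methods seen by each step 1..n_steps."""
    steps = sorted(first_seen.values())
    return [sum(s <= t for s in steps) for t in range(1, n_steps + 1)]


plans = [
    (1, "Try target encoding on card columns"),
    (2, "Add UID numeric aggregation and target-encoding"),
    (4, "frequency encode email domain"),
]
tags, first_seen = tag_plans(plans)
curve = cumulative_unique(first_seen, 5)
```

The `tags` list plays the role of a timeline table, `first_seen` the role of a first-occurrence table, and `curve` is one run's contribution to the top panel before averaging across seeds.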
The 13 v3 figures on this branch are hot-linked by the internal Notion case study ("How to Frame the Puzzle for AutoResearch", May 8) and now also mirrored into WecoAI/landing#90, via raw.githubusercontent.com URLs pinned to this branch ref. Merging makes the commits permanently reachable from master so the Notion images survive any future branch cleanup.
Adds 13 PNGs, no changes to existing files.
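For reference, a raw.githubusercontent.com URL is made up of owner, repo, ref, and file path, so a link pinned to a branch ref depends on that branch continuing to exist. A minimal sketch of the URL shape follows; the owner, repo, branch, and file names are placeholders, not the real ones.

```python
from urllib.parse import quote


def raw_url(owner, repo, ref, path):
    """Build a raw.githubusercontent.com URL for a file at a given ref."""
    return (
        "https://raw.githubusercontent.com/"
        f"{quote(owner)}/{quote(repo)}/{quote(ref)}/{quote(path)}"
    )


# Placeholder values for illustration only.
branch_link = raw_url("example-org", "example-repo", "v3-figures", "figures/chart.png")
master_link = raw_url("example-org", "example-repo", "master", "figures/chart.png")
```

Links of the first form stop resolving if the branch is deleted. After the merge the same files are also reachable under the master ref.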
🤖 Generated with Claude Code