Add Colab notebooks for poison detection and style ablation#196

Closed
davidoj wants to merge 2 commits into main from colab-notebooks

davidoj commented Mar 18, 2026

Two interactive notebooks demonstrating bergson's capabilities:

  • poison_detection.ipynb: Injects fictional poison documents into Pile training data, fine-tunes Pythia-160M, and uses multi-probe attribution to trace the false fact back to poison sources
  • style_ablation.ipynb: Demonstrates style vs semantic attribution with preconditioner strategies and PCA ablation on Qwen3-0.6B

Both notebooks run on Colab Free (T4, 15GB VRAM), with one exception: the cell for the best-performing style ablation method needs more resources and is flagged as such in the notebook.
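
For readers unfamiliar with the two ideas, here is a toy NumPy sketch of multi-probe attribution combined with PCA-based style ablation. It is not bergson's API and not the notebooks' code: the synthetic "gradients", the `fact` and `style` directions, and the helper names `attribution_scores` and `ablate_top_components` are all assumptions made up for illustration. Scores are mean dot products between per-document vectors and several probe vectors, and ablation projects out the top principal directions estimated from a style-only reference set.

```python
import numpy as np


def attribution_scores(train_grads, probe_grads):
    """Mean dot product of each training vector with all probe vectors."""
    return (train_grads @ probe_grads.T).mean(axis=1)


def ablate_top_components(vectors, reference, k=1):
    """Project out of `vectors` the top-k principal directions of `reference`."""
    centered = reference - reference.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]  # (k, d), orthonormal rows
    return vectors - (vectors @ basis.T) @ basis


# Synthetic setup (hypothetical): axis 0 carries the "fact", axis 1 the "style".
rng = np.random.default_rng(0)
d, n_docs = 16, 20
fact, style = np.eye(d)[0], np.eye(d)[1]

train = 0.01 * rng.normal(size=(n_docs, d))
train[:3] += fact            # docs 0-2: poison documents stating the fact
train[3:6] += 2.0 * style    # docs 3-5: clean documents sharing the probes' style

# Four probes that express the fact in a shared style.
probes = 0.01 * rng.normal(size=(4, d)) + fact + 2.0 * style

# Style-only reference samples used to estimate the style direction.
style_ref = rng.normal(size=(50, 1)) * style + 0.01 * rng.normal(size=(50, d))

# Without ablation, style overlap dominates the ranking.
scores_before = attribution_scores(train, probes)
top_before = np.argsort(-scores_before)[:3]

# After removing the style direction from both sides, the poison docs rank first.
train_abl = ablate_top_components(train, style_ref, k=1)
probes_abl = ablate_top_components(probes, style_ref, k=1)
scores_after = attribution_scores(train_abl, probes_abl)
top_after = np.argsort(-scores_after)[:3]

print("top-3 before ablation:", sorted(top_before.tolist()))
print("top-3 after ablation: ", sorted(top_after.tolist()))
```

In this toy setup the style-matched documents outrank the poison documents until the style direction is removed, which is the confound the style ablation notebook is meant to demonstrate.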

davidoj and others added 2 commits March 18, 2026 09:56
Two interactive notebooks demonstrating bergson's capabilities:

- **poison_detection.ipynb**: Injects fictional poison documents into
  Pile training data, fine-tunes Pythia-160M, and uses multi-probe
  attribution to trace the false fact back to poison sources
- **style_ablation.ipynb**: Demonstrates style vs semantic attribution
  with preconditioner strategies and PCA ablation on Qwen3-0.6B

Both notebooks run on Colab Free (T4, 15GB VRAM).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

davidoj commented Mar 27, 2026

Superseded by #215.

@davidoj davidoj closed this Mar 27, 2026