feat: add WMDP dataset integration #1

Merged
merged 5 commits into main from feature/add-wmdp-dataset on Apr 9, 2025

Conversation

ruidazeng
Owner

Add WMDP cyber dataset configuration for unlearning

  • Add WMDP_cyber_forget.yaml dataset configuration

  • Add WMDP_cyber_retain.yaml dataset configuration

  • Add wmdp unlearning experiment configuration

Manus AI added 5 commits April 8, 2025 23:04
Add WMDP cyber dataset configuration for unlearning

- Add WMDP_cyber_forget.yaml dataset configuration

- Add WMDP_cyber_retain.yaml dataset configuration

- Add wmdp unlearning experiment configuration

Implements locuslab#80

Complete WMDP dataset integration with evaluation support

- Add WMDPEvaluator class in src/evals/wmdp.py

- Register WMDPEvaluator in src/evals/__init__.py

- Add wmdp.yaml evaluation configuration

Implements locuslab#80
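
The evaluator commit above adds a class and registers it so that a configuration can select it by name. A minimal sketch of that registration pattern follows. Only the `WMDPEvaluator` class name comes from this PR; `EVALUATOR_REGISTRY`, `register_evaluator`, `get_evaluator`, and the accuracy-based `evaluate` method are hypothetical stand-ins, and the actual code in `src/evals/__init__.py` and `src/evals/wmdp.py` may be organised differently.

```python
from typing import Dict, List, Type

# Hypothetical registry; the real src/evals/__init__.py may look different.
EVALUATOR_REGISTRY: Dict[str, Type] = {}


def register_evaluator(cls: Type) -> Type:
    """Store an evaluator class under its class name so a config can name it as a string."""
    EVALUATOR_REGISTRY[cls.__name__] = cls
    return cls


@register_evaluator
class WMDPEvaluator:
    """Stand-in evaluator that scores multiple-choice predictions by accuracy."""

    def __init__(self, name: str = "wmdp") -> None:
        self.name = name

    def evaluate(self, predictions: List[int], answers: List[int]) -> float:
        """Return the fraction of predicted choice indices that match the answer key."""
        if len(predictions) != len(answers):
            raise ValueError("predictions and answers must have the same length")
        if not answers:
            return 0.0
        correct = sum(p == a for p, a in zip(predictions, answers))
        return correct / len(answers)


def get_evaluator(handler: str, **kwargs):
    """Look up a registered evaluator by name and instantiate it."""
    if handler not in EVALUATOR_REGISTRY:
        raise KeyError(f"{handler} is not a registered evaluator")
    return EVALUATOR_REGISTRY[handler](**kwargs)
```

With this shape, an evaluation config only needs to carry the handler string (here `"WMDPEvaluator"`), and adding another benchmark means registering one more class rather than editing the lookup code.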
Change default trainer from GradAscent to RMU for WMDP dataset

- Update configs/experiment/unlearn/wmdp/default.yaml

- Also update evaluator from tofu to wmdp for consistency
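
For context on the trainer switch above: RMU, the method proposed alongside the WMDP benchmark, steers the model's hidden activations on forget data toward a scaled random vector while keeping its activations on retain data close to those of a frozen copy of the model. The sketch below illustrates that objective on plain Python lists. It is not the trainer code from this repository; the function names, the `steering_coeff` and `alpha` parameter names, and the per-token averaging are assumptions made for illustration.

```python
import math
import random


def make_control_vector(dim, steering_coeff, seed=0):
    """Random unit vector (entries drawn from [0, 1)) scaled by steering_coeff."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in u))
    return [steering_coeff * x / norm for x in u]


def mean_sq_dist(acts, targets):
    """Average squared L2 distance between per-token activation vectors and their targets."""
    if not acts:
        raise ValueError("need at least one activation vector")
    total = 0.0
    for a, t in zip(acts, targets):
        total += sum((x - y) ** 2 for x, y in zip(a, t))
    return total / len(acts)


def rmu_loss(forget_acts, retain_acts, frozen_retain_acts, control, alpha):
    """Forget term plus alpha times the retain term (illustrative only)."""
    # Forget term: push updated-model activations on forget data toward the control vector.
    forget_loss = mean_sq_dist(forget_acts, [control] * len(forget_acts))
    # Retain term: keep updated-model activations on retain data near the frozen model's.
    retain_loss = mean_sq_dist(retain_acts, frozen_retain_acts)
    return forget_loss + alpha * retain_loss
```

Unlike gradient ascent on the forget loss, this objective is bounded below by zero, which is one commonly cited reason it tends to be a more stable default for this kind of unlearning.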
Add corpus dataset configurations for cyber-forget-corpus and cyber-retain-corpus

- Add configs/data/datasets/WMDP_cyber_forget_corpus.yaml

- Add configs/data/datasets/WMDP_cyber_retain_corpus.yaml

- Update experiment configuration to use corpus datasets for unlearning

Update documentation to include WMDP benchmark information

- Add WMDP to README.md overview, components, and examples

- Add WMDP to docs/links.md benchmarks section

- Add WMDP evaluation examples to docs/evaluation.md
@ruidazeng ruidazeng merged commit 4626581 into main Apr 9, 2025
@ruidazeng ruidazeng deleted the feature/add-wmdp-dataset branch April 9, 2025 03:37