Description
Tasks
- Benchmark
- Unlearning method
- Evaluation
- Dataset
- None of the above
Feature request
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning by Li, Nathaniel, et al. Proceedings of the 41st International Conference on Machine Learning (ICML). 2024.
Paper: https://arxiv.org/abs/2403.03218
Site: https://www.wmdp.ai/
GitHub: https://github.com/centerforaisafety/wmdp
Hugging Face: https://huggingface.co/datasets/cais/wmdp

Motivation
The WMDP dataset is important for testing how effectively unlearning methods remove potentially hazardous knowledge. Note that the WMDP paper also introduced the RMU (Representation Misdirection for Unlearning) method, which was raised in Issue #66 and integrated in PR #69.
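
WMDP items are four-option multiple-choice questions, so an effective unlearning method should push a model's accuracy toward random chance (25%). The sketch below shows one way such a score could be computed. It assumes each record has the `question`, `choices`, and `answer` (integer index) fields used by the `cais/wmdp` Hugging Face dataset; the prompt template and the `predict` callable are hypothetical placeholders for however the integrated evaluation queries a model (e.g. by comparing logits for the letters A to D).

```python
from collections.abc import Callable, Sequence
from typing import TypedDict


class WMDPItem(TypedDict):
    question: str
    choices: list[str]
    answer: int  # index of the correct choice


LETTERS = "ABCD"


def format_prompt(item: WMDPItem) -> str:
    """Render an item as a lettered multiple-choice prompt (hypothetical template)."""
    lines = [item["question"].strip()]
    for letter, choice in zip(LETTERS, item["choices"]):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(items: Sequence[WMDPItem], predict: Callable[[str], int]) -> float:
    """Fraction of items where predict(prompt) returns the correct choice index."""
    if not items:
        raise ValueError("no items to score")
    correct = sum(predict(format_prompt(item)) == item["answer"] for item in items)
    return correct / len(items)


if __name__ == "__main__":
    # Toy items (not real WMDP data); a predictor that always answers "A"
    # scores at chance level on answers spread evenly over A to D.
    toy = [
        WMDPItem(question=f"Q{i}", choices=["a", "b", "c", "d"], answer=i)
        for i in range(4)
    ]
    print(accuracy(toy, lambda prompt: 0))
```

Comparing this score before and after unlearning (alongside a general-capability benchmark) indicates whether hazardous-proxy knowledge was removed without broadly degrading the model.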

Another important note: the WMDP dataset itself does not contain hazardous knowledge from the "red area"; it only contains questions on related knowledge in the "yellow area". According to the paper's authors, removal of "red area" knowledge can be inferred from removal of "yellow area" knowledge, which serves as a proxy.