[Feature Request] Add WMDP Dataset

### Tasks

- [ ] Benchmark
- [ ] Unlearning method
- [ ] Evaluation
- [x] Dataset
- [ ] None of the above

### Feature request

**The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning** by Li, Nathaniel, et al. Proceedings of the 41st International Conference on Machine Learning (ICML). 2024.

Paper: https://arxiv.org/abs/2403.03218
Site: https://www.wmdp.ai/
GitHub: https://github.com/centerforaisafety/wmdp?tab=readme-ov-file
Hugging Face: https://huggingface.co/datasets/cais/wmdp

<img width="647" alt="Image" src="https://github.com/user-attachments/assets/b2311345-6921-46ea-8dd1-1edf5e109743" />

### Motivation

The WMDP dataset is needed to test how effectively unlearning methods remove potentially hazardous knowledge. Note that the WMDP paper also introduced the RMU (Representation Misdirection for Unlearning) method, which was raised in Issue #66 and integrated in PR #69.

<img width="641" alt="Image" src="https://github.com/user-attachments/assets/ac09cca0-d820-40c9-968f-ee94e9771d2b" />

**Another important note:** The WMDP dataset itself does not contain the hazardous knowledge in the "red area"; it only contains questions about related knowledge in the "yellow area". According to the paper's authors, removal of "red area" knowledge is inferred from removal of "yellow area" knowledge.
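
As a rough illustration of how the dataset might be consumed for evaluation, here is a minimal sketch. It assumes each record has `question`, `choices` (four options), and `answer` (index of the correct option) fields, and that a subset named `wmdp-bio` with a `test` split exists on Hugging Face; these should be checked against the dataset card. `format_prompt`, `accuracy`, `load_wmdp`, and the toy records are hypothetical names made up for this sketch, not part of this repo or the WMDP codebase.

```python
from typing import Callable, Iterable, Mapping, Sequence

LETTERS = "ABCD"  # assumption: every question has exactly four choices


def format_prompt(question: str, choices: Sequence[str]) -> str:
    """Render one multiple-choice record as a zero-shot prompt."""
    lines = [question]
    lines += [f"{LETTERS[i]}. {choice}" for i, choice in enumerate(choices)]
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(records: Iterable[Mapping], predict: Callable[[str], int]) -> float:
    """Fraction of records where predict(prompt) equals the gold answer index."""
    records = list(records)
    if not records:
        return 0.0
    correct = sum(
        predict(format_prompt(r["question"], r["choices"])) == r["answer"]
        for r in records
    )
    return correct / len(records)


def load_wmdp(subset: str = "wmdp-bio"):
    """Assumed loader; needs the `datasets` package and network access."""
    from datasets import load_dataset  # imported lazily so the rest runs offline

    return load_dataset("cais/wmdp", subset, split="test")


# Benign placeholder records in the assumed schema (not real WMDP questions).
toy_records = [
    {"question": "Placeholder question 1?", "choices": ["a", "b", "c", "d"], "answer": 0},
    {"question": "Placeholder question 2?", "choices": ["a", "b", "c", "d"], "answer": 2},
]

# A dummy predictor that always answers "A"; a real one would query the model.
toy_accuracy = accuracy(toy_records, lambda prompt: 0)
```

Under the four-choice assumption, random guessing scores about 25%, so an unlearned model would be expected to move toward that level on WMDP while keeping its accuracy on unrelated benchmarks.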
