[Feature Request] Add WMDP Dataset #80

@ruidazeng

Tasks

  • Benchmark
  • Unlearning method
  • Evaluation
  • Dataset
  • None of the above

Feature request

Li, Nathaniel, et al. "The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning." Proceedings of the 41st International Conference on Machine Learning (ICML), 2024.

Paper: https://arxiv.org/abs/2403.03218
Site: https://www.wmdp.ai/
GitHub: https://github.com/centerforaisafety/wmdp?tab=readme-ov-file
Hugging Face: https://huggingface.co/datasets/cais/wmdp
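
For reference, here is a rough sketch of how the benchmark could be scored as multiple-choice accuracy. It is not this repository's API: `format_prompt`, `accuracy`, and `load_wmdp` are hypothetical helpers, and the subset name `wmdp-bio`, the `test` split, and the `question` / `choices` / `answer` fields are assumptions taken from my reading of the dataset card, so please verify them before relying on this.

```python
from typing import Callable, Iterable, Mapping, Sequence

LETTERS = "ABCDEFGH"


def format_prompt(question: str, choices: Sequence[str]) -> str:
    """Render one multiple-choice item as a lettered prompt."""
    lines = [question]
    lines += [f"{LETTERS[i]}. {choice}" for i, choice in enumerate(choices)]
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(rows: Iterable[Mapping], predict: Callable[[str], int]) -> float:
    """Fraction of rows where predict(prompt) equals the gold answer index.

    Assumes each row has `question` (str), `choices` (list of str), and
    `answer` (int index into `choices`); these field names are an assumption.
    """
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to evaluate")
    correct = sum(
        predict(format_prompt(row["question"], row["choices"])) == row["answer"]
        for row in rows
    )
    return correct / len(rows)


def load_wmdp(subset: str = "wmdp-bio"):
    """Load one WMDP subset from the Hugging Face Hub.

    Requires the `datasets` package and network access. The subset and
    split names are assumptions; check the dataset card.
    """
    from datasets import load_dataset  # imported lazily so scoring works offline

    return load_dataset("cais/wmdp", subset, split="test")
```

Here `predict` stands in for whatever maps a prompt to a predicted choice index for the model under test. If the questions have four choices each, an unlearned model would be expected to score near chance (25%) on WMDP while keeping its accuracy on general benchmarks.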

Image

Motivation

The WMDP dataset is important for testing how effectively unlearning methods remove potentially hazardous knowledge. Note that the WMDP paper also introduced the RMU (Representation Misdirection for Unlearning) method, which was raised in Issue #66 and integrated in PR #69.

Image

Another important note: the WMDP dataset itself does not contain hazardous knowledge from the "red area", only questions on related knowledge in the "yellow area". According to the paper's authors, removal of "red area" knowledge is inferred from removal of "yellow area" knowledge.

Labels: benchmark (Request to include new unlearning benchmark)