This repository contains code for the ICLR 2024 paper [On the Learnability of Watermarks for Language Models](https://arxiv.org/abs/2312.04469) by Chenchen Gu, Xiang Lisa Li, Percy Liang, and Tatsunori Hashimoto.
To install the necessary packages, first create a conda environment:

```
conda create -n <env_name> python=3.11
conda activate <env_name>
```

Then, install the required packages with

```
pip install -r requirements.txt
```
The `scripts` directory contains scripts for reproducing the experiments in the paper, which also serve as examples of how to run the files in this repository. The `README.md` files within `scripts` provide instructions on how to run the scripts. Note that all scripts should be run from the top-level directory of the repository, for example as shown below.
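For instance, a script would be invoked from the repository root along these lines (the path below is a hypothetical placeholder, not an actual script name; see the `README.md` files in `scripts` for the real commands):

```
bash scripts/<subdirectory>/<script_name>.sh
```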
Feel free to create an issue if you encounter any problems or bugs!
Code in the `watermarks/kgw` directory is from [github.com/jwkirchenbauer/lm-watermarking](https://github.com/jwkirchenbauer/lm-watermarking). In the `watermarks/kth` directory, `detect.py`, `levenshtein.pyx`, and `mersenne.py` are from [github.com/jthickstun/watermark](https://github.com/jthickstun/watermark). `train_logit_distill.py` and `train_sampling_distill.py` are adapted from [run_clm.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py) in Hugging Face Transformers.
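For context on what the two training scripts implement: logit-based watermark distillation trains the student to match the next-token distribution of a teacher whose logits have a decoding-based watermark applied, while sampling-based distillation fine-tunes the student with the ordinary language modeling loss on watermarked text sampled from the teacher. Below is a minimal sketch of the logit-distillation objective, not the repository's actual code:

```python
import torch
import torch.nn.functional as F


def logit_distillation_loss(student_logits: torch.Tensor,
                            watermarked_teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL divergence from the watermarked teacher distribution to the student.

    Both tensors have shape (batch, seq_len, vocab_size). The teacher logits
    are assumed to already have a decoding-based watermark applied (e.g., KGW
    adds delta to the logits of "green list" tokens).
    """
    teacher_log_probs = F.log_softmax(watermarked_teacher_logits, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    # F.kl_div(input, target) computes KL(target || input) when both inputs
    # are log-probabilities and log_target=True.
    return F.kl_div(student_log_probs, teacher_log_probs,
                    log_target=True, reduction="batchmean")
```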
Below are links to trained model weights from the paper's experiments (hosted on Hugging Face). They can also be found at this Hugging Face collection.
**Logit-based watermark distillation using Llama 2 7B:**

- KGW $k = 0, \gamma = 0.25, \delta = 1$
- KGW $k = 0, \gamma = 0.25, \delta = 2$
- KGW $k = 1, \gamma = 0.25, \delta = 1$
- KGW $k = 1, \gamma = 0.25, \delta = 2$
- KGW $k = 2, \gamma = 0.25, \delta = 2$
- Aar $k = 2$
- Aar $k = 3$
- Aar $k = 4$
- KTH $s = 1$
- KTH $s = 2$
- KTH $s = 4$
- KTH $s = 256$

**Sampling-based watermark distillation using Llama 2 7B:**

- KGW $k = 0, \gamma = 0.25, \delta = 1$
- KGW $k = 0, \gamma = 0.25, \delta = 2$
- KGW $k = 1, \gamma = 0.25, \delta = 1$
- KGW $k = 1, \gamma = 0.25, \delta = 2$
- KGW $k = 2, \gamma = 0.25, \delta = 2$
- Aar $k = 2$
- Aar $k = 3$
- Aar $k = 4$
- KTH $s = 1$
- KTH $s = 2$
- KTH $s = 4$
- KTH $s = 256$

**Sampling-based watermark distillation using Pythia 1.4B:**

- KGW $k = 0, \gamma = 0.25, \delta = 1$
- KGW $k = 0, \gamma = 0.25, \delta = 2$
- KGW $k = 1, \gamma = 0.25, \delta = 1$
- KGW $k = 1, \gamma = 0.25, \delta = 2$
- KGW $k = 2, \gamma = 0.25, \delta = 2$
- Aar $k = 2$
- Aar $k = 3$
- Aar $k = 4$
- KTH $s = 1$
- KTH $s = 2$
- KTH $s = 4$
- KTH $s = 256$
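Any of these checkpoints can be loaded with Hugging Face Transformers in the usual way; the repository id below is a placeholder, not an actual model name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute one of the checkpoints linked above.
model_name = "<hf-username>/<distilled-model-name>"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A distilled model generates watermarked text under ordinary sampling,
# with no decoding-time modification needed.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```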
Below are links to the watermarked training data used for the paper's sampling-based watermark distillation experiments (hosted on Hugging Face). They can also be found at this Hugging Face collection.
- KGW $k = 0, \gamma = 0.25, \delta = 1$
- KGW $k = 0, \gamma = 0.25, \delta = 2$
- KGW $k = 1, \gamma = 0.25, \delta = 1$
- KGW $k = 1, \gamma = 0.25, \delta = 2$
- KGW $k = 2, \gamma = 0.25, \delta = 2$
- Aar $k = 2$
- Aar $k = 3$
- Aar $k = 4$
- KTH $s = 1$
- KTH $s = 2$
- KTH $s = 4$
- KTH $s = 256$
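These datasets can be loaded with the Hugging Face `datasets` library; the dataset id below is a placeholder, not an actual dataset name:

```python
from datasets import load_dataset

# Placeholder dataset id; substitute one of the datasets linked above.
dataset = load_dataset("<hf-username>/<watermarked-training-data>")
print(dataset)  # inspect the available splits and fields
```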
Please cite this paper using the following BibTeX entry:

```
@inproceedings{gu2024learnability,
    title={On the Learnability of Watermarks for Language Models},
    author={Chenchen Gu and Xiang Lisa Li and Percy Liang and Tatsunori Hashimoto},
    booktitle={The Twelfth International Conference on Learning Representations},
    year={2024},
    url={https://arxiv.org/abs/2312.04469}
}
```