This repository contains code for the ICLR 2024 paper [On the Learnability of Watermarks for Language Models](https://arxiv.org/abs/2312.04469) by Chenchen Gu, Xiang Lisa Li, Percy Liang, and Tatsunori Hashimoto.
To install the necessary packages, first create a conda environment:

```
conda create -n <env_name> python=3.11
conda activate <env_name>
```

Then, install the required packages with

```
pip install -r requirements.txt
```
The `scripts` directory contains scripts for reproducing the experiments in the paper, which also serve as examples of how to run the files in this repository. The `README.md` files within `scripts` provide instructions on how to run the scripts. Note that all scripts should be run from the top-level directory of the repository, for example as shown below.
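For instance, a script would be invoked from the repository root along these lines (the path below is a hypothetical placeholder, not an actual script name; see the `README.md` files in `scripts` for the real commands):

```
bash scripts/<subdirectory>/<script_name>.sh
```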
Feel free to create an issue if you encounter any problems or bugs!
Code in the `watermarks/kgw` directory is from [github.com/jwkirchenbauer/lm-watermarking](https://github.com/jwkirchenbauer/lm-watermarking). In the `watermarks/kth` directory, `detect.py`, `levenshtein.pyx`, and `mersenne.py` are from [github.com/jthickstun/watermark](https://github.com/jthickstun/watermark). `train_logit_distill.py` and `train_sampling_distill.py` are adapted from [run_clm.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py) in Hugging Face Transformers.
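For context on what the two training scripts implement: logit-based watermark distillation trains the student to match the next-token distribution of a teacher whose logits have a decoding-based watermark applied, while sampling-based distillation fine-tunes the student with the ordinary language modeling loss on watermarked text sampled from the teacher. Below is a minimal sketch of the logit-distillation objective, not the repository's actual code:

```python
import torch
import torch.nn.functional as F


def logit_distillation_loss(student_logits: torch.Tensor,
                            watermarked_teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL divergence from the watermarked teacher distribution to the student.

    Both tensors have shape (batch, seq_len, vocab_size). The teacher logits
    are assumed to already have a decoding-based watermark applied (e.g., KGW
    adds delta to the logits of "green list" tokens).
    """
    teacher_log_probs = F.log_softmax(watermarked_teacher_logits, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    # F.kl_div(input, target) computes KL(target || input) when both inputs
    # are log-probabilities and log_target=True.
    return F.kl_div(student_log_probs, teacher_log_probs,
                    log_target=True, reduction="batchmean")
```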
Below are links to trained model weights from the paper's experiments (hosted on Hugging Face). They can also be found at this Hugging Face collection.
**Logit-based watermark distillation using Llama 2 7B:**

- KGW $k = 0, \gamma = 0.25, \delta = 1$
- KGW $k = 0, \gamma = 0.25, \delta = 2$
- KGW $k = 1, \gamma = 0.25, \delta = 1$
- KGW $k = 1, \gamma = 0.25, \delta = 2$
- KGW $k = 2, \gamma = 0.25, \delta = 2$
- Aar $k = 2$
- Aar $k = 3$
- Aar $k = 4$
- KTH $s = 1$
- KTH $s = 2$
- KTH $s = 4$
- KTH $s = 256$

**Sampling-based watermark distillation using Llama 2 7B:**

- KGW $k = 0, \gamma = 0.25, \delta = 1$
- KGW $k = 0, \gamma = 0.25, \delta = 2$
- KGW $k = 1, \gamma = 0.25, \delta = 1$
- KGW $k = 1, \gamma = 0.25, \delta = 2$
- KGW $k = 2, \gamma = 0.25, \delta = 2$
- Aar $k = 2$
- Aar $k = 3$
- Aar $k = 4$
- KTH $s = 1$
- KTH $s = 2$
- KTH $s = 4$
- KTH $s = 256$

**Sampling-based watermark distillation using Pythia 1.4B:**

- KGW $k = 0, \gamma = 0.25, \delta = 1$
- KGW $k = 0, \gamma = 0.25, \delta = 2$
- KGW $k = 1, \gamma = 0.25, \delta = 1$
- KGW $k = 1, \gamma = 0.25, \delta = 2$
- KGW $k = 2, \gamma = 0.25, \delta = 2$
- Aar $k = 2$
- Aar $k = 3$
- Aar $k = 4$
- KTH $s = 1$
- KTH $s = 2$
- KTH $s = 4$
- KTH $s = 256$
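Any of these checkpoints can be loaded with Hugging Face Transformers in the usual way; the repository id below is a placeholder, not an actual model name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute one of the checkpoints linked above.
model_name = "<hf-username>/<distilled-model-name>"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A distilled model generates watermarked text under ordinary sampling,
# with no decoding-time modification needed.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```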
Below are links to the watermarked training data used for the paper's sampling-based watermark distillation experiments (hosted on Hugging Face). They can also be found at this Hugging Face collection.
- KGW $k = 0, \gamma = 0.25, \delta = 1$
- KGW $k = 0, \gamma = 0.25, \delta = 2$
- KGW $k = 1, \gamma = 0.25, \delta = 1$
- KGW $k = 1, \gamma = 0.25, \delta = 2$
- KGW $k = 2, \gamma = 0.25, \delta = 2$
- Aar $k = 2$
- Aar $k = 3$
- Aar $k = 4$
- KTH $s = 1$
- KTH $s = 2$
- KTH $s = 4$
- KTH $s = 256$
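These datasets can be loaded with the Hugging Face `datasets` library; the dataset id below is a placeholder, not an actual dataset name:

```python
from datasets import load_dataset

# Placeholder dataset id; substitute one of the datasets linked above.
dataset = load_dataset("<hf-username>/<watermarked-training-data>")
print(dataset)  # inspect the available splits and fields
```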
Please cite this paper using the following BibTeX entry:

```
@inproceedings{gu2024learnability,
    title={On the Learnability of Watermarks for Language Models},
    author={Chenchen Gu and Xiang Lisa Li and Percy Liang and Tatsunori Hashimoto},
    booktitle={The Twelfth International Conference on Learning Representations},
    year={2024},
    url={https://arxiv.org/abs/2312.04469}
}
```