Protecting Intellectual Property of Language Generation APIs with Lexical Watermark

Descriptions

This repo contains the source code and pre-processed corpora for the paper "Protecting Intellectual Property of Language Generation APIs with Lexical Watermark" (accepted to AAAI 2022; arXiv:2112.02701).

Dependencies

python3
spacy==3.0.0
numpy==1.19.5
scipy==1.5.4
nltk==3.5

Usage

git clone https://github.com/xlhex/NLG_api_watermark.git

Create watermark words

# obtain candidate words and their synonyms (in this example, the synonym set size is 2)
python scripts/create_cand_pool.py meta_data/top800_syn_cand_adj.txt 2 > secret_set.txt
# obtain watermark words from the candidate words and their synonyms (in this example, we use the top 10 candidate words)
python scripts/find_idx_4_watermark.py meta_data/test.tok.en secret_set.txt 10
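
To give a rough idea of what picking the "top 10" candidates could look like, here is a minimal sketch that ranks candidate words by how often they occur in a tokenized corpus. This is not the code in scripts/find_idx_4_watermark.py: the README does not state the ranking criterion, so frequency ranking is an assumption, and the function name `top_k_candidates` is hypothetical.

```python
from collections import Counter


def top_k_candidates(sentences, candidates, k):
    """Return the k candidate words that occur most often in the corpus.

    ASSUMPTION: ranking by raw frequency is illustrative only; the
    repository's script may use a different criterion.
    """
    counts = Counter(
        token
        for sentence in sentences
        for token in sentence.split()  # corpus is assumed whitespace-tokenized
        if token in candidates
    )
    return [word for word, _ in counts.most_common(k)]


corpus = ["a great big day", "a great day", "a big great"]
print(top_k_candidates(corpus, {"great", "big", "small"}, 1))  # ['great']
```

Candidates that never appear in the corpus are simply left out, so the result may contain fewer than k words.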

An example showing how to watermark a corpus

python scripts/watermark_sub_from_secret_lib.py test/clean_test.txt test/secret_set.txt test/secret_idx.txt > watermarked_data.txt
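
Conceptually, lexical watermarking replaces selected words with a fixed synonym. The sketch below illustrates that idea on whitespace-tokenized text. It is not the implementation in scripts/watermark_sub_from_secret_lib.py, and it does not parse the repository's secret_set.txt or secret_idx.txt formats; the `substitutions` mapping and the function name `watermark_line` are hypothetical.

```python
def watermark_line(line, substitutions):
    """Replace each token that has an entry in `substitutions`.

    ASSUMPTION: `substitutions` maps a selected word to its watermark
    synonym; tokens without an entry are kept unchanged.
    """
    return " ".join(substitutions.get(token, token) for token in line.split())


secret = {"great": "outstanding"}  # hypothetical word -> synonym pair
print(watermark_line("this is a great result", secret))
# this is a outstanding result
```

As the output shows, a plain token swap ignores surrounding grammar (here the article "a"), which is one reason a real pipeline needs more care than this sketch.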

An example showing how to calculate P-value

# clean data
python scripts/calcul_p_syn.py test/secret_idx.txt test/secret_set.txt test/clean_test.txt
# watermark data
python scripts/calcul_p_syn.py test/secret_idx.txt test/secret_set.txt test/watermarked_test.txt
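
For intuition about the P-value, the sketch below computes a one-sided binomial tail probability: the chance of seeing at least `hits` watermark words in `trials` opportunities if each one appeared only by chance. It is a generic significance test, not the code in scripts/calcul_p_syn.py; the function name `binomial_p_value` and the `chance` value used in the example are assumptions.

```python
from math import comb


def binomial_p_value(hits, trials, chance):
    """Return P(X >= hits) for X ~ Binomial(trials, chance).

    ASSUMPTION: `chance` is the probability that a watermark word shows
    up without any watermarking; pick it to match your synonym setup.
    """
    if not 0 <= hits <= trials:
        raise ValueError("hits must be between 0 and trials")
    if not 0.0 <= chance <= 1.0:
        raise ValueError("chance must be a probability")
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# All 10 opportunities used the watermark word, with a 50% chance rate:
print(binomial_p_value(10, 10, 0.5))  # 0.0009765625
```

A small P-value means the observed count would be unlikely if the text had not been watermarked. The same tail probability can also be obtained with `scipy.stats.binom.sf(hits - 1, trials, chance)`, since scipy is already listed among the dependencies.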

Pre-processed data and fairseq checkpoints

Please download the watermarked training data and clean dev/test data here (please refer to fairseq for training)
Please download the watermarked imitation model here (please refer to fairseq for inference)

Citation

Please cite as:

@article{he2021protecting,
  title={Protecting intellectual property of language generation apis with lexical watermark},
  author={He, Xuanli and Xu, Qiongkai and Lyu, Lingjuan and Wu, Fangzhao and Wang, Chenguang},
  journal={arXiv preprint arXiv:2112.02701},
  year={2021}
}
