A repository containing datasets and tools to train a watermark classifier.
The datasets folder contains WebDataset files using the following keys:
- key: A unique identifier within the dataset
- url: The URL of the image data
- caption: The caption describing the image
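
As a rough illustration of how such a file could be read, here is a minimal sketch using only the Python standard library. It assumes the usual WebDataset convention: each shard is a tar archive, the files of one sample sit next to each other and share a basename, and the file extension names the field (for example `000001.url` and `000001.caption`). It also assumes that `key` corresponds to that shared basename. The helper names `build_demo_shard` and `iter_samples`, and the example.com URLs, are hypothetical and not part of this repository.

```python
import io
import tarfile


def build_demo_shard(rows):
    """Build a tiny in-memory tar shard (hypothetical demo data only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for key, fields in rows:
            for ext, text in fields.items():
                data = text.encode("utf-8")
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


def iter_samples(fileobj):
    """Yield one dict per sample, grouping consecutive files by basename."""
    current_key, sample = None, {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Split "dir/000001.caption" into sample key and field name.
            directory, _, filename = member.name.rpartition("/")
            stem, _, ext = filename.partition(".")
            sample_key = f"{directory}/{stem}" if directory else stem
            if sample_key != current_key:
                if sample:
                    yield sample
                current_key, sample = sample_key, {"key": sample_key}
            sample[ext] = tar.extractfile(member).read().decode("utf-8")
    if sample:
        yield sample


# Assumed layout: two samples, each with a .url and a .caption file.
demo = build_demo_shard([
    ("000001", {"url": "https://example.com/a.jpg", "caption": "a cat"}),
    ("000002", {"url": "https://example.com/b.jpg", "caption": "a dog"}),
])
samples = list(iter_samples(demo))
for s in samples:
    print(s["key"], s["url"], s["caption"])
```

For a real shard, the same function could be given a file opened in binary mode, e.g. `open(path, "rb")`. If the files in this repository store the fields differently (for instance as a single JSON file per sample), the grouping logic would need to be adapted.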
Preliminary dataset annotations and WIP tools can be found at https://github.com/robvanvolt/DALLE-tools. More mature tools and completely annotated datasets will be transferred to this repository. Feel free to adjust the structure or upload new annotations and annotation tools.
The training and evaluation source code is available under ./training/.
Note: DeepSpeed support is still unstable.
The current model is at ./models/watermark_model_v1.pt and is also available in the release.
See example_use.py for a usage example.
WIP: A tool to annotate the url-caption datasets online will be uploaded shortly.