instruction-tuned-sd/data_preparation at main · sachintha443/instruction-tuned-sd

README.md

This directory provides utilities for creating a Cartoonizer dataset for InstructPix2Pix-like training.

Steps

We used 5000 randomly sampled images from the train set of ImageNette as the original images. To derive their cartoonized renditions, we used the Whitebox Cartoonizer model. To derive the instructions.txt file, we used ChatGPT with the following prompt:

Provide at least 50 synonymous sentences for the following instruction: "Cartoonize the following image."
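As a rough illustration of how sampled images and the instruction list could be combined, the sketch below draws a fixed-size random subset of image paths and attaches a randomly chosen instruction to each. The function name `sample_with_instructions` and its parameters are hypothetical and are not taken from generate_dataset.py; the actual script may organize this differently.

```python
import random
from typing import List, Tuple


def sample_with_instructions(
    image_paths: List[str],
    instructions: List[str],
    max_num_samples: int = 5000,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Pick up to `max_num_samples` images at random and attach one instruction to each."""
    rng = random.Random(seed)  # local RNG so the sampling is reproducible
    k = min(max_num_samples, len(image_paths))
    chosen = rng.sample(image_paths, k)  # sampling without replacement
    return [(path, rng.choice(instructions)) for path in chosen]


if __name__ == "__main__":
    paths = [f"img_{i}.png" for i in range(10)]  # placeholder file names
    prompts = ["Cartoonize the following image.", "Turn this image into a cartoon."]
    for path, prompt in sample_with_instructions(paths, prompts, max_num_samples=3):
        print(path, "->", prompt)
```

Using a seeded local generator keeps the subset stable across runs, which makes it easier to regenerate the same pairs later.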

Dataset preparation is divided into three steps:

Step 0: Install dependencies

pip install -q -r requirements.txt

Step 1: Obtain the image-cartoon pairs

python generate_dataset.py

If you want to use more than 5000 samples, specify the --max_num_samples option. Once the image-cartoon pairs are generated, you should see a directory called cartoonizer-dataset (unless you specified a different one via --data_root).
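A quick sanity check on the generated directory might look like the sketch below. Since the exact layout of the output directory is not shown here, it only confirms that the directory exists and counts files by extension. The helper name `summarize_dataset_dir` is hypothetical and is not part of this repository.

```python
from collections import Counter
from pathlib import Path


def summarize_dataset_dir(data_root: str = "cartoonizer-dataset") -> Counter:
    """Count the files under `data_root`, grouped by lowercase file extension."""
    root = Path(data_root)
    if not root.is_dir():
        raise FileNotFoundError(f"Expected a directory at {root!s}")
    # rglob("*") walks all nested entries; keep regular files only.
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    try:
        print(summarize_dataset_dir())
    except FileNotFoundError as err:
        print(err)
```

If you passed a custom --data_root to generate_dataset.py, pass the same path to the helper.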

Step 2: Export the dataset to 🤗 Hub

For this step, you need to be authorized to access your Hugging Face account. Run the following command to do so:

huggingface-cli login

Then run:

python export_to_hub.py

Warning

Please ensure that an empty DS_NAME dataset was created on the Hub first. Instructions on how to do that are here.
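One possible way to create the empty dataset repository from Python, instead of through the web interface, is the `create_repo` function from the `huggingface_hub` library. The sketch below assumes `huggingface_hub` is installed and that you have already logged in. The helper names and the placeholder namespace and dataset name are hypothetical; substitute the value that DS_NAME holds in export_to_hub.py.

```python
def build_repo_id(namespace: str, ds_name: str) -> str:
    """Join a Hub user or organization name and a dataset name into a repo id."""
    if not namespace or not ds_name or "/" in namespace or "/" in ds_name:
        raise ValueError("namespace and ds_name must be non-empty and contain no '/'")
    return f"{namespace}/{ds_name}"


def create_empty_dataset(repo_id: str) -> str:
    """Create an empty dataset repository on the Hub (no error if it already exists)."""
    # Imported lazily so build_repo_id works without huggingface_hub installed.
    from huggingface_hub import create_repo

    # Relies on the token stored by `huggingface-cli login`.
    url = create_repo(repo_id, repo_type="dataset", exist_ok=True)
    return str(url)


if __name__ == "__main__":
    # "your-username" and "cartoonizer-dataset" are placeholders, not values from the repo.
    print(build_repo_id("your-username", "cartoonizer-dataset"))
```

Calling `create_empty_dataset(build_repo_id(...))` performs the network request; with `exist_ok=True` it does not fail if the repository is already there.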

You can find a mini dataset here:
