This repository contains an implementation of CycleGAN for unpaired image-to-image translation. The included dataset is the classic horse2zebra dataset.
CycleGAN consists of:
- 2 Generators: G_AB (A to B) and G_BA (B to A)
- 2 Discriminators: D_A (judges domain A) and D_B (judges domain B)
1. Set up the Python environment:

        python -m venv venv
        source venv/bin/activate   # On Windows: venv\Scripts\activate

2. Install the dependencies:

        pip install -r requirements.txt

3. Train the model (a GPU is needed for reasonable training time):

        python scripts/train.py

    - Training runs for 200 epochs (takes several hours on a GPU)
    - Checkpoints are saved to `checkpoints/` after every epoch

4. Generate translations:

        python scripts/test.py

    - Translated images are saved to `outputs/`
    - Uses the final model checkpoint by default
Here are some side-by-side comparisons of the translations from horses (domain A) to zebras (domain B) using the epoch 10 checkpoint (cherry-picked):
| Original (Horse) | Translated (Zebra) |
|---|---|
| ![]() | ![]() |
| ![]() | ![]() |
| ![]() | ![]() |
CycleGAN uses three types of losses:
- **Adversarial loss**
  - makes generated images look realistic
  - trained until the discriminator can't tell real from fake
- **Cycle consistency loss**
  - A → B → A ≈ A (the most important loss)
  - preserves content while changing style
- **Identity loss**
  - G_AB(B) ≈ B (optional but helpful)
  - prevents unnecessary color changes
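The three losses above can be sketched in PyTorch. This is a minimal illustration, not this repository's exact code: `g_ab`, `g_ba`, `d_b`, and `generator_losses` are hypothetical names, and the loss weights follow common CycleGAN defaults.

```python
import torch
import torch.nn as nn

# Criterions typical for CycleGAN: LSGAN-style MSE for the adversarial loss,
# L1 for the cycle-consistency and identity losses.
adv_criterion = nn.MSELoss()
cycle_criterion = nn.L1Loss()
identity_criterion = nn.L1Loss()

def generator_losses(g_ab, g_ba, d_b, real_a, real_b,
                     lambda_cycle=10.0, lambda_identity=5.0):
    """Generator losses for the A->B direction (B->A is symmetric)."""
    fake_b = g_ab(real_a)

    # Adversarial: the generator wants D_B to label its fakes as real (1).
    pred_fake = d_b(fake_b)
    loss_adv = adv_criterion(pred_fake, torch.ones_like(pred_fake))

    # Cycle consistency: A -> B -> A should reconstruct the original A.
    recon_a = g_ba(fake_b)
    loss_cycle = lambda_cycle * cycle_criterion(recon_a, real_a)

    # Identity: feeding a real B image through G_AB should change nothing.
    loss_identity = lambda_identity * identity_criterion(g_ab(real_b), real_b)

    return loss_adv + loss_cycle + loss_identity
```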
**Training process**
- generators create fake images
- discriminators learn to identify fakes
- generators improve to fool the discriminators
- cycle consistency ensures meaningful translations
- iterate until convergence
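One iteration of this adversarial loop might look like the following PyTorch sketch. Model and optimizer construction are assumed; names such as `g_ab`, `opt_g`, and `train_step` are placeholders, not this repository's API.

```python
import torch
import torch.nn as nn

def train_step(g_ab, g_ba, d_a, d_b, opt_g, opt_d, real_a, real_b,
               lambda_cycle=10.0):
    mse, l1 = nn.MSELoss(), nn.L1Loss()

    # Generators create fakes and try to fool the discriminators.
    opt_g.zero_grad()
    fake_b, fake_a = g_ab(real_a), g_ba(real_b)
    pred_fb, pred_fa = d_b(fake_b), d_a(fake_a)
    loss_g = (mse(pred_fb, torch.ones_like(pred_fb))
              + mse(pred_fa, torch.ones_like(pred_fa))
              # Cycle consistency keeps the translation meaningful.
              + lambda_cycle * l1(g_ba(fake_b), real_a)
              + lambda_cycle * l1(g_ab(fake_a), real_b))
    loss_g.backward()
    opt_g.step()

    # Discriminators learn to separate real images from (detached) fakes.
    opt_d.zero_grad()
    loss_d = 0.0
    for disc, real, fake in ((d_a, real_a, fake_a), (d_b, real_b, fake_b)):
        pred_real, pred_fake = disc(real), disc(fake.detach())
        loss_d = loss_d + 0.5 * (mse(pred_real, torch.ones_like(pred_real))
                                 + mse(pred_fake, torch.zeros_like(pred_fake)))
    loss_d.backward()
    opt_d.step()
    return loss_g.item(), loss_d.item()
```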
**PatchGAN discriminator**
- classifies 70×70 patches instead of whole images
- more efficient, and more effective for textures
**Instance normalization**
- better than batch normalization for style transfer
- normalizes each image independently
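The difference can be illustrated in NumPy (a minimal sketch that ignores the learnable affine parameters both norm layers usually carry): instance norm computes statistics per image and channel, while batch norm pools them across the whole batch.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """x: (N, C, H, W). Each image's channels are normalized with their
    own mean/variance, independently of the rest of the batch."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    """Statistics are pooled over the batch dimension as well, so one
    image's normalization depends on what else is in the batch."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```

With instance norm, every image ends up with zero mean per channel regardless of its batch-mates, which is why it behaves better for style transfer.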
**Replay buffer**
- stores past generated images
- prevents the discriminator from overfitting
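Such a buffer can be sketched framework-agnostically; the 50-image capacity and the 50% swap probability below follow the original CycleGAN setup, and may differ from this repository's code.

```python
import random

class ReplayBuffer:
    """Stores up to `capacity` past generated images. When queried with a
    new image it either returns that image directly, or (by coin flip, once
    full) returns a stored one and keeps the new image in its place. Feeding
    the discriminator a mix of old and new fakes stabilizes training."""

    def __init__(self, capacity=50):
        self.capacity = capacity
        self.buffer = []

    def push_and_pop(self, image):
        if len(self.buffer) < self.capacity:
            self.buffer.append(image)
            return image
        if random.random() < 0.5:
            idx = random.randrange(self.capacity)
            old, self.buffer[idx] = self.buffer[idx], image
            return old
        return image
```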
**Generator architecture**
- ResNet-based with skip connections
- encoder–decoder structure
- 9 residual blocks by default
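A PyTorch sketch of this generator layout (layer counts and channel widths follow common CycleGAN implementations; the details may differ from this repository's exact code):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convs with instance norm; the input is added back (skip)."""
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels), nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1), nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)

class Generator(nn.Module):
    """Encoder -> N residual blocks -> decoder, a common CycleGAN layout."""
    def __init__(self, channels=3, base=64, n_blocks=9):
        super().__init__()
        layers = [nn.ReflectionPad2d(3), nn.Conv2d(channels, base, 7),
                  nn.InstanceNorm2d(base), nn.ReLU(inplace=True)]
        # Encoder: two stride-2 downsampling convs (base -> 4*base channels).
        for i in range(2):
            layers += [nn.Conv2d(base * 2**i, base * 2**(i + 1), 3,
                                 stride=2, padding=1),
                       nn.InstanceNorm2d(base * 2**(i + 1)),
                       nn.ReLU(inplace=True)]
        # Bottleneck: the residual blocks.
        layers += [ResidualBlock(base * 4) for _ in range(n_blocks)]
        # Decoder: two transposed convs back to the input resolution.
        for i in (2, 1):
            layers += [nn.ConvTranspose2d(base * 2**i, base * 2**(i - 1), 3,
                                          stride=2, padding=1, output_padding=1),
                       nn.InstanceNorm2d(base * 2**(i - 1)),
                       nn.ReLU(inplace=True)]
        layers += [nn.ReflectionPad2d(3), nn.Conv2d(base, channels, 7), nn.Tanh()]
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)
```

The final `Tanh` maps outputs to [-1, 1], matching the usual input normalization.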
**Discriminator architecture**
- PatchGAN: classifies overlapping patches
- produces a grid of real/fake predictions
- more parameter-efficient than a full-image discriminator
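A PatchGAN sketch in PyTorch (the layer stack below is the common 70×70 configuration; it may differ from this repository's exact code):

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """70x70 PatchGAN: a stack of stride-2 convs whose output is a grid of
    real/fake scores, one per overlapping input patch, not one per image."""
    def __init__(self, channels=3, base=64):
        super().__init__()
        def block(c_in, c_out, stride):
            return [nn.Conv2d(c_in, c_out, 4, stride=stride, padding=1),
                    nn.InstanceNorm2d(c_out),
                    nn.LeakyReLU(0.2, inplace=True)]
        self.model = nn.Sequential(
            # No normalization on the first layer (common convention).
            nn.Conv2d(channels, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            *block(base, base * 2, 2),
            *block(base * 2, base * 4, 2),
            *block(base * 4, base * 8, 1),
            nn.Conv2d(base * 8, 1, 4, padding=1),  # 1-channel score map
        )

    def forward(self, x):
        return self.model(x)
```

For a 256×256 input this produces a 30×30 grid of predictions, each cell covering roughly a 70×70 receptive field of the input.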
**Training details**
- Adam optimizer (lr = 0.0002, β₁ = 0.5)
- linear learning-rate decay after 100 epochs
- batch size of 1 for stability
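The schedule from the last two bullets can be written as a small function (a sketch; `cyclegan_lr` is a hypothetical name, with constants taken from the list above):

```python
def cyclegan_lr(epoch, base_lr=2e-4, n_epochs=200, decay_start=100):
    """Constant learning rate for the first `decay_start` epochs, then a
    linear decay that reaches zero at epoch `n_epochs`."""
    decay = max(0, epoch - decay_start) / (n_epochs - decay_start)
    return base_lr * (1.0 - decay)

# With PyTorch this plugs into a LambdaLR scheduler as the multiplicative
# factor, e.g.:
#   torch.optim.lr_scheduler.LambdaLR(opt, lambda e: cyclegan_lr(e, base_lr=1.0))
```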