Question: monet2photo training loss #30




I'm trying to train the monet2photo model. My command line was:

python train.py --dataroot ./datasets/monet2photo --name monet2photo --model cycle_gan --gpu_ids 0,1 --batchSize 8 --identity 0.5

The paper discussed using a batch size of 1, but I increased it to 8 to more fully occupy the GPUs. I think this is the only difference between what was described in the paper and my settings, but I may be wrong.

------------ Options -------------
align_data: False
batchSize: 8
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
dataroot: ./datasets/monet2photo
display_freq: 100
display_id: 1
display_winsize: 256
fineSize: 256
gpu_ids: [0, 1]
identity: 0.5
input_nc: 3
isTrain: True
lambda_A: 10.0
lambda_B: 10.0
loadSize: 286
lr: 0.0002
max_dataset_size: inf
model: cycle_gan
nThreads: 2
n_layers_D: 3
name: monet2photo
ndf: 64
ngf: 64
niter: 100
niter_decay: 100
no_flip: False
no_html: False
no_lsgan: False
norm: instance
output_nc: 3
phase: train
pool_size: 50
print_freq: 100
save_epoch_freq: 5
save_latest_freq: 5000
serial_batches: False
use_dropout: False
which_direction: AtoB
which_epoch: latest
which_model_netD: basic
which_model_netG: resnet_9blocks
-------------- End ----------------
#training images = 6287

I'm training on two GTX-1070s.

I'm about 80 epochs in (~40 hours on my setup), and the output seems to oscillate between generated 'photos' that look okay-ish and 'photos' that look pretty 'meh', more like the original painting.

My loss declined pretty rapidly for the first 20 or so epochs, but now seems to be relatively stable with occasional crazy spikes:


I think it's improving slightly with each epoch based on the images, and there seems to be a slight downward trend in the loss, but I might also just be kidding myself because I've been staring at it for a while. In other words, I'm not certain that what it's generating at epoch 80 is really that much better than at epoch 30. Here's the most recent detailed loss curve.

[Image: newplot 3 (detailed loss curve)]
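One way to check for a trend without relying on eyeballing a noisy curve is to smooth the logged values and compare an early window against a late one. The sketch below is only an illustration: the `losses` list is synthetic stand-in data (slow drift, noise, occasional spikes), not my actual log, and the window sizes are arbitrary assumptions.

```python
import random
from statistics import median


def moving_average(values, window):
    """Trailing moving average; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest one.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


# Synthetic stand-in for logged losses (an assumption, not real training data):
# a slow downward drift plus Gaussian noise and rare large spikes.
random.seed(0)
losses = []
for step in range(2000):
    value = 1.0 - 0.0001 * step + random.gauss(0, 0.1)
    if random.random() < 0.01:
        value += 2.0  # occasional "crazy spike"
    losses.append(value)

smoothed = moving_average(losses, window=200)

# The median is less sensitive to spikes than the mean.
early = median(losses[:500])
late = median(losses[-500:])
print(f"early median: {early:.3f}, late median: {late:.3f}")
```

If the late-window median is not clearly below the early one, the apparent downward trend is probably within the noise.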

Question: Is this expected behavior (more or less), or should I be concerned that I've plateaued and/or used the wrong settings? With the default settings, the learning rate starts decreasing at epoch 100. Given that it's taking about 30 minutes per epoch, and thus about 61 more hours to complete 200 epochs, I'm wondering if I should "keep on going" or "abort" and fix some settings.
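For reference, here is a rough sketch of the schedule implied by `niter: 100` and `niter_decay: 100`, assuming the behavior described in the paper (constant learning rate for the first 100 epochs, then linear decay to zero over the next 100). The function name `learning_rate` is made up for illustration, and the exact per-epoch off-by-one in the repository code may differ.

```python
def learning_rate(epoch, base_lr=0.0002, niter=100, niter_decay=100):
    """Approximate LR at a 1-indexed epoch.

    Assumes a constant rate for `niter` epochs followed by a linear
    decay to zero over `niter_decay` epochs (an assumption based on
    the paper's description, not copied from the training code).
    """
    if epoch <= niter:
        return base_lr
    decay_epochs_done = epoch - niter
    return max(0.0, base_lr * (1.0 - decay_epochs_done / niter_decay))


for epoch in (80, 100, 150, 200):
    print(epoch, learning_rate(epoch))

# Remaining wall-clock time from the figures in this post.
epochs_done = 80
total_epochs = 200
hours_per_epoch = 0.5  # "about 30 minutes per epoch"
remaining_hours = (total_epochs - epochs_done) * hours_per_epoch
print(f"remaining: about {remaining_hours:.0f} hours")
```

Under these assumptions, the run at epoch 80 is still on the constant part of the schedule, so the noisy plateau is being observed at the full learning rate of 0.0002.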



