Some questions about implementation #8

Open
beatriz-ferreira opened this issue May 8, 2019 · 5 comments

Comments

@beatriz-ferreira

Hi,

I've been using your code in some experiments.
I have the following questions:

  1. Applying your recently committed changes to the loss resulted in predicted values with odd (larger) ranges in my experiments, which were harder to convert to an image. I had to roll back to the previous version. Have you noticed a similar effect?

  2. Shouldn't the last layer have a sigmoid activation so that the output has values between 0 and 1? These values should be comparable to the input ones, which I think are rescaled to be between 0 and 1, am I correct? Does this affect the reconstruction loss?

  3. Also, in some other implementations the common reconstruction loss is the mean squared error and not the mean absolute error. Do you use 'mae' for some reason?

  4. This is an extra issue that I'm having. Have you been able to use the TensorBoard callback to log the losses and metrics? When I try to add the TensorBoard callback I get an error, which I think is because the ae model is made of two models and thus internally has more than one loss. The error is:
    line 1050, in _write_custom_summaries
    summary_value.simple_value = value.item()
    ValueError: can only convert an array of size 1 to a Python scalar
    I have not found a solution yet.

  5. Minor detail: why change the stddev to its absolute value? Can it ever be negative?

I'm sorry for the long text and for raising all these issues, but I think they may be relevant for more users too!

Thank you in advance!

@alecGraves
Owner

alecGraves commented May 13, 2019

  1. The last commit changed the beta loss so that it is summed instead of averaged over the values in the latent space, which I think is what is done in other implementations. This greatly increases beta's contribution to the gradient.

  2. Changing the output to sigmoid would force the output to the desired range, so it is probably a good idea. Without that, the problem is likely much more difficult for the network to learn. I will test out this change.

  3. I am using the mean absolute error (L1 distance) because that is what was used in the CycleGAN paper, and that is what I remembered as I was making this.

  4. I have not tried to use TensorBoard with this system yet. Post something if you figure it out! I am interested in what the solution could be.

  5. I made this change mostly because a negative standard deviation did not make sense to me, and I am pretty sure it would break the loss function (Is negative stddev a problem? #4) (see below)
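
To make point 1 concrete, here is a minimal sketch in plain Python. It is not the repository's code: it assumes the beta loss is the usual KL divergence between a diagonal Gaussian and a standard normal, and the function name `kl_per_dimension` and the example values are hypothetical.

```python
import math


def kl_per_dimension(mu, log_var):
    """KL(N(mu, exp(log_var)) || N(0, 1)) for each latent dimension."""
    return [0.5 * (m * m + math.exp(lv) - lv - 1.0) for m, lv in zip(mu, log_var)]


# Hypothetical encoder outputs for one sample with a 4-dimensional latent space.
mu = [0.5, -1.0, 0.25, 0.0]
log_var = [0.0, -0.5, 0.3, -1.2]

terms = kl_per_dimension(mu, log_var)
kl_sum = sum(terms)            # summed over the latent dimensions
kl_mean = kl_sum / len(terms)  # averaged over the latent dimensions

# The summed version is latent_dim times the averaged one, so for a fixed
# beta its contribution to the gradient grows by the same factor.
print("sum  :", kl_sum)
print("mean :", kl_mean)
print("ratio:", kl_sum / kl_mean)
```

Under these assumptions, a beta that was tuned for the averaged loss would likely need to be reduced by roughly the latent dimension to keep the same balance with the reconstruction loss.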

Thanks for the questions 😄

@alecGraves
Owner

  5. Update: the variable named stddev (which was the output of the previous layer) actually represents log variance, which can be negative. I corrected the variable name and undid the abs in 810506b. This is also kinda a better resolution to Is negative stddev a problem? #4
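
A short sketch of why a log-variance output needs no abs, assuming the standard reparameterization trick (the repository's sampling layer may differ in detail). The helper name `sample_latent` and the example values are hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Draw z = mu + std * eps with std = exp(0.5 * log_var) and eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


rng = random.Random(0)  # seeded only so that repeated runs give the same draw

# Hypothetical encoder outputs; note that one log variance is negative.
mu = [0.0, 1.0]
log_var = [-2.0, 0.5]

# exp() maps any real log variance to a strictly positive standard deviation.
std = [math.exp(0.5 * lv) for lv in log_var]
z = sample_latent(mu, log_var, rng)

print("std:", std)
print("z  :", z)
```

A negative log variance simply means a variance below 1, so taking its absolute value would change the distribution rather than fix it.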

@beatriz-ferreira
Author

Thank you for your reply and updates!
I'm going to test the refactored version and I'll let you know if something changes in my experiments. I saw you added a tanh activation. I'll also let you know if I figure something out regarding the use of tensorflow.

Please let me know if you happen to figure something out too :)

Thank you

@beatriz-ferreira
Author

Hi!

I've tested your refactored version with my experiments. The results are different, and for the better: I am able to get better reconstructions. Cool, thank you!

Just a question: is there any difference between feeding the auto-encoder images in the range [-1,1], like you do, and feeding them in the range [0,1]? I'm using the second option and everything looks fine. The auto-encoder should adapt to the range (the sampling layer adapts to any distribution), correct?
The only thing I think I should change is the final activation layer, to a sigmoid, so that my outputs are also in the range [0,1]. The loss function should stay the same?

Thank you again!

@alecGraves
Owner

Yes, the network should adapt to the different range without a problem. Changing the output layer to sigmoid would probably help the network because you are constraining the output to the desired range.
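
As a rough illustration of matching the output activation to the input range, here is a plain-Python sketch. It assumes 8-bit pixel values (0 to 255), which this thread does not state, and the helper names are hypothetical.

```python
import math


def to_unit_range(pixel):
    """Rescale an 8-bit pixel value to [0, 1], the range a sigmoid output covers."""
    return pixel / 255.0


def to_signed_range(pixel):
    """Rescale an 8-bit pixel value to [-1, 1], the range a tanh output covers."""
    return pixel / 127.5 - 1.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


pixels = [0, 64, 128, 255]
unit = [to_unit_range(p) for p in pixels]
signed = [to_signed_range(p) for p in pixels]

# tanh is a shifted and rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1,
# so the two choices differ only in the target range they constrain to.
x = 0.7
print("unit  :", unit)
print("signed:", signed)
print("tanh(x)             :", math.tanh(x))
print("2 * sigmoid(2x) - 1 :", 2.0 * sigmoid(2.0 * x) - 1.0)
```

One thing to keep in mind when comparing runs: the [-1,1] range is twice as wide as [0,1], so the same image error gives an absolute error that is twice as large, and the balance with the beta loss shifts accordingly.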


2 participants