Conversation
dustinvtran left a comment:
Have to go catch a flight but some preliminary comments:
`.gitignore` (outdated)

```
# IDE related
.idea/
.vscode/
```
Can you remove changes that aren't relevant for this PR? This includes changes to .gitignore here as well as deletion of CSVs.
```
from edward.util import copy


try:
    from edward.models import Normal
```
As convention, we use 2-space indent.
```
from __future__ import print_function


import six
import numpy as np
```
As convention, we alphabetize the ordering of the import libraries.
```
"{0}. Your TensorFlow version is not supported.".format(e))


class Renyi_divergence(VariationalInference):
```
As convention, we use CamelCase for class names.
```
To perform the optimization, this class uses the techniques from
Renyi Divergence Variational Inference (Y. Li & al, 2016)


# Notes:
```
Docstrings are parsed as Markdown and formatted in a somewhat specific way as they appear on the API docs. I recommend following the other classes, where you would denote a subsection as `#### Notes` and write bullet points as, e.g.,

```
#### Notes
+ bullet 1
+ bullet 2
  + maybe a bulleted list in a bullet
```
dustinvtran left a comment:
Great work! Some comments below. The code looks correct and only minor suggestions with respect to formatting are laid out.
Can you include a unit test? See, e.g., how KLpq is tested under the file tests/inferences/test_klpq.py.
```
$ \text{D}_{R}^{(\alpha)}(q(z)||p(z \mid x))
= \frac{1}{\alpha-1} \log \int q(z)^{\alpha} p(z \mid x)^{1-\alpha} dz $


To perform the optimization, this class uses the techniques from
```
Periods at end of sentences. (If you'd look at the generated API for the class, I recommend compiling the website following instructions from docs/.)
```
= \frac{1}{\alpha-1} \log \int q(z)^{\alpha} p(z \mid x)^{1-\alpha} dz $


To perform the optimization, this class uses the techniques from
Renyi Divergence Variational Inference (Y. Li & al, 2016)
```
We use bibtex for handling references in docstrings. This is handled by adding the appropriate bib entry to docs/tex/bib.bib; make sure it's also written in the right order: we sort bib entries by their year, then alphabetically according to their citekey within each year.
When using references, you can produce (Li et al., 2016) and Li et al. (2016) by writing `[@li2016renyi]` and `@li2016renyi` respectively, assuming that `li2016renyi` is the citekey.
```
# Notes:
- Renyi divergence does not have any analytic version.
- Renyi divergence does not have any version for non reparametrizable
```
It does, but the gradient estimator in @li2016variational doesn't. I recommend just stating that this inference algorithm is restricted to variational approximations whose random variables all satisfy `rv.reparameterization_type == tf.contrib.distributions.FULLY_REPARAMETERIZED`.
Also, instead of checking this during `build_loss_and_gradients`, I recommend checking it during `__init__`. This sort of check is done statically, before any graph construction, similar to how we check for compatible shapes of all latent variables and data during `__init__`.
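To make the suggestion concrete, here is a minimal, framework-free sketch of such a fail-fast constructor check. The class name, the string constant, and the attribute handling are hypothetical stand-ins, not Edward's or TensorFlow's actual API:

```python
# Stand-in for tf.contrib.distributions.FULLY_REPARAMETERIZED.
FULLY_REPARAMETERIZED = "FULLY_REPARAMETERIZED"


class RenyiDivergenceSketch(object):
  """Hypothetical sketch: validate reparameterizability at construction time."""

  def __init__(self, latent_vars):
    # Fail fast in __init__, before any graph construction, instead of
    # deferring the check to build_loss_and_gradients.
    for z, qz in latent_vars.items():
      if getattr(qz, "reparameterization_type", None) != FULLY_REPARAMETERIZED:
        raise NotImplementedError(
            "Variational Renyi inference only works with "
            "reparameterizable models.")
    self.latent_vars = latent_vars
```

Checking in the constructor surfaces the error at the call site where the inference object is built, and removes the need to store a flag like `is_reparameterizable` on the instance.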
```
def initialize(self,
               n_samples=32,
               alpha=1.,
```
As convention, we append all numerics with 0, e.g., 1.0.
```
    Number of samples from variational model for calculating
    stochastic gradients.
  alpha: float, optional.
    Renyi divergence coefficient.
```
Could be useful to specify the domain of the coefficient, e.g., `Must be greater than 0.`, etc.
```
"Variational Renyi inference only works with reparameterizable"
" models")


#########
```
This function is only used in one location and is a one-liner; could you write that line instead of defining a new function?
`examples/vae_renyi.py` (outdated)
```
scale=Dense(d, activation='softplus')(hidden))


# Bind p(x, z) and q(z | x) to the same TensorFlow placeholder for x.
inference = Renyi_divergence({z: qz}, data={x: x_ph})
```
This code looks exactly the same as an older version of vae.py but differs only in this line. To keep the VAE versions better synced, could you add a comment in the existing vae.py suggesting that this is also an alternative?
Ideally, we'd like a specific application where `ed.RenyiDivergence` produces better results by some metric than the alternatives. IIRC, the paper had some interesting results for a Bayesian neural net on some specific UCI data sets; it would be great to reproduce some of those results here.
If you don't have time for this, we can leave it off for now and raise it as a Github issue post-merging this PR.
```
self.scale.get(x, 1.0)
* x_copy.log_prob(dict_swap[x]))


logF = [p - q for p, q in zip(p_log_prob, q_log_prob)]
```
Instead of `logF`, what about something like `log_ratios`, which is snake_case (more Pythonic) and also more semantically meaningful?
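For context, each element of this list is a per-sample log ratio, log p(x, zᵢ) − log q(zᵢ) for zᵢ ~ q; in @li2016renyi these are combined into the variational Renyi (VR) bound. A rough pure-Python sketch of that Monte Carlo estimator (not the PR's actual TensorFlow code; the function name is made up here):

```python
import math


def vr_bound(log_ratios, alpha):
  """Monte Carlo estimate of the VR bound from per-sample log ratios.

  log_ratios: list of log p(x, z_i) - log q(z_i) for samples z_i ~ q.
  alpha: Renyi coefficient; alpha -> 1 recovers the ELBO (mean log ratio).
  """
  if abs(alpha - 1.0) < 1e-8:
    return sum(log_ratios) / len(log_ratios)
  scaled = [(1.0 - alpha) * r for r in log_ratios]
  m = max(scaled)  # log-sum-exp trick for numerical stability
  lse = m + math.log(sum(math.exp(s - m) for s in scaled))
  return (lse - math.log(len(log_ratios))) / (1.0 - alpha)
```

A quick sanity check: if every log ratio equals the same constant c, the bound is c for any alpha.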
Replaced `logF` by `log_ratios`; fixed convention errors.

Thanks for the suggestion and the very informative feedback. Will do later today.

I've added some testing in a similar way as KLqp (both normal_normal and the Bernoulli distribution).
```
import tensorflow as tf


from edward.models import Bernoulli, Normal
from edward.inferences.renyi_divergence import RenyiDivergence
```
We should check that the import works by instead using `ed.RenyiDivergence` in the test.
I had some issues with this, but it should be working now.
```
[@li2016renyi].


#### Notes
    + The gradient estimator used here does not have any analytic version.
```
With Markdown formatting, you don't need the 4 spaces of indentation. E.g., you can just do

```
#### Notes
+ The gradient estimator ...
+ ...
```
```
= \frac{1}{\alpha-1} \log \int q(z)^{\alpha} p(z \mid x)^{1-\alpha} dz.$


The optimization is performed using the gradient estimator as defined in
[@li2016renyi].
```
The citekey is being used as a direct object, so it should be `[@li2016renyi]` -> `@li2016renyi`.
```
+ See Renyi Divergence Variational Inference [@li2016renyi] for
  more details.
"""
if self.is_reparameterizable:
```
`is_reparameterizable` should be checked, with the error possibly raised, during `__init__`; and since it's checked there, it doesn't need to be stored on the class. This also helps remove one layer of indentation in this function.
```
if self.backward_pass == 'max':
  log_ratios = tf.stack(log_ratios)
  log_ratios = tf.reduce_max(log_ratios, 0)
  loss = tf.reduce_mean(log_ratios)
```
If I understood the code correctly, `log_ratios`, when first created, is a list of `n_samples` elements, where each element is a log-ratio calculation per sample from q. For the min/max modes, we take the min/max of these log ratios, which is a scalar.
Is `tf.reduce_mean` for the loss needed? You can also remove the `tf.stack` line in the min and max cases, in the same way you didn't use it for the `self.alpha` ≈ 1 case.
You're right. Thanks for spotting this.
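To illustrate the simplification agreed on above: in the min/max modes the loss is just the extremum of the per-sample log ratios, which is already a scalar, so neither stacking nor averaging is needed. A pure-Python sketch (the function name and mode strings here are hypothetical, loosely mirroring the PR's `backward_pass` option):

```python
def backward_pass_loss(log_ratios, mode):
  """Reduce a list of scalar per-sample log ratios to a single loss."""
  if mode == 'max':
    return max(log_ratios)  # already a scalar; no stack/mean needed
  if mode == 'min':
    return min(log_ratios)
  # default: average over samples, as in the alpha ~= 1 (ELBO) case
  return sum(log_ratios) / len(log_ratios)
```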
`examples/vae_renyi.py` (outdated)
```
[@li2016renyi]


#### Notes
This example is almost exactly similar to example/vae.py.
```
Sorry for the miscommunication. What I meant was that you can edit `vae.py`, comment out the 1-2 lines of code to use `ed.RenyiDivergence`, and add these notes there. This helps to compress the content in the examples; cf. https://github.com/blei-lab/edward/blob/master/examples/bayesian_logistic_regression.py#L51.
Removed vae_renyi.py and modified vae.py instead.
The version of vae.py I had wasn't running, though, so I've modified it quite a bit.
Are you using the latest version of Edward? We updated a few details in vae.py so it actually runs better. For example, you should be using the observations library and a generator, which is far more transparent than the mnist_data class from that TensorFlow tutorial.
In addition, since vae.py is also our canonical VAE example, I prefer keeping it as ed.KLqp as the default, and with the renyi divergence option commented out; similarly, the top-level comments should be written in-line near the renyi divergence option instead.
If you have thoughts otherwise, happy to take alternative suggestions.
```
class test_renyi_divergence_class(tf.test.TestCase):


  def _test_normal_normal(self, Inference, *args, **kwargs):
```
Since `RenyiDivergence` is used across all tests, you don't need `Inference` as an arg to the test functions.
I've used the same template as test_klpq, where only KLpq is used during the tests and where `Inference` is still an argument to the test functions.
But I think I have modified it to be closer to what you had in mind.
Merged example into VAE; simplified tests.
Modified examples and test calls.

It keeps failing the Travis CI check for Python 2.7 before even getting to the proper testing of my code (it fails to install matplotlib and seaborn).

Looks like this is happening in Travis on any build. I'll look into it.
Here is an implementation of Renyi divergence variational inference.
There's also an example on VAEs.
Here is a link to the Edward forum with some more info:
https://discourse.edwardlib.org/t/renyi-divergence-variational-inference/366/3
PS: Sorry for the quite messy commit history.