[WIP] Added implementation of SGHMC and example. #415
Conversation
decreases.
Implements the update equations from (15) of Chen et al., 2014.
"""
# TODO: Would it be more memory-efficient to use tf.control_dependencies,
Does making aliases make copies of the tensors? If so, would it save memory to use tf.control_dependencies and mutate the objects directly?
i'm not sure i understood the question but for example:
>>> x = tf.constant(0)
>>> y = x
>>> x, y
(<tf.Tensor 'Const:0' shape=() dtype=int32>, <tf.Tensor 'Const:0' shape=() dtype=int32>)
y has a reference to the same tensor and doesn't recreate it from scratch, if that's what you're asking.
Yep, that was the question. Just wanted to make sure it wasn't recreating.
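For readers without the paper at hand, here is a minimal NumPy sketch of the update the hunk above implements, paraphrasing Eq. (15) of Chen et al. (2014); the function and argument names are illustrative, not this PR's API:

```python
import numpy as np

def sghmc_update(theta, v, grad_log_post, lr, friction, noise_est=0.0):
  # Paraphrasing Eq. (15) of Chen et al. (2014):
  #   theta <- theta + v
  #   v     <- v + lr * grad log p(theta) - friction * v
  #               + N(0, 2 * (friction - noise_est) * lr)
  # `noise_est` stands in for the paper's estimated gradient-noise term.
  theta = theta + v
  noise = np.random.normal(scale=np.sqrt(2.0 * (friction - noise_est) * lr),
                           size=np.shape(v))
  v = v + lr * grad_log_post(theta) - friction * v + noise
  return theta, v
```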
# Simulate Hamiltonian dynamics with friction.
friction = tf.constant(self.friction, dtype=tf.float32)
# TODO: Allow option for exponentially decaying learning rate, or similar.
For SGLD, we found that using a constant learning rate performed better than using exponential decay. Perhaps this should be an option passed into the inference method?
sure, that makes sense. note one difficulty i had when implementing SGLD was that i could not leverage any of the existing tensorflow optimizers, and if more learning rate schedules were an option, one would need to implement them all from scratch. if there were a way to use the tensorflow optimizers, that would be nice, so long as the implementation doesn't ruin the readability and/or speed of the code.
Got it re: tensorflow.
I'll make a new PR that adds an option to keep the learning rate constant, along with an example showing cases where this may be a bit easier to work with.
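For reference, here is roughly what a decaying schedule could look like if it were threaded through the inference method; tf.train.exponential_decay is standard TF 0.x/1.x API, but the wiring into Edward is hypothetical:

```python
import tensorflow as tf

# Hypothetical wiring: a schedule the inference method could consume in
# place of a constant step size.
global_step = tf.Variable(0, trainable=False)
step_size = tf.train.exponential_decay(
    learning_rate=1e-3,    # initial step size
    global_step=global_step,
    decay_steps=1000,      # shrink every 1000 iterations...
    decay_rate=0.96,       # ...by a factor of 0.96
    staircase=True)
# Each sampler iteration would increment global_step, e.g. with
# tf.assign_add(global_step, 1), so step_size decays over time.
```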
z_sample : dict
  Latent variable keys to samples.
"""
# TODO: This appears to be identical across HMC, SGHMC, SGLD. Should it be
The _log_joint function looks like it's the same. Refactor?
yeah, that sounds useful to refactor (although i'm not sure where the best location is to put it)
OK. Will think about this and submit another PR.
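To make the refactor concrete, here is a rough sketch of what a shared helper might look like; the names (log_joint, priors, likelihood_fn) and the log_prob method are assumptions for illustration, not Edward's actual internals:

```python
import tensorflow as tf

def log_joint(z_sample, priors, likelihood_fn):
  """Compute log p(x, z) = sum_k log p(z_k) + log p(x | z).

  z_sample : dict
    Latent variable keys to samples.
  priors : dict
    The same keys mapped to objects with a log_prob method (assumed).
  likelihood_fn : callable
    Returns the data log-likelihood evaluated at z_sample.
  """
  log_prior = tf.add_n([tf.reduce_sum(priors[key].log_prob(sample))
                        for key, sample in z_sample.items()])
  return log_prior + likelihood_fn(z_sample)
```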
from edward.models import Empirical, MultivariateNormalFull
# Helper functions | ||
# TODO: Where should these go?
I added some helper functions to visualize the contours of a bivariate Gaussian (or more generally, any Edward distribution with a pdf method). Should this be refactored?
that's awesome. maybe we can leave it here for now and think about refactoring it in another PR that focuses more on general visualizations for edward
Cool.
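For context, a sketch of the kind of helper in question: evaluate any bivariate distribution's pdf-style method on a grid and draw its contours. The signature is illustrative rather than the PR's actual code:

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_density_contours(pdf, xlim=(-3.0, 3.0), ylim=(-3.0, 3.0),
                          num=100, label=False):
  # Evaluate the density on a num-by-num grid over xlim x ylim.
  xs = np.linspace(xlim[0], xlim[1], num)
  ys = np.linspace(ylim[0], ylim[1], num)
  X, Y = np.meshgrid(xs, ys)
  points = np.column_stack([X.ravel(), Y.ravel()])
  Z = np.asarray(pdf(points)).reshape(X.shape)
  cs = plt.contour(X, Y, Z)
  if label:
    plt.clabel(cs, inline=1, fontsize=10)
  return cs
```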
if label:
  plt.clabel(cs, inline=1, fontsize=10)


def retrieve_samples(qz):
It took a bit of work to figure out how to grab the actual MCMC samples. Is there a method to do this that we missed, or should we move this elsewhere to allow for more general use?
if you just want the samples you can do qz.params
Nice, thanks
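Making the qz.params answer concrete, a minimal retrieval sketch (session boilerplate follows TF >=0.12 / 1.0 conventions; the shape is just an example):

```python
import tensorflow as tf
from edward.models import Empirical

# An Empirical approximation holding, say, 1000 scalar samples.
qz = Empirical(params=tf.Variable(tf.zeros(1000)))

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  trace = sess.run(qz.params)  # the raw MCMC samples, per the reply above
```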
thanks for submitting the PR! this is excellent work. excited to have SGMCMC methods to experiment with.
Figured out how to retrieve MCMC trace more easily.
Clean up normal_sghmc: make indent 2 spaces instead of 4; get rid of TODOs.
Clean up sghmc.py: get rid of TODOs.
Cleaned things up a bit. Got rid of the README notes since we've now discussed them. Squashed into the previous commit, which I now know is a bad thing to do because our discussions are now "outdated". Lesson learned. Let me know if there are other things you'd like changed on this PR. Otherwise, I'll start thinking about the other things we discussed: step size, code viz, and refactor of _log_joint.
looks good to me. can you add an example with data? e.g., linear/logistic regression is fine.
Will do.
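For reference, a sketch of what such an example could look like: Bayesian linear regression fit with ed.SGHMC. Argument names follow the Edward API of this era (mu/sigma); the hyperparameters and shapes are placeholders, not the eventual example's values:

```python
import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Empirical, Normal

# Toy data: y = X w_true + noise.
N, D = 50, 3
X_train = np.random.randn(N, D).astype(np.float32)
w_true = np.random.randn(D).astype(np.float32)
y_train = X_train.dot(w_true) + 0.1 * np.random.randn(N).astype(np.float32)

# Model: w ~ Normal(0, 1); y ~ Normal(X w, 0.1).
X = tf.placeholder(tf.float32, [N, D])
w = Normal(mu=tf.zeros(D), sigma=tf.ones(D))
y = Normal(mu=ed.dot(X, w), sigma=0.1 * tf.ones(N))

# Empirical approximation holding T posterior samples of w.
T = 5000
qw = Empirical(params=tf.Variable(tf.zeros([T, D])))

inference = ed.SGHMC({w: qw}, data={X: X_train, y: y_train})
inference.run(step_size=1e-3)
```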
Git style question: I want to update this branch with the latest changes from master, which contains relevant bug fixes for #412. I'm also going to add my integer type fix. In general, this can be done by merging master into this branch, or by rebasing the branch onto master. The problem with the former approach is that it will make the history messy, while the latter makes it impossible to view changes that were made before the rebase (like what's happened above). Which do you prefer?
i personally prefer to rebase to keep the commit history clean. that said, we're always squashing+merging pull requests, so feel free to do either.
Cool, rebase works for me.
i ran the examples in tensorflow 0.11.0 with python 2 and 3. i also double-checked upon merging that they work in tensorflow 1.0.0rc0. merging now.
I'm working with @nfoti and @yianma. We have some questions on a few small parts of the code. See comments in changed files.