
TensorBoard Integration with Notebook Example #653

Merged 26 commits into blei-lab:master on May 30, 2017

Conversation

closedLoop (Contributor):

Integration of TensorBoard with Notebook Example

@dustinvtran here's a quick demo of the changes
https://www.useloom.com/share/29e087b17cfc47b9b60662bff7843983

Contents:

  • Modification of inference/inference.py to more fully initialize TensorBoard
  • A Notebook with instructions and best practices to make your TensorBoard clean and useful

Main Features:

  • Automatically tracks all named latent variables for your given inference
    • This can be disabled or modified by passing in a custom list via logvars
  • Works for all inference classes.
  • Automatic run versioning to keep your logs clean and comparable in TensorBoard
  • Instructions for easily configuring and launching your model with TensorBoard (see the usage sketch below)
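For orientation, here is a minimal usage sketch (not taken from the PR itself) of how the logging might be switched on from user code. The toy model, variable names, and the loc/scale argument spelling are assumptions for illustration; logdir and logvars are the parameters this PR touches.

import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Normal

# Toy model: infer the mean of a Normal from four observations.
x_train = np.array([0.5, 1.2, 0.8, 1.0], dtype=np.float32)
mu = Normal(loc=0.0, scale=1.0, name="mu")
x = Normal(loc=tf.ones(4) * mu, scale=1.0, name="x")

qmu = Normal(loc=tf.Variable(0.0),
             scale=tf.nn.softplus(tf.Variable(0.5)),
             name="qmu")

inference = ed.KLqp({mu: qmu}, data={x: x_train})
# Passing logdir enables the TensorBoard FileWriter; with this PR, all named
# latent variables are tracked by default (override via logvars=[...]), and
# successive runs are versioned so they stay separate in TensorBoard.
inference.run(n_iter=1000, logdir='log')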

Issues addressed

Not addressed:

debug : bool, optional
If True, add checks for ``NaN`` and ``Inf`` to all computations
in the graph. May result in substantially slower execution
times.
"""
from datetime import datetime
import os
Member:

Can you move import statements to the top of the file?

@@ -169,11 +171,25 @@ def initialize(self, n_iter=1000, n_print=None, scale=None, logdir=None,
logdir : str, optional
Directory where event file will be written. For details,
see ``tf.summary.FileWriter``. Default is to write nothing.
logrun : str, optional
Member:

Could you describe your reasoning for splitting logdir vs logrun? I think I see it, although I wonder if logdir could just be made more powerful by letting you include a custom subdirectory name. If unspecified, it would default to a subdirectory with the current UTC timestamp.

Member:

Also: if logrun can easily be specified as part of logdir, that would also enable the timestamping for those who prefer to control inference using initialize() followed by update() calls (that is, they don't call run()).
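For readers unfamiliar with that workflow, the manual loop referenced here looks roughly like the following. This is a sketch that assumes the inference object and imports from the example above and follows the pattern in the Edward docs.

# Manual control of inference: initialize() plus repeated update() calls
# instead of a single run().
sess = ed.get_session()
inference.initialize(n_iter=1000, logdir='log')
tf.global_variables_initializer().run()

for _ in range(inference.n_iter):
    info_dict = inference.update()
    inference.print_progress(info_dict)

inference.finalize()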

Contributor Author:

The way I thought about it is that logrun will tend to be pretty static for a given project, and you want a way to cleanly separate info each time you refit your models. Therefore you can leave it blank as you arbitrarily refine the model and use the timestamps to track your progress, or you could set logrun to something like using_adam vs. using_adagrad to highlight the convergence results of each in TensorBoard.

I wanted to keep logdir a string since it is a string in TensorBoard, and given that it has the same name, I wanted to keep the usage similar to avoid confusion.

dustinvtran (Member) on May 28, 2017:

Users can already separate info each time they refit a model by tweaking the logdir argument. A typical use case would be

inference.initialize(logdir='log/lr_{}_nsamples_{}'.format(learning_rate_stepsize, n_samples))

This is how native TensorFlowers write the logdir argument in their tf.summary.FileWriter. Users have full flexibility in nesting however many subdirectories and filenames they want. logdir + logrun sort of imposes redundancy, where a user can specify logrun already inside logdir.

That said, I love the idea of having a default subdirectory appended to logdir which timestamps the run. It seems impossible to turn it on/off without having an additional argument. So then maybe logrun should just be a boolean, and its name+default be changed to something like log_timestamp=True?


if logrun is None:
# Set default to timestamp
logrun = datetime.strftime(datetime.utcnow(), "%Y%m%dT%H%M%S")
dustinvtran (Member) on May 28, 2017:

Good standard! What's the T?

Also, do you think adding _ in between would make the directory name too long? (Probably.) Is there a standard other code bases follow for this which we might adopt?

Contributor Author:

The 'T' was just a means of separating the date and time segments, like most of the conventions listed at https://www.loggly.com/docs/automated-parsing/#system, while also dropping all of the other non-numeric characters from those conventions. However, after using this in TensorBoard a bit, I'm going to replace the 'T' with a '_' just to keep it easier on the eye.
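For reference, the two formats being discussed differ only in the separator (a quick illustration, not code from the PR):

from datetime import datetime

now = datetime.utcnow()
datetime.strftime(now, "%Y%m%dT%H%M%S")   # e.g. '20170528T143205' (ISO-8601-style 'T' separator)
datetime.strftime(now, "%Y%m%d_%H%M%S")   # e.g. '20170528_143205' (underscore, easier to scan)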

@@ -258,3 +284,48 @@ def finalize(self):
"""
if self.logging:
self.train_writer.close()

def set_log_variables(self, logvars=None, log_max_scalers_per_var=None):
"""Logs variables to TensorBoard
Member:

Tensorboard.

"""Logs variables to TensorBoard

For each variable in logvars, creates 'scalar' and / or 'histogram' by
calling `tf.summary.scalar` or `tf.summary.histogram`
Member:

To format code, wrap it in double backticks, i.e., ``tf.summary.scalar``. Also needs a period at the end of the sentence.

if logvars is None:
logvars = []
for k in self.latent_vars:
logvars += get_variables(self.latent_vars[k])
Member:

Variables can also exist as model parameters. I recommend an additional loop applying this to the keys of self.data and to the priors (which are the keys of self.latent_vars).
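A rough sketch of what that suggestion could look like, reusing the get_variables helper from edward.util; the helper name _default_logvars and the exact attribute handling are assumptions, not the merged code.

from edward.util import get_variables

def _default_logvars(self):
    """Collect the default set of variables to log: those attached to the
    priors, the approximating families, and the observed data."""
    logvars = []
    for z in self.latent_vars:
        logvars += get_variables(z)                    # prior
        logvars += get_variables(self.latent_vars[z])  # approximating family
    for x in self.data:
        logvars += get_variables(x)                    # model parameters tied to data
    return logvars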

@@ -1 +1 @@
__version__ = '1.3.1'
__version__ = '1.3.1-closedloop'
Member:

Can remove this :)

@@ -50,6 +50,7 @@ def mvn_plot_contours(z, label=False, ax=None):
qz = Empirical(params=tf.Variable(tf.random_normal([5000, 2])))

inference = ed.SGHMC({z: qz})
inference.initialize(logdir='C:\\Users\skruz\\temp\\tensorflow\\')
Member:

You can remove this argument to be more generic, say, logdir='log'.

dustinvtran (Member) commented on May 28, 2017:

This is a great PR @closedLoop. It's a valuable contribution toward making Edward fits easier to explore and model/learning problems easier to diagnose. And thanks for the demo!

I added comments above. Two other requests:

  • Can you remove changes from the various notebooks unrelated to this PR?
  • I recommend displaying the TensorBoard pics in the tutorial. You can do that by using something like
![GAN Fig 0](https://raw.githubusercontent.com/blei-lab/edward/master/docs/images/gan-fig0.png)

Then you can explain them as you do in the video demo.


# If var is multi-dimensional, log the distribution
if len(var.shape) > 0 and np.max(var.shape) > 1:
tf.summary.histogram(var_name, var)
Member:

Some of this conflicts with the current TensorBoard tracking in variational_inference.py (see #654). My guess is you can remove the parameter histograms in variational_inference.py.

I also recommend checking to see how the naming systems match.
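For concreteness, the branching being discussed might look like the sketch below; the helper name summarize_variable is hypothetical, and the merged code may differ in how it names summaries.

import numpy as np
import tensorflow as tf

def summarize_variable(var, var_name):
    """Log a scalar summary for single-valued variables, a histogram otherwise."""
    shape = var.shape.as_list()
    if len(shape) == 0 or np.prod(shape) == 1:
        tf.summary.scalar(var_name, tf.reshape(var, []))
    else:
        tf.summary.histogram(var_name, var)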

closedLoop (Contributor, Author):

@dustinvtran this has been cleaned up and integrated with the gradient and loss tracking from the merged pull request #598. I think it's looking really good; I've tested it on some larger models and it works well there too.

Main changes:

  • parameter naming is consistent with logging in variational_inference.py
  • logrun has been changed to a log_timestamp boolean, based on your suggestion.
  • I've removed log_max_scalars_per_var from my previous addition. It was unnecessarily complicated and a solution in search of a problem.
  • Single-valued variables are graphed under Scalars; otherwise they are logged to Histograms and Distributions in TensorBoard (see the usage sketch below).
  • The tensorboard.ipynb notebook has been updated and includes references to the gradient and loss tracking added in #598 (TensorBoard for variational inference).

Thank you for all of your suggestions. I believe this is much cleaner now, and I'm happy to modify anything else you'd like.
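Putting the final interface together, usage would look something like this. It is a sketch only: it assumes an inference object built as in the earlier example and the logdir / log_timestamp parameter names described in this thread.

# log_timestamp=True (the default) appends a timestamped subdirectory under
# logdir, so repeated fits show up as separate, comparable runs.
inference.initialize(n_iter=1000, logdir='log', log_timestamp=True)

# Then point TensorBoard at the base directory:
#   tensorboard --logdir=log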

Here are the new screenshots:

[screenshot: TensorBoard Scalars tab]

[screenshot: TensorBoard Histograms tab]

dustinvtran (Member):

Great work. Merging now. I'll add a separate PR that adds the equivalent notebook as a tutorial on the website.

dustinvtran merged commit a17d650 into blei-lab:master on May 30, 2017