TensorBoard in TFP and TF v2 #356
Comments
We don't have any explicit TB features in TFP, but you should be able to monitor anything you're interested in using tf.summary and friends. Is there something in particular you're trying to do? Maybe we can help a bit with idioms. |
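For reference, this is roughly what "tf.summary and friends" looks like in TF2, independent of TFP. A minimal sketch, assuming TF2 eager mode; the log directory and scalar values are made up:

```python
import tensorflow as tf

# Hypothetical log directory; point TensorBoard at it with
# `tensorboard --logdir /tmp/tb_demo`.
writer = tf.summary.create_file_writer('/tmp/tb_demo')
with writer.as_default():
    for step in range(10):
        # Any scalar you care about; the values here are made up.
        tf.summary.scalar('loss', 1.0 / (step + 1), step=step)
writer.close()
```

TensorBoard then shows a `loss` curve under the Scalars tab.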
Yes, I'm trying to monitor the progress and final results of training a Bayesian NN with HMC. I tried writing a trace_fn:

```python
def trace_fn(weights, kernel_results):
    print("weights", weights)
    print("kernel_results", kernel_results)


@tf.function
def run_hmc(
    num_results=100,
    num_burnin_steps=0,
    step_size=0.01,
    current_state=get_initial_state(),
    num_steps_between_results=0,
):
    hmc_kernel = tfp.mcmc.SimpleStepSizeAdaptation(
        tfp.mcmc.HamiltonianMonteCarlo(
            target_log_prob_fn=joint_log_prob_fn,
            num_leapfrog_steps=2,
            step_size=step_size,
            state_gradients_are_stopped=True,
        ),
        num_adaptation_steps=num_results + num_burnin_steps,
    )
    weights, kernel_results = tfp.mcmc.sample_chain(
        num_results=num_results,
        num_burnin_steps=num_burnin_steps,
        current_state=current_state,
        kernel=hmc_kernel,
        trace_fn=trace_fn,
    )
    return weights, kernel_results


weights, kernel_results = run_hmc()
print("Acceptance rate:", kernel_results.inner_results.is_accepted.numpy().mean())
```

but whatever signature I use or action I take in that function, it causes the whole operation to come crashing down. Some docs or guidance on this would be much appreciated! |
Ah yeah, maybe this is a documentation bug -- check the docs on tfp.mcmc.sample_chain. Basically, trace_fn gets to look at the current chain states and "kernel results" structures at each step, and decide which values to create traces of. These traces are what are returned in the second element of sample_chain's return value.
You can also return more complicated nested structures (tuples, named_tuples, dicts [i think...]) from trace_fn. I guess you could also make calls to tf.summary in that function (I'm not sure this won't badly degrade performance), but you do need to return a valid structure. @SiegeLordEx may have something to add to what I've said. |
What @csuter said is correct. Indeed, if you want to track your weights over time on TensorBoard, you'd place tf.summary calls inside trace_fn, something like this (untested):

```python
def trace_fn(weights, results):
    with tf.compat.v2.summary.record_if(tf.equal(results.step % 100, 0)):
        tf.compat.v2.summary.histogram("weights", weights, step=results.step)
    return ()
```

Note how I set it up to record every 100 steps, for efficiency, but you can do whatever suits your needs. It might also make sense to run sample_chain without summaries, and then iterate over the return values of sample_chain (I can imagine this playing nicer on the GPU), but obviously you'd lose the in-progress display of your statistics. |
I don't expect summaries inside the trace_fn to work because they sit inside a while_loop control flow context. Summaries must be fetchable at the top level of the graph. Are you running a chain for so long that you want summaries out mid-execution? For that I think you would want to run sample_chain for n steps, output a summary, then resume sampling, which iirc is supported well.
|
That's true only of V1 summaries; V2 summaries are just regular ops with a side-effect of writing to a file. Here's a complete working example:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

dist = tfd.Normal(0., 1.)
kernel = tfp.mcmc.SimpleStepSizeAdaptation(
    tfp.mcmc.HamiltonianMonteCarlo(
        dist.log_prob, step_size=0.1, num_leapfrog_steps=3),
    num_adaptation_steps=100)

summary_writer = tf.compat.v2.summary.create_file_writer(
    '/tmp/summary_chain', flush_millis=10000)


def trace_fn(state, results):
    with tf.compat.v2.summary.record_if(tf.equal(results.step % 10, 1)):
        tf.compat.v2.summary.scalar(
            "state", state, step=tf.cast(results.step, tf.int64))
    return ()


with summary_writer.as_default():
    chain, _ = tfp.mcmc.sample_chain(
        kernel=kernel, current_state=0., num_results=200, trace_fn=trace_fn)

summary_writer.close()
```

There is a bit of an annoyance in that the summaries use the name scope of where they are as the name, which leaks a whole bunch of internal implementation details of sample_chain... I don't have a solution for this yet. |
@SiegeLordEx I found the same thing: creating summaries in trace_fn results in an empty TB dashboard. I'm running the latest version. |
@brianwa84 That's a great suggestion. I'll try that as soon as I have a working implementation. |
@janosh Not sure, my TensorBoard works okay. I'd try things out without TFP, just:

```python
summary_writer = tf.compat.v2.summary.create_file_writer(...)
with summary_writer.as_default():
    tf.compat.v2.summary.scalar(...)
summary_writer.close()
```

And make sure that works. Maybe it's just some TF2 incompatibility nonsense which has nothing to do with TFP. |
Same problem without TFP. |
@brianwa84 What would be the best way of resuming the calculation? Just pass the last state of the previous run into the next one and then concatenate the results of all runs for final diagnostics? E.g.

```python
hmc_kernel = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn, step_size=step_size, num_leapfrog_steps=num_leapfrog_steps
)
adaptive_kernel = tfp.mcmc.SimpleStepSizeAdaptation(
    hmc_kernel, num_adaptation_steps=num_adaptation_steps
)
chain1, (_, kernel_results1) = tfp.mcmc.sample_chain(
    kernel=adaptive_kernel,
    current_state=current_state,
    num_results=num_results,
    num_steps_between_results=num_steps_between_results,
    trace_fn=partial(trace_fn, summary_freq=5),
)
# Some mid-execution diagnostics
chain2, (_, kernel_results2) = tfp.mcmc.sample_chain(
    kernel=adaptive_kernel,
    current_state=chain1[-1],
    num_results=num_results,
    num_steps_between_results=num_steps_between_results,
    trace_fn=partial(trace_fn, summary_freq=5),
)
chain = tf.concat((chain1, chain2), 0)
```

But then how to merge the kernel results? |
Something like that:

```python
state, kernel_results = tfp.mcmc.sample_chain(
    kernel=adaptive_kernel,
    current_state=current_state,
    num_results=num_results,
    num_steps_between_results=num_steps_between_results,
    trace_fn=partial(trace_fn, summary_freq=5),
)
chain1, (_, kernel_results1) = state, kernel_results
# Some mid-execution diagnostics
state, kernel_results = tfp.mcmc.sample_chain(
    kernel=adaptive_kernel,
    current_state=state[-1],  # or tf.[compat.v2.]nest.map_structure(lambda x: x[-1], state)
    previous_kernel_results=kernel_results,  # This line is new.
    num_results=num_results,
    num_steps_between_results=num_steps_between_results,
    trace_fn=partial(trace_fn, summary_freq=5),
)
chain2, (_, kernel_results2) = state, kernel_results
chain = tf.concat((chain1, chain2), 0)
```
|
Re: how to merge the kernel results |
@SiegeLordEx should what I put above work? |
Thanks @brianwa84. Yes, it's something like that. Here's a 'loop' version of the above:

```python
kernel_results = kernel.bootstrap_results(current_state)
chain_blocks = []
trace_blocks = []
for i in range(num_blocks):
    chain, trace, kernel_results = tfp.mcmc.sample_chain(
        kernel=kernel,
        current_state=current_state,
        previous_kernel_results=kernel_results,
        num_results=num_results,
        trace_fn=...,
        return_final_kernel_results=True,
    )
    # Do your partial analysis here.
    current_state = tf.nest.map_structure(lambda x: x[-1], chain)
    chain_blocks.append(chain)
    trace_blocks.append(trace)

full_chain = tf.nest.map_structure(lambda *parts: tf.concat(parts, axis=0), *chain_blocks)
full_trace = tf.nest.map_structure(lambda *parts: tf.concat(parts, axis=0), *trace_blocks)
# full_chain/full_trace now contain num_blocks * num_results elements
```
|
@SiegeLordEx Why do you need `return_final_kernel_results=True`? Also, what's the advantage of `current_state = tf.nest.map_structure(lambda x: x[-1], chain)` over `current_state = chain[-1]`? |
About resuming: I had hoped that, when setting random seeds, resuming and running the full chain from the beginning would produce the same results, but it doesn't. Is this expected behavior or am I doing something wrong? Here's a minimal example building on the code that @SiegeLordEx provided (Python 3.6.5; tensorflow==2.3.1; tensorflow-probability==0.11.1):
So the two chains produce the same samples up to step three (as they must, since I set a random seed) but produce different samples after resuming. Is there a way to make these two produce equivalent results by setting some internal seeds? I'd appreciate any feedback :)
There don't appear to be any docs on how to use TensorBoard with TensorFlow Probability. I'm specifically interested in a guide for the 2.0 release. Is this planned or am I missing something?