Mesh plugin hits KeyError when new mesh summary tags are added to a run #2579

@nfelt

cc @podlipensky

The following steps will recreate the issue against today's tb-nightly:

  1. Create a virtualenv w/ tf-nightly-2.0-preview and the current tb-nightly
  2. Build the mesh_demo_v2 binary as updated in #2578 ("Allow use of mesh_demo_v2 with PLY files lacking vertex colors")
  3. wget https://people.sc.fsu.edu/~jburkardt/data/ply/teapot.ply
  4. bazel-bin/tensorboard/plugins/mesh/mesh_demo_v2 --mesh_path=teapot.ply --tag_name=mesh1
  5. tensorboard --logdir /tmp/mesh_demo
  6. Open TensorBoard tab, confirm that tag mesh1 appears as expected in TB
  7. bazel-bin/tensorboard/plugins/mesh/mesh_demo_v2 --mesh_path=teapot.ply --tag_name=mesh2
  8. Reload TensorBoard tab

The mesh visualizations fail to load because the tags request hits a 500 internal server error, due to the handler crashing on a KeyError here:

Traceback (most recent call last):
  File "/usr/local/google/home/nickfelt/.tf-venvs/tf-nightly-2.0-preview-py2/lib/python2.7/site-packages/werkzeug/serving.py", line 270, in run_wsgi
    execute(self.server.app)
  File "/usr/local/google/home/nickfelt/.tf-venvs/tf-nightly-2.0-preview-py2/lib/python2.7/site-packages/werkzeug/serving.py", line 258, in execute
    application_iter = app(environ, start_response)
  File "/usr/local/google/home/nickfelt/.tf-venvs/tf-nightly-2.0-preview-py2/lib/python2.7/site-packages/tensorboard/backend/application.py", line 380, in __call__
    return self.exact_routes[clean_path](environ, start_response)
  File "/usr/local/google/home/nickfelt/.tf-venvs/tf-nightly-2.0-preview-py2/lib/python2.7/site-packages/werkzeug/wrappers.py", line 308, in application
    resp = f(*args[:-2] + (request,))
  File "/usr/local/google/home/nickfelt/.tf-venvs/tf-nightly-2.0-preview-py2/lib/python2.7/site-packages/tensorboard/plugins/mesh/mesh_plugin.py", line 107, in _serve_tags
    tag = self._instance_tag_to_tag[(run, instance_tag)]
KeyError: ('.', u'mesh2_FACE')

The problem is that the mesh plugin permanently caches any non-empty result from PluginRunToTagToContent() here:

def prepare_metadata(self):
  """Processes all tags and caches metadata for each."""
  if self._tag_to_instance_tags:
    return
  # This is a dictionary mapping from run to (tag to string content).
  # To be clear, the values of the dictionary are dictionaries.
  all_runs = self._multiplexer.PluginRunToTagToContent(MeshPlugin.plugin_name)
  # tagToContent is itself a dictionary mapping tag name to string
  # SummaryMetadata.plugin_data.content. Retrieve the keys of that dictionary
  # to obtain a list of tags associated with each run. For each tag, estimate
  # the number of samples.
  self._tag_to_instance_tags = collections.defaultdict(list)
  self._instance_tag_to_metadata = dict()
  for run, tag_to_content in six.iteritems(all_runs):
    for tag, content in six.iteritems(tag_to_content):
      meta = metadata.parse_plugin_metadata(content)
      self._instance_tag_to_metadata[(run, tag)] = meta
      # Remember instance_name (instance_tag) for future reference.
      self._tag_to_instance_tags[(run, meta.name)].append(tag)
      self._instance_tag_to_tag[(run, tag)] = meta.name

Later on, _serve_tags() calls PluginRunToTagToContent() again and then prepare_metadata(), but the latter is a no-op because we already have non-empty metadata cached for the mesh1 tag. When the handler then indexes into self._instance_tag_to_tag with the new mesh2 instance tag, we get the KeyError.
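
To make the failure mode concrete, here is a minimal standalone sketch of the same all-or-nothing caching pattern. It is not the plugin's actual code: FakeMultiplexer, StaleCachePlugin, and reproduce() are hypothetical names, and the "content" values are plain tag names rather than serialized plugin metadata.

```python
class FakeMultiplexer(object):
  """Hypothetical stand-in for the event multiplexer."""

  def __init__(self):
    # Maps run -> {instance_tag: content}. For simplicity, "content" here is
    # just the user-facing tag name instead of serialized plugin metadata.
    self.run_to_tag_to_content = {}

  def PluginRunToTagToContent(self, plugin_name):
    return self.run_to_tag_to_content


class StaleCachePlugin(object):
  """Mimics the caching pattern described above (simplified, hypothetical)."""

  def __init__(self, multiplexer):
    self._multiplexer = multiplexer
    self._instance_tag_to_tag = {}

  def prepare_metadata(self):
    if self._instance_tag_to_tag:
      return  # Once anything is cached, this never refreshes again.
    all_runs = self._multiplexer.PluginRunToTagToContent("mesh")
    for run, tag_to_content in all_runs.items():
      for instance_tag, name in tag_to_content.items():
        self._instance_tag_to_tag[(run, instance_tag)] = name

  def serve_tags(self):
    all_runs = self._multiplexer.PluginRunToTagToContent("mesh")
    self.prepare_metadata()
    names = set()
    for run, tag_to_content in all_runs.items():
      for instance_tag in tag_to_content:
        # Raises KeyError for any instance tag added after the first caching.
        names.add(self._instance_tag_to_tag[(run, instance_tag)])
    return sorted(names)


def reproduce():
  """Returns the key that triggers the KeyError, or None if nothing fails."""
  mux = FakeMultiplexer()
  mux.run_to_tag_to_content = {".": {"mesh1_VERTEX": "mesh1"}}
  plugin = StaleCachePlugin(mux)
  plugin.serve_tags()  # First request succeeds and fills the cache.
  mux.run_to_tag_to_content["."]["mesh2_FACE"] = "mesh2"  # New tag is logged.
  try:
    plugin.serve_tags()
  except KeyError as err:
    return err.args[0]
  return None
```

In this sketch, reproduce() returns the (run, instance_tag) pair that was added after the first request, which mirrors the key shown in the traceback above.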

Right now, the workaround is restarting TensorBoard, since on a fresh load it will correctly cache both tags.

I think a sufficient fix would be to cache at the granularity of an individual (run, tag) pair: a lookup helper could check the cache and populate it on a cache miss, replacing prepare_metadata(). That might also resolve the issue mentioned in this TODO:

# TODO(b/128995556): investigate why this additional metadata mapping is
# necessary, it must have something todo with the lifecycle of the request.
# Make sure we populate tags mapping structures.
self.prepare_metadata()

Note that it's also best practice in general not to call into the multiplexer during the plugin construction (as is happening now via the prepare_metadata() call at the end of __init__()), since if this is slow it will delay startup for TensorBoard as a whole.
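
A rough sketch of what the per-(run, tag) lookup helper suggested above could look like, assuming nothing about the eventual fix. LazyMetadataCache, its lookup() method, and FakeMultiplexer are hypothetical names; parse_fn stands in for metadata.parse_plugin_metadata, and the only multiplexer call used is PluginRunToTagToContent() as it appears in the snippet above. Note that the constructor only stores references, so nothing touches the multiplexer at plugin construction time.

```python
class LazyMetadataCache(object):
  """Hypothetical helper: caches parsed metadata per (run, instance_tag)."""

  def __init__(self, multiplexer, plugin_name, parse_fn):
    # Only store references here; no multiplexer calls during construction.
    self._multiplexer = multiplexer
    self._plugin_name = plugin_name
    self._parse_fn = parse_fn
    self._cache = {}

  def lookup(self, run, instance_tag):
    """Returns parsed metadata, populating the cache on a miss."""
    key = (run, instance_tag)
    if key not in self._cache:
      all_runs = self._multiplexer.PluginRunToTagToContent(self._plugin_name)
      # A KeyError here means the (run, tag) pair genuinely does not exist.
      content = all_runs[run][instance_tag]
      self._cache[key] = self._parse_fn(content)
    return self._cache[key]


class FakeMultiplexer(object):
  """Hypothetical stand-in for the event multiplexer, for illustration."""

  def __init__(self):
    self.run_to_tag_to_content = {}

  def PluginRunToTagToContent(self, plugin_name):
    return self.run_to_tag_to_content


# Usage: a tag added after the first lookup is still found, because each
# (run, tag) pair is resolved independently on its first access.
mux = FakeMultiplexer()
mux.run_to_tag_to_content = {".": {"mesh1_VERTEX": "mesh1"}}
cache = LazyMetadataCache(mux, "mesh", parse_fn=lambda content: content)
first = cache.lookup(".", "mesh1_VERTEX")
mux.run_to_tag_to_content["."]["mesh2_FACE"] = "mesh2"
second = cache.lookup(".", "mesh2_FACE")
```

With this shape, a handler would call lookup() for each (run, instance_tag) it encounters instead of relying on a one-time prepare_metadata() pass.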
