Added images are broken and then discarded #4889

ptoews · 2021-04-20T14:02:39Z

Environment information (required)

Diagnostics

Diagnostics output

--- check: autoidentify
INFO: diagnose_tensorboard.py version e43767ef2b648d0d5d57c00f38ccbd38390e38da

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='philipp', release='5.8.0-50-generic', version='#56~20.04.1-Ubuntu SMP Mon Apr 12 21:46:35 UTC 2021', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: 'path/to/my/venv'

--- check: installed_packages
INFO: installed: tensorboard==2.4.1
WARNING: no installation among: ['tensorflow', 'tensorflow-gpu', 'tf-nightly', 'tf-nightly-2.0-preview', 'tf-nightly-gpu', 'tf-nightly-gpu-2.0-preview']
WARNING: no installation among: ['tensorflow-estimator', 'tensorflow-estimator-2.0-preview', 'tf-estimator-nightly']

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.4.1'

--- check: tensorflow_python_version
Traceback (most recent call last):
  File "diagnose_tensorboard.py", line 522, in main
    suggestions.extend(check())
  File "diagnose_tensorboard.py", line 75, in wrapper
    result = fn()
  File "diagnose_tensorboard.py", line 278, in tensorflow_python_version
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'

--- check: tensorboard_data_server_version
INFO: no data server installed

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/path/to/my/venv/bin/tensorboard\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::1', 0, 0, 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]

--- check: readable_fqdn
INFO: socket.getfqdn(): 'philipp'

--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=4064016, st_dev=66311, st_nlink=2, st_uid=1000, st_gid=1000, st_size=4096, st_atime=1618906884, st_mtime=1618906884, st_ctime=1618906884)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/path/to/my/venv/lib/python3.8/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.12.0
actionlib==1.13.2
angles==1.9.13
argon2-cffi==20.1.0
async-generator==1.10
attrs==20.3.0
backcall==0.2.0
bleach==3.3.0
bondpy==1.8.6
cachetools==4.2.1
catkin==0.8.9
certifi==2020.12.5
cffi==1.14.5
chardet==4.0.0
cv-bridge==1.15.0
cycler==0.10.0
Cython==0.29.22
DBSCANPP==1.0
decorator==4.4.2
defusedxml==0.7.1
diagnostic-analysis==1.10.3
diagnostic-common-diagnostics==1.10.3
diagnostic-updater==1.10.3
dynamic-reconfigure==1.7.1
entrypoints==0.3
gencpp==0.6.5
geneus==3.0.0
genlisp==0.4.18
genmsg==0.5.16
gennodejs==2.0.2
genpy==0.6.14
google-auth==1.29.0
google-auth-oauthlib==0.4.4
grpcio==1.37.0
h5py==3.2.1
idna==2.10
imageio==2.9.0
interactive-markers==1.12.0
ipykernel==5.5.0
ipython==7.21.0
ipython-genutils==0.2.0
ipywidgets==7.6.3
jedi==0.18.0
Jinja2==2.11.3
joblib==1.0.1
joint-state-publisher==1.15.0
joint-state-publisher-gui==1.15.0
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.12
jupyter-console==6.3.0
jupyter-contrib-core==0.3.3
jupyter-contrib-nbextensions==0.5.1
jupyter-core==4.7.1
jupyter-highlight-selected-word==0.2.0
jupyter-latex-envs==1.4.6
jupyter-nbextensions-configurator==0.4.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
kiwisolver==1.3.1
laser-geometry==1.6.7
lxml==4.6.2
Markdown==3.3.4
MarkupSafe==1.1.1
matplotlib==3.4.1
message-filters==1.15.9
mistune==0.8.4
mlxtend==0.18.0
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.2
nest-asyncio==1.5.1
networkx==2.5
notebook==6.3.0
numpy==1.20.2
oauthlib==3.1.0
packaging==20.9
pandas==1.2.4
pandocfilters==1.4.3
parso==0.8.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.1.2
pip==21.0.1
pkg-resources==0.0.0
plotly==4.14.3
prometheus-client==0.9.0
prompt-toolkit==3.0.17
protobuf==3.15.8
ptyprocess==0.7.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
Pygments==2.8.1
PyOpenGL==3.1.5
pyparsing==2.4.7
PyQt5==5.15.4
PyQt5-Qt5==5.15.2
PyQt5-sip==12.8.1
PyQt5-stubs==5.15.2.0
pyqtgraph==0.12.1
pyrsistent==0.17.3
python-dateutil==2.8.1
python-qt-binding==0.4.3
pytz==2021.1
PyWavelets==1.1.1
PyYAML==5.4.1
pyzmq==22.0.3
qt-dotgraph==0.4.2
qt-gui==0.4.2
qt-gui-cpp==0.4.2
qt-gui-py-common==0.4.2
qtconsole==5.0.3
QtPy==1.9.0
requests==2.25.1
requests-oauthlib==1.3.0
resource-retriever==1.12.6
retrying==1.3.3
rosbag==1.15.9
rosboost-cfg==1.15.7
rosclean==1.15.7
roscreate==1.15.7
rosgraph==1.15.9
roslaunch==1.15.9
roslib==1.15.7
roslint==0.12.0
roslz4==1.15.9
rosmake==1.15.7
rosmaster==1.15.9
rosmsg==1.15.9
rosnode==1.15.9
rosparam==1.15.9
rospy==1.15.9
rosservice==1.15.9
rostest==1.15.9
rostopic==1.15.9
rosunit==1.15.7
roswtf==1.15.9
rqt-action==0.4.9
rqt-bag==0.5.1
rqt-bag-plugins==0.5.1
rqt-console==0.4.11
rqt-dep==0.4.10
rqt-graph==0.4.14
rqt-gui==0.5.2
rqt-gui-py==0.5.2
rqt-image-view==0.4.16
rqt-launch==0.4.9
rqt-logger-level==0.4.11
rqt-moveit==0.5.9
rqt-msg==0.4.9
rqt-nav-view==0.5.7
rqt-plot==0.4.13
rqt-pose-view==0.5.10
rqt-publisher==0.4.9
rqt-py-common==0.5.2
rqt-py-console==0.4.9
rqt-reconfigure==0.5.3
rqt-robot-dashboard==0.5.8
rqt-robot-monitor==0.5.13
rqt-robot-steering==0.5.12
rqt-runtime-monitor==0.5.8
rqt-rviz==0.6.1
rqt-service-caller==0.4.9
rqt-shell==0.4.10
rqt-srv==0.4.8
rqt-tf-tree==0.6.2
rqt-top==0.4.9
rqt-topic==0.4.12
rqt-web==0.4.9
rsa==4.7.2
rviz==1.14.5
scikit-image==0.18.1
scikit-learn==0.24.1
scipy==1.6.1
Send2Trash==1.5.0
sensor-msgs==1.13.1
setuptools==54.1.1
six==1.15.0
sklearn==0.0
smach==2.5.0
smach-ros==2.5.0
smclib==1.8.6
tensorboard==2.4.1
tensorboard-plugin-wit==1.8.0
terminado==0.9.2
testpath==0.4.4
tf==1.13.2
tf-conversions==1.13.2
tf2-geometry-msgs==0.7.5
tf2-kdl==0.7.5
tf2-py==0.7.5
tf2-ros==0.7.5
threadpoolctl==2.1.0
tifffile==2021.3.5
topic-tools==1.15.9
torch==1.8.1+cu111
torchaudio==0.8.1
torchvision==0.9.1+cu111
tornado==6.1
traitlets==5.0.5
typing-extensions==3.7.4.3
urllib3==1.26.4
wcwidth==0.2.5
webencodings==0.5.1
Werkzeug==1.0.1
wheel==0.36.2
widgetsnbextension==3.5.1
xacro==1.14.6

Issue description

When I add images to tensorboard, many of them are shown as broken like this:

Upon reloading, these images are simply discarded, and the slider skips those steps. Here is an MWE:

import time

import torch
import torch.utils.tensorboard as tensorboard


# Write one random image per second to a fresh run directory.
writer = tensorboard.SummaryWriter("runs/" + str(time.time()))
for i in range(100):
    # Tensor shape is (width=400, height=32, channels=3), hence dataformats="WHC".
    writer.add_image("test", torch.rand((400, 32, 3)), global_step=i, dataformats="WHC")
    time.sleep(1)
writer.close()

Edit: Looking into the browser console, I can see lots of messages like this:

Failed to load resource: the server responded with a status of 404 (NOT FOUND)

ptoews · 2021-04-21T11:58:49Z

When I let it run for a few hours with many images being generated, I noticed that only a few of them were still accessible. That is how I found out that there is a maximum limit on the number of images. After running tensorboard with --samples_per_plugin=images=500, everything works now.

It would be great if the existence of such a limit were shown somewhere in the board, because the behaviour seems very strange if one doesn't know what's happening. Ideally this limit could be configured in the UI while it's running; otherwise it would be nice if the configured value were shown in the board, e.g. in the settings. It was really confusing (for me at least).
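For anyone who launches TensorBoard from a script, here is a minimal sketch of the workaround described above. The helper name `tensorboard_command` and the `runs` log directory are illustrative assumptions; only the `--samples_per_plugin=images=500` flag comes from this thread.

```python
import shlex


def tensorboard_command(logdir, image_samples=500):
    """Build the argv list for TensorBoard with a larger image sample size.

    `tensorboard_command` is a hypothetical helper, not part of TensorBoard.
    """
    return [
        "tensorboard",
        f"--logdir={logdir}",
        f"--samples_per_plugin=images={image_samples}",
    ]


cmd = tensorboard_command("runs", image_samples=500)
print(shlex.join(cmd))
# To actually start the server, pass the list to subprocess.run(cmd, check=True).
```

Raising the limit means more images are held in memory, so the value is a trade-off rather than a fix.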

bmd3k · 2021-04-21T20:00:52Z

Thanks for the report. There seem to be two issues here.

1) Some images are skipped/discarded: Yes, as you determined, the plugin only shows a sample of the images. The sample size is static, so the sample set changes as the dataset grows larger. I can discuss with the team whether we should surface this information in the UI.
2) Some images are not being served (404 errors): This is surprising to me. I can try to reproduce. In the meantime, do you know if your tensorboard server was logging any relevant error messages to the terminal/console/shell?

nfelt · 2021-04-21T20:52:08Z

Some images are not being served (404 errors): This is surprising to me. I can try to reproduce. In the meantime, do you know if your tensorboard server was logging any relevant error messages to terminal/console/shell?

This would likely be because there are two phases to loading images on the dashboard: an XHR at dashboard load time populates the set of steps that are currently available, but the image itself isn't fetched until you slide the step slider to that step. So when the event files are still growing and the reservoir is already full, adding more images can result in images that the frontend still lists being sampled away on the backend, leaving the image broken. In that sense it's a known limitation of how the image dashboard is structured (and would affect the audio dashboard too).
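The two-phase race described above can be illustrated with a small simulation. This is a simplified sketch using textbook reservoir sampling (Algorithm R); the `Reservoir` class, the capacity of 10, and the step counts are assumptions for illustration, and TensorBoard's actual reservoir implementation may differ in its details.

```python
import random


class Reservoir:
    """Fixed-size uniform sample over a stream (textbook Algorithm R)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []  # steps currently retained by the "backend"
        self.seen = 0    # total number of steps offered so far
        self._rng = random.Random(seed)

    def add(self, step):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(step)
            return
        # Keep the new step with probability capacity / seen,
        # evicting a uniformly chosen retained step.
        j = self._rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = step


reservoir = Reservoir(capacity=10)
for step in range(50):
    reservoir.add(step)

# Phase 1: the frontend asks which steps exist and remembers the answer.
frontend_steps = list(reservoir.items)

# Meanwhile the training job keeps writing images.
for step in range(50, 100):
    reservoir.add(step)

# Phase 2: the frontend requests an image for a step it was told about.
available = set(reservoir.items)
broken = [s for s in frontend_steps if s not in available]
print(f"{len(broken)} of {len(frontend_steps)} listed steps would now return 404")
```

Any step in `broken` was advertised to the frontend but has since been evicted, which corresponds to the broken images and 404 responses reported in this issue.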

There are a few ways one could fix this:

A) Keep a cache of all images that were ever sent to the frontend as a possible step, and retain those despite the reservoir sampling. But this implies possibly unbounded memory growth which mostly defeats the point of reservoir sampling. Passing --samples_per_plugin=images=0 (for unlimited sample size) is one workaround to achieve the same effect, with the same downside.

B) Disallow the selection of a step that doesn't have an image available to serve from the backend. This could look something like changing the image card to send a request for "nearest image to requested step X that hasn't been sampled away" and then having it render that image, but then update the step marker to show the actual retrieved step of the image, rather than the one that the slider showed before (basically, we would synchronize the tick marks of the slider after fetching the image).

C) Change the backend to just keep pointers to the image data in the event files, rather than actually loading the images themselves into memory. We send the frontend encoded versions of these pointers, which it feeds back into the request to fetch the image (this is how the DataProvider blob keys already work, essentially), and then decode the pointer and read the image bytes directly from the event file. This still breaks if the event file was actually deleted, but it has the nice property that it reduces overall memory usage (and thus probably lets us keep more image samples by default) in addition to fixing this issue.
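Options B and C can be sketched roughly as follows. Everything here is a hypothetical illustration, not TensorBoard code: the function names, the JSON-plus-base64 key format, and the plain binary file standing in for an event file are all assumptions.

```python
import base64
import bisect
import json
import os
import tempfile


def nearest_available_step(available_steps, requested):
    """Option B: return the retained step closest to `requested`.

    `available_steps` must be sorted ascending and non-empty.
    """
    i = bisect.bisect_left(available_steps, requested)
    candidates = available_steps[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda s: abs(s - requested))


def encode_blob_key(path, offset, length):
    """Option C: pack a pointer to the image bytes into an opaque key."""
    payload = json.dumps([path, offset, length]).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")


def read_blob(blob_key):
    """Option C: decode the pointer and read the bytes straight from disk."""
    path, offset, length = json.loads(base64.urlsafe_b64decode(blob_key))
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Option B: the slider asked for step 14, but only these steps are retained.
shown_step = nearest_available_step([0, 10, 20], 14)

# Option C: the backend keeps only (path, offset, length), not the image bytes.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.bin")  # stand-in for an event file
    with open(path, "wb") as f:
        f.write(b"HEADER" + b"<png-bytes>" + b"TRAILER")
    key = encode_blob_key(path, offset=6, length=11)
    data = read_blob(key)
```

With option B the frontend would move the step marker to `shown_step` after the fetch. With option C the key stays valid after the step is sampled away, as long as the file still exists; a real implementation would also need to validate or sign the key so that clients cannot request arbitrary files.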

cc @wchargin who I know thought a bit about the last issue (but I can't find a link to it right now)

arghyaganguly self-assigned this Apr 21, 2021

arghyaganguly added the core:frontend, theme:performance, and type:support labels Apr 21, 2021

arghyaganguly assigned bmd3k and unassigned arghyaganguly Apr 21, 2021

arghyaganguly added the theme:ui-polish, type:docs, and type:feature labels and removed the type:support label Apr 21, 2021

nfelt added the plugin:images label Apr 21, 2021
