Added images are broken and then discarded #4889

Open
ptoews opened this issue Apr 20, 2021 · 3 comments
Labels: core:frontend, plugin:images, theme:performance (Performance, scalability, large data sizes, slowness, etc.), theme:ui-polish (Features or fixes that make core UI more pleasant.), type:docs, type:feature

Comments

ptoews commented Apr 20, 2021

Environment information (required)

Diagnostics

Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version e43767ef2b648d0d5d57c00f38ccbd38390e38da

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='philipp', release='5.8.0-50-generic', version='#56~20.04.1-Ubuntu SMP Mon Apr 12 21:46:35 UTC 2021', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: 'path/to/my/venv'

--- check: installed_packages
INFO: installed: tensorboard==2.4.1
WARNING: no installation among: ['tensorflow', 'tensorflow-gpu', 'tf-nightly', 'tf-nightly-2.0-preview', 'tf-nightly-gpu', 'tf-nightly-gpu-2.0-preview']
WARNING: no installation among: ['tensorflow-estimator', 'tensorflow-estimator-2.0-preview', 'tf-estimator-nightly']

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.4.1'

--- check: tensorflow_python_version
Traceback (most recent call last):
  File "diagnose_tensorboard.py", line 522, in main
    suggestions.extend(check())
  File "diagnose_tensorboard.py", line 75, in wrapper
    result = fn()
  File "diagnose_tensorboard.py", line 278, in tensorflow_python_version
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'

--- check: tensorboard_data_server_version
INFO: no data server installed

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/path/to/my/venv/bin/tensorboard\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::1', 0, 0, 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]

--- check: readable_fqdn
INFO: socket.getfqdn(): 'philipp'

--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=4064016, st_dev=66311, st_nlink=2, st_uid=1000, st_gid=1000, st_size=4096, st_atime=1618906884, st_mtime=1618906884, st_ctime=1618906884)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/path/to/my/venv/lib/python3.8/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.12.0
actionlib==1.13.2
angles==1.9.13
argon2-cffi==20.1.0
async-generator==1.10
attrs==20.3.0
backcall==0.2.0
bleach==3.3.0
bondpy==1.8.6
cachetools==4.2.1
catkin==0.8.9
certifi==2020.12.5
cffi==1.14.5
chardet==4.0.0
cv-bridge==1.15.0
cycler==0.10.0
Cython==0.29.22
DBSCANPP==1.0
decorator==4.4.2
defusedxml==0.7.1
diagnostic-analysis==1.10.3
diagnostic-common-diagnostics==1.10.3
diagnostic-updater==1.10.3
dynamic-reconfigure==1.7.1
entrypoints==0.3
gencpp==0.6.5
geneus==3.0.0
genlisp==0.4.18
genmsg==0.5.16
gennodejs==2.0.2
genpy==0.6.14
google-auth==1.29.0
google-auth-oauthlib==0.4.4
grpcio==1.37.0
h5py==3.2.1
idna==2.10
imageio==2.9.0
interactive-markers==1.12.0
ipykernel==5.5.0
ipython==7.21.0
ipython-genutils==0.2.0
ipywidgets==7.6.3
jedi==0.18.0
Jinja2==2.11.3
joblib==1.0.1
joint-state-publisher==1.15.0
joint-state-publisher-gui==1.15.0
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.12
jupyter-console==6.3.0
jupyter-contrib-core==0.3.3
jupyter-contrib-nbextensions==0.5.1
jupyter-core==4.7.1
jupyter-highlight-selected-word==0.2.0
jupyter-latex-envs==1.4.6
jupyter-nbextensions-configurator==0.4.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
kiwisolver==1.3.1
laser-geometry==1.6.7
lxml==4.6.2
Markdown==3.3.4
MarkupSafe==1.1.1
matplotlib==3.4.1
message-filters==1.15.9
mistune==0.8.4
mlxtend==0.18.0
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.2
nest-asyncio==1.5.1
networkx==2.5
notebook==6.3.0
numpy==1.20.2
oauthlib==3.1.0
packaging==20.9
pandas==1.2.4
pandocfilters==1.4.3
parso==0.8.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.1.2
pip==21.0.1
pkg-resources==0.0.0
plotly==4.14.3
prometheus-client==0.9.0
prompt-toolkit==3.0.17
protobuf==3.15.8
ptyprocess==0.7.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
Pygments==2.8.1
PyOpenGL==3.1.5
pyparsing==2.4.7
PyQt5==5.15.4
PyQt5-Qt5==5.15.2
PyQt5-sip==12.8.1
PyQt5-stubs==5.15.2.0
pyqtgraph==0.12.1
pyrsistent==0.17.3
python-dateutil==2.8.1
python-qt-binding==0.4.3
pytz==2021.1
PyWavelets==1.1.1
PyYAML==5.4.1
pyzmq==22.0.3
qt-dotgraph==0.4.2
qt-gui==0.4.2
qt-gui-cpp==0.4.2
qt-gui-py-common==0.4.2
qtconsole==5.0.3
QtPy==1.9.0
requests==2.25.1
requests-oauthlib==1.3.0
resource-retriever==1.12.6
retrying==1.3.3
rosbag==1.15.9
rosboost-cfg==1.15.7
rosclean==1.15.7
roscreate==1.15.7
rosgraph==1.15.9
roslaunch==1.15.9
roslib==1.15.7
roslint==0.12.0
roslz4==1.15.9
rosmake==1.15.7
rosmaster==1.15.9
rosmsg==1.15.9
rosnode==1.15.9
rosparam==1.15.9
rospy==1.15.9
rosservice==1.15.9
rostest==1.15.9
rostopic==1.15.9
rosunit==1.15.7
roswtf==1.15.9
rqt-action==0.4.9
rqt-bag==0.5.1
rqt-bag-plugins==0.5.1
rqt-console==0.4.11
rqt-dep==0.4.10
rqt-graph==0.4.14
rqt-gui==0.5.2
rqt-gui-py==0.5.2
rqt-image-view==0.4.16
rqt-launch==0.4.9
rqt-logger-level==0.4.11
rqt-moveit==0.5.9
rqt-msg==0.4.9
rqt-nav-view==0.5.7
rqt-plot==0.4.13
rqt-pose-view==0.5.10
rqt-publisher==0.4.9
rqt-py-common==0.5.2
rqt-py-console==0.4.9
rqt-reconfigure==0.5.3
rqt-robot-dashboard==0.5.8
rqt-robot-monitor==0.5.13
rqt-robot-steering==0.5.12
rqt-runtime-monitor==0.5.8
rqt-rviz==0.6.1
rqt-service-caller==0.4.9
rqt-shell==0.4.10
rqt-srv==0.4.8
rqt-tf-tree==0.6.2
rqt-top==0.4.9
rqt-topic==0.4.12
rqt-web==0.4.9
rsa==4.7.2
rviz==1.14.5
scikit-image==0.18.1
scikit-learn==0.24.1
scipy==1.6.1
Send2Trash==1.5.0
sensor-msgs==1.13.1
setuptools==54.1.1
six==1.15.0
sklearn==0.0
smach==2.5.0
smach-ros==2.5.0
smclib==1.8.6
tensorboard==2.4.1
tensorboard-plugin-wit==1.8.0
terminado==0.9.2
testpath==0.4.4
tf==1.13.2
tf-conversions==1.13.2
tf2-geometry-msgs==0.7.5
tf2-kdl==0.7.5
tf2-py==0.7.5
tf2-ros==0.7.5
threadpoolctl==2.1.0
tifffile==2021.3.5
topic-tools==1.15.9
torch==1.8.1+cu111
torchaudio==0.8.1
torchvision==0.9.1+cu111
tornado==6.1
traitlets==5.0.5
typing-extensions==3.7.4.3
urllib3==1.26.4
wcwidth==0.2.5
webencodings==0.5.1
Werkzeug==1.0.1
wheel==0.36.2
widgetsnbextension==3.5.1
xacro==1.14.6

Issue description

When I add images to TensorBoard, many of them are shown as broken, like this:
[image]
Upon reloading, these images are simply discarded, and the slider skips those steps. Here is an MWE:

import time

import torch
import torch.utils.tensorboard as tensorboard


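# Log one random image per second so TensorBoard keeps receiving new steps.
writer = tensorboard.SummaryWriter("runs/" + str(time.time()))
for i in range(100):
    # The tensor has shape (width, height, channels), hence dataformats="WHC".
    writer.add_image("test", torch.rand((400, 32, 3)), global_step=i, dataformats="WHC")
    time.sleep(1)
writer.close()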

Edit: Looking into the browser console, I can see lots of messages like this:

Failed to load resource: the server responded with a status of 404 (NOT FOUND)

arghyaganguly self-assigned this Apr 21, 2021
arghyaganguly added the core:frontend, theme:performance, and type:support labels Apr 21, 2021
ptoews (Author) commented Apr 21, 2021

When I let it run for a few hours with many images being generated, I noticed that only a few of them were accessible. That's how I found out that there is a maximum limit on the number of images. After running tensorboard with --samples_per_plugin=images=500, everything is working now.

It would be great if the existence of such a limit were shown somewhere in the board, because the behaviour seems very strange if one doesn't know what's happening. Ideally this limit could be configured in the UI while it's running; otherwise it would be nice if the configured value were shown in the board, e.g. in the settings. It was really confusing (for me at least).
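For reference, the workaround above can also be scripted. The following is a minimal sketch that only builds the command line; the helper name `tensorboard_argv` is hypothetical, while `--logdir` and `--samples_per_plugin` are the TensorBoard flags used in this thread (the latter accepts comma-separated `plugin=count` pairs).

```python
import shlex


def tensorboard_argv(logdir, samples_per_plugin):
    """Build a TensorBoard command line with per-plugin sample limits.

    `samples_per_plugin` maps a plugin name (e.g. "images") to the number
    of samples to keep per tag.
    """
    limits = ",".join(
        f"{plugin}={count}" for plugin, count in samples_per_plugin.items()
    )
    return ["tensorboard", "--logdir", logdir, f"--samples_per_plugin={limits}"]


argv = tensorboard_argv("runs", {"images": 500})
print(shlex.join(argv))  # tensorboard --logdir runs --samples_per_plugin=images=500
```

The resulting list could be passed to `subprocess.run(argv)` in an environment where the `tensorboard` executable is on the path.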

arghyaganguly assigned bmd3k and unassigned arghyaganguly Apr 21, 2021
arghyaganguly added the theme:ui-polish, type:docs, and type:feature labels and removed the type:support label Apr 21, 2021
bmd3k (Contributor) commented Apr 21, 2021

Thanks for the report. There seem to be two issues here.

  1. Some images are skipped/discarded: Yes, as you determined, the plugin only shows a sample of the images. The sample size is static so the sample set changes as the dataset grows larger. I can discuss with the team whether we should surface this information in the UI.

  2. Some images are not being served (404 errors): This is surprising to me. I can try to reproduce. In the meantime, do you know if your tensorboard server was logging any relevant error messages to terminal/console/shell?

nfelt (Contributor) commented Apr 21, 2021

> 2. Some images are not being served (404 errors): This is surprising to me. I can try to reproduce. In the meantime, do you know if your tensorboard server was logging any relevant error messages to terminal/console/shell?

This would likely be because there are two phases to loading images on the dashboard: an XHR at dashboard load time populates the set of steps that are currently available, but the image itself isn't fetched until you slide the step slider to that step. So when the event files are still growing and the reservoir is already full, adding additional images can result in images that the frontend still lists being sampled away on the backend, leaving the image broken. In that sense it's a known limitation of how the image dashboard is structured (and it would affect the audio dashboard too).
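The two-phase race described above can be reproduced in a small simulation. This is a minimal sketch, not TensorBoard's code: the `Reservoir` class is a hypothetical stand-in that implements plain Algorithm R reservoir sampling, and the capacity of 10 and the step counts are arbitrary illustration values.

```python
import random


class Reservoir:
    """Uniform sample of at most `capacity` steps (Algorithm R). Hypothetical."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = {}  # step -> image bytes currently retained
        self._seen = 0
        self._rng = random.Random(seed)

    def add(self, step, data):
        self._seen += 1
        if len(self.items) < self.capacity:
            self.items[step] = data
            return
        # Keep the new item with probability capacity / seen; if it is kept,
        # it overwrites a uniformly chosen retained step.
        j = self._rng.randrange(self._seen)
        if j < self.capacity:
            victim = sorted(self.items)[j]
            del self.items[victim]
            self.items[step] = data


reservoir = Reservoir(capacity=10)
for step in range(10):
    reservoir.add(step, b"img")

# Phase 1: the dashboard asks which steps exist and builds its slider.
frontend_steps = sorted(reservoir.items)

# The training job keeps writing while the page stays open.
for step in range(10, 100):
    reservoir.add(step, b"img")

# Phase 2: the user drags the slider; each step is fetched only now.
missing = [s for s in frontend_steps if s not in reservoir.items]
print(f"{len(missing)} of {len(frontend_steps)} slider steps would now return 404")
```

Every step in `missing` was advertised to the frontend but has since been evicted, which corresponds to the broken images in the report.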

There are a few ways one could fix this:

A) Keep a cache of all images that were ever sent to the frontend as a possible step, and retain those despite the reservoir sampling. But this implies possibly unbounded memory growth which mostly defeats the point of reservoir sampling. Passing --samples_per_plugin=images=0 (for unlimited sample size) is one workaround to achieve the same effect, with the same downside.

B) Disallow the selection of a step that doesn't have an image available to serve from the backend. This could look something like changing the image card to send a request for "nearest image to requested step X that hasn't been sampled away" and then having it render that image, but then update the step marker to show the actual retrieved step of the image, rather than the one that the slider showed before (basically, we would synchronize the tick marks of the slider after fetching the image).

C) Change the backend to just keep pointers to the image data in the event files, rather than actually loading the images themselves into memory. We would send the frontend encoded versions of these pointers, which it feeds back into the request to fetch the image (this is how the DataProvider blob keys already work, essentially); the backend would then decode the pointer and read the image bytes directly from the event file. This still breaks if the event file was actually deleted, but it has the nice property that it reduces overall memory usage (and thus probably lets us keep more image samples by default) in addition to fixing this issue.
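As an illustration of option B, the backend lookup for "nearest image to requested step X that hasn't been sampled away" could be as small as the sketch below. The function name and the tie-breaking rule (prefer the lower step) are assumptions made for this example, not an existing TensorBoard API.

```python
import bisect


def nearest_available_step(available_steps, requested):
    """Return the retained step closest to `requested`, or None if empty.

    `available_steps` must be sorted in ascending order. Ties go to the
    lower step (an arbitrary choice for this sketch).
    """
    if not available_steps:
        return None
    i = bisect.bisect_left(available_steps, requested)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = available_steps[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda s: (abs(s - requested), s))


retained = [0, 3, 7, 12]
print(nearest_available_step(retained, 6))    # 7
print(nearest_available_step(retained, 100))  # 12
```

The frontend would render the image for the returned step and move the slider marker to that step, so the slider never points at a step the backend can no longer serve.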

cc @wchargin who I know thought a bit about the last issue (but I can't find a link to it right now)
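Option C above could look roughly like the following sketch. Everything here is an assumption for illustration: the key layout (path, offset, length as base64-encoded JSON), the function names, and the plain binary file standing in for a real event file. It is not the actual DataProvider blob key format.

```python
import base64
import json
import os
import tempfile


def encode_blob_key(path, offset, length):
    """Pack a pointer into an opaque, URL-safe string for the frontend."""
    raw = json.dumps({"path": path, "offset": offset, "length": length})
    return base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")


def read_blob(blob_key):
    """Decode a pointer and read the image bytes straight from disk."""
    ref = json.loads(base64.urlsafe_b64decode(blob_key.encode("ascii")))
    # A real server must validate the decoded path (or sign the key) rather
    # than trusting a client-supplied value.
    with open(ref["path"], "rb") as f:
        f.seek(ref["offset"])
        data = f.read(ref["length"])
    if len(data) != ref["length"]:
        raise IOError("event file was truncated or replaced")
    return data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.bin")
    with open(path, "wb") as f:
        f.write(b"HEADER" + b"PNGDATA" + b"TRAILER")
    # The backend keeps only this small key in memory, not the image itself.
    key = encode_blob_key(path, offset=6, length=7)
    image_bytes = read_blob(key)

print(image_bytes)  # b'PNGDATA'
```

Because the key stays valid for as long as the file exists, a step advertised to the frontend can still be served after the in-memory sample has moved on. As noted above, it still fails once the file itself is deleted.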
