Added images are broken and then discarded #4889
Comments
When I let it run for a few hours with many images being generated, I noticed that only a few of them were still accessible. That's how I found out that there is a maximum limit for images. After running tensorboard with

It would be great if the existence of such a limit were shown somewhere in the board, because the behaviour seems very strange if one doesn't know what's happening. Ideally this limit could be configured in the UI while it's running; otherwise it would be nice if the configured value were shown in the board, e.g. in the settings. It was really confusing (for me at least).
Thanks for the report. There seem to be two issues here.
This is likely because there are two phases to loading images on the dashboard: an XHR at dashboard load time populates the set of steps that are currently available, but the image itself isn't fetched until you move the step slider to that step. So when the event files are still growing and the reservoir is already full, adding more images can cause images that the frontend still lists to be sampled away on the backend, leaving the image broken. In that sense it's a known limitation of how the image dashboard is structured (and it would affect the audio dashboard too). There are a few ways one could fix this:

A) Keep a cache of all images that were ever sent to the frontend as a possible step, and retain those despite the reservoir sampling. But this implies possibly unbounded memory growth, which mostly defeats the point of reservoir sampling.

B) Disallow the selection of a step that doesn't have an image available to serve from the backend. This could look something like changing the image card to send a request for "nearest image to requested step X that hasn't been sampled away" and then having it render that image, but then update the step marker to show the actual retrieved step of the image, rather than the one that the slider showed before (basically, we would synchronize the tick marks of the slider after fetching the image).

C) Change the backend to just keep pointers to the image data in the event files, rather than actually loading the images themselves into memory. We send the frontend encoded versions of these pointers, which it feeds back into the request to fetch the image (this is how the DataProvider blob keys already work, essentially), and the backend then decodes the pointer and reads the image bytes directly from the event file. This still breaks if the event file was actually deleted, but it has the nice property that it reduces overall memory usage (and thus probably lets us keep more image samples by default) in addition to fixing this issue.

cc @wchargin who I know thought a bit about the last issue (but I can't find a link to it right now)
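To make the race concrete, here is a minimal sketch of the failure mode and of option B. It is not TensorBoard's actual reservoir or API: the `Reservoir` class, its `nearest` method, the capacity of 10, and the step counts are all hypothetical and chosen only for illustration. It uses plain uniform reservoir sampling (Algorithm R), records the step list the way the first XHR would, keeps adding images, and then checks which of the listed steps can no longer be served.

```python
import bisect
import random


class Reservoir:
    """Hypothetical fixed-size uniform sample of images keyed by step."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self._rng = random.Random(seed)
        self._items = {}  # step -> image payload
        self._seen = 0    # total number of items ever offered

    def add(self, step, payload):
        self._seen += 1
        if len(self._items) < self.capacity:
            self._items[step] = payload
            return
        # Algorithm R: keep the new item with probability capacity / seen,
        # evicting a uniformly chosen existing item.
        j = self._rng.randrange(self._seen)
        if j < self.capacity:
            victim = sorted(self._items)[j]
            del self._items[victim]
            self._items[step] = payload

    def steps(self):
        return sorted(self._items)

    def get(self, step):
        # None stands in for the 404 the browser console reports.
        return self._items.get(step)

    def nearest(self, step):
        """Option B sketch: return (actual_step, payload) closest to `step`."""
        steps = self.steps()
        if not steps:
            return None
        i = bisect.bisect_left(steps, step)
        candidates = steps[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda s: abs(s - step))
        return best, self._items[best]


reservoir = Reservoir(capacity=10)
for step in range(10):
    reservoir.add(step, f"image-{step}")

# Phase 1: the frontend asks which steps exist and remembers the answer.
listed = reservoir.steps()

# Meanwhile the event file keeps growing and the full reservoir evicts.
for step in range(10, 200):
    reservoir.add(step, f"image-{step}")

# Phase 2: the slider requests steps from the stale list.
broken = [s for s in listed if reservoir.get(s) is None]
print("listed steps that are now missing:", broken)

# With option B the backend answers with the nearest surviving step,
# and the frontend would move the slider marker to that step.
for s in broken:
    actual_step, _ = reservoir.nearest(s)
    print(f"requested step {s}, served step {actual_step}")
```

The reservoir never holds more than `capacity` images, so memory stays bounded (unlike option A), at the cost of the slider sometimes showing a different step than the one requested.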
Issue description
When I add images to tensorboard, many of them are shown as broken like this:
Upon reloading, these images are simply discarded, and the slider skips those steps. Here is a MWE:
Edit: Looking into the browser console, I can see lots of messages like this:
Failed to load resource: the server responded with a status of 404 (NOT FOUND)