
Fast data loading feedback (--load_fast=true; “RustBoard”) #4784

Open
@wchargin

This thread is for tracking feedback about TensorBoard’s experimental
mode for fast data loading. Typical speedups range from 100× to 400×.

Who should try this: Anyone who’s found TensorBoard’s data loading
to be slower than they’d like.

Who shouldn’t try this: Windows users (for now).

Feedback: Fill out the feedback form, or reply on this thread.

Try it out

To try this out, please uninstall all copies of TensorBoard and then
install the latest version of tb-nightly:

pip uninstall -y tensorboard tb-nightly &&
pip install tb-nightly  # must have at least tb-nightly==2.5.0a20210316
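If you want to confirm that the installed nightly meets the minimum version noted above, a small check along these lines may help. This is only a sketch: the helper names (`parse_nightly`, `is_new_enough`, `installed_nightly_version`) are made up for illustration, and it only understands nightly version strings of the form shown above, not stable releases.

```python
import re
from importlib import metadata

# Minimum nightly build named in the install instructions above.
MIN_NIGHTLY = "2.5.0a20210316"

# Matches nightly versions such as "2.5.0a20210316" (assumed format).
_NIGHTLY_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)a(\d{8})$")


def parse_nightly(version):
    """Return (major, minor, patch, yyyymmdd) for a nightly version, else None."""
    match = _NIGHTLY_RE.match(version)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_new_enough(version, minimum=MIN_NIGHTLY):
    """True if `version` is a nightly build at or after `minimum`.

    Anything that is not a nightly version string is treated as too old,
    which is a simplification made for this sketch.
    """
    parsed = parse_nightly(version)
    return parsed is not None and parsed >= parse_nightly(minimum)


def installed_nightly_version():
    """Return the installed tb-nightly version string, or None if absent."""
    try:
        return metadata.version("tb-nightly")
    except metadata.PackageNotFoundError:
        return None
```

For example, `is_new_enough(installed_nightly_version() or "")` returns `False` when tb-nightly is missing or older than the minimum.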

Then, invoke TensorBoard with the --load_fast=true flag:

tensorboard --logdir /path/to/logs --load_fast=true

Use TensorBoard as you usually would. It should work the same way, just
faster.
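If you launch TensorBoard from a script, for instance to compare load times with and without the flag, the invocation above could be wrapped roughly as follows. The function names are hypothetical, and the sketch simply omits the flag for the default mode rather than assuming any other flag value.

```python
import subprocess


def tensorboard_command(logdir, load_fast=True):
    """Build the argv list for the invocation shown above.

    When `load_fast` is False the flag is left out entirely, so TensorBoard
    falls back to whatever its default loading mode is.
    """
    command = ["tensorboard", "--logdir", str(logdir)]
    if load_fast:
        command.append("--load_fast=true")
    return command


def launch(logdir, load_fast=True):
    """Start TensorBoard as a child process and return the Popen handle."""
    return subprocess.Popen(tensorboard_command(logdir, load_fast))
```

Calling `launch("/path/to/logs")` starts the server in the background; call `.terminate()` on the returned handle to stop it.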

Feedback

You can respond to this anonymous Google Form, or reply on this
thread, or open a new issue. Let us know: Did it work? How much faster
was it? Do you have any suggestions or requests?

Known issues

We know about these, but please let us know if they matter for you, so
that we can prioritize working on them:

  • Windows is not supported out of the box.
  • Some third-party plugins may need to be updated to work with this
    mode (e.g., the profile plugin).

FAQ

What does “data loading” include?

It includes time spent reading files in your logdir. It does not include
time spent painting charts on the frontend.

What is the --load_fast flag?

Pass --load_fast=true to tell TensorBoard to use a new data loading
mechanism, which is generally hundreds of times faster.

Is --load_fast=true right for me?

Currently, this mode is supported on Linux and macOS. If you are
interested in using it on other platforms, ping @wchargin and I’ll show
you how to build it.

Most features of TensorBoard are expected to work with the new data
loading mechanism. All standard TensorBoard dashboards (scalars, images,
etc.) should work, and flags like --reload_interval should work, too.
You can use logdirs on local disk or on GCS buckets (public or private).
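A wrapper script could use the platform support described above to decide whether to pass the flag. The sketch below assumes the values that Python's `platform.system()` reports ("Linux", and "Darwin" for macOS); the helper name is made up for illustration.

```python
import platform

# platform.system() names for the platforms listed as supported above.
SUPPORTED_SYSTEMS = {"Linux", "Darwin"}  # "Darwin" is what macOS reports


def load_fast_supported(system=None):
    """Return True if `system` (default: this machine) is a supported platform."""
    if system is None:
        system = platform.system()
    return system in SUPPORTED_SYSTEMS
```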

Do I need to have TensorFlow installed?

No.

What’s happening under the hood?

Instead of crawling your logdir in a mixture of Python and C++ code with
a lot of locking, cross-language marshalling, and slow data manipulation
in Python, we read the data in a dedicated subprocess. This program is
written in Rust and is optimized for concurrent reading and serving.
More design details here.
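To illustrate the general pattern of moving file reading into a dedicated subprocess, here is a toy sketch in Python. It is not the actual implementation, which is written in Rust and serves data concurrently. The child program here only walks a directory and reports file counts as JSON, and every name in it is hypothetical.

```python
import json
import subprocess
import sys

# Source for the child program: walks a directory tree and reports what it
# found as JSON on stdout. A stand-in for a real data-loading process.
_READER_SOURCE = """
import json, os, sys
count = 0
total_bytes = 0
for root, _dirs, files in os.walk(sys.argv[1]):
    for name in files:
        count += 1
        total_bytes += os.path.getsize(os.path.join(root, name))
json.dump({"files": count, "bytes": total_bytes}, sys.stdout)
"""


def scan_in_subprocess(logdir):
    """Scan `logdir` in a separate Python process and return its summary."""
    result = subprocess.run(
        [sys.executable, "-c", _READER_SOURCE, str(logdir)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return json.loads(result.stdout)
```

The parent does no file I/O itself and only parses the child's summary, which is the separation of concerns the paragraph above describes.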
