Plumb --samples_per_plugin to Rust data server #4689


Merged
psybuzz merged 6 commits into tensorflow:master on Feb 22, 2021

Conversation

psybuzz (Contributor) commented on Feb 18, 2021

Introduces a --samples-per-plugin flag on the Rust data server, plumbed through from standard TensorBoard's --samples_per_plugin CLI flag.
For now, this change interprets a value of 0 as "create a reservoir with capacity 0", whereas traditionally the value is interpreted as "create a reservoir with unbounded capacity". This discrepancy will be fixed in a separate PR.

Test plan:
Added unit tests and ran them with `cargo test`.
Ran `bazel run tensorboard:dev --define=link_data_server=true -- --load_fast --logdir <logdir> --bind_all --samples_per_plugin=scalars=5,images=0` and observed that scalar charts that normally show lots of points now only show 5, while images do not appear at all.
[screenshot: scalar dashboard with charts limited to 5 points]

Associated issue: "samples per plugin" item in the Rustboard task list #4422
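
For reference, a minimal, hypothetical sketch of parsing a `scalars=5,images=0` style value into a per-plugin sample-count map (the data server's actual parser differs and has richer error handling, as discussed in the review below):

```rust
use std::collections::HashMap;

/// Parses a `scalars=5,images=0` style value into a map from plugin name to
/// requested sample count. A simplified sketch only.
fn parse_samples_per_plugin(value: &str) -> Result<HashMap<String, usize>, String> {
    let mut hint = HashMap::new();
    for pair in value.split(',').filter(|s| !s.is_empty()) {
        let mut parts = pair.splitn(2, '=');
        let plugin = parts.next().expect("splitn yields at least one item");
        let count_str = parts
            .next()
            .ok_or_else(|| format!("missing '=' in {:?}", pair))?;
        let count: usize = count_str
            .parse()
            .map_err(|e| format!("bad sample count in {:?}: {}", pair, e))?;
        hint.insert(plugin.to_string(), count);
    }
    Ok(hint)
}

fn main() {
    let hint = parse_samples_per_plugin("scalars=5,images=0").unwrap();
    assert_eq!(hint.get("scalars"), Some(&5));
    assert_eq!(hint.get("images"), Some(&0));
}
```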

@psybuzz psybuzz added the core:rustboard //tensorboard/data/server/... label Feb 18, 2021
@google-cla google-cla bot added the cla: yes label Feb 18, 2021
@psybuzz psybuzz marked this pull request as ready for review February 18, 2021 10:36
@psybuzz psybuzz requested a review from wchargin February 18, 2021 10:38
@psybuzz psybuzz changed the title from "Plumb --samples_per_plugin to Rust data server." to "Plumb --samples_per_plugin to Rust data server" Feb 18, 2021
wchargin (Contributor) left a comment

This looks excellent. Apart from the error handling for the parser, I’m
quite happy with this; nice work.

@wchargin wchargin mentioned this pull request Feb 19, 2021
psybuzz (Author) left a comment

Ready for another look

Comment on lines 119 to 123
if let Some(ref pd) = metadata.plugin_data {
if let Some(hint) = &*plugin_sampling_hint {
if let Some(&num_samples) = hint.0.get(&pd.plugin_name) {
// TODO(psybuzz): if the hint prescribes 0 samples, the reservoir should ideally
// be unbounded. For now, it simply creates a reservoir with capacity 0.
psybuzz (Author):

> What do you think about making the plugin sampling hint required to remove one layer of conditionals?

Sounds good!

> What do you think about encapsulating this logic into a method on PluginSamplingHint?

Hm, if we extracted the logic into a method that accepted metadata, it would be able to compute the capacity entirely, so there would be less reason to keep the DataClass match around in StageTimeSeries::new. In that case, PluginSamplingHint::capacity(metadata) should contain all the capacity logic.

If the concern is that the capacity logic is a bit much to have in the constructor, would it suffice to just move the code into a method StageTimeSeries::capacity?

Aside: I also switched the `if capacity > 0` to `if data_class != Unknown`, since even if a known data class had a default capacity of 0, the user hint should still override it.
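
To make the capacity discussion concrete, here is a hedged sketch of the selection logic described in this thread. The free-function shape and the per-data-class defaults are hypothetical (the PR keeps this logic in StageTimeSeries); only the hint-overrides-known-data-class behavior is taken from the comment above.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum DataClass {
    Unknown,
    Scalar,
    Tensor,
    BlobSequence,
}

struct PluginSamplingHint(HashMap<String, usize>);

fn capacity(data_class: DataClass, plugin_name: &str, hint: &PluginSamplingHint) -> usize {
    // Hypothetical per-data-class defaults, for illustration only.
    let default = match data_class {
        DataClass::Unknown => 0,
        DataClass::Scalar => 1_000,
        DataClass::Tensor => 100,
        DataClass::BlobSequence => 10,
    };
    // Key the override on the data class being known rather than on the
    // default being non-zero, so a user hint still applies even if a known
    // data class had a zero default (the point raised in the aside above).
    if data_class != DataClass::Unknown {
        if let Some(&n) = hint.0.get(plugin_name) {
            return n;
        }
    }
    default
}

fn main() {
    let mut overrides = HashMap::new();
    overrides.insert("scalars".to_string(), 5);
    let hint = PluginSamplingHint(overrides);
    assert_eq!(capacity(DataClass::Scalar, "scalars", &hint), 5);
    assert_eq!(capacity(DataClass::Scalar, "histograms", &hint), 1_000);
    assert_eq!(capacity(DataClass::Unknown, "scalars", &hint), 0);
}
```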

part: pair_str.to_string(),
});
}
let num_samples = pair[1].parse::<usize>()?;
wchargin (Contributor):

Nice: good use of the implicit error type conversion.
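
The "implicit error type conversion" refers to the `?` operator applying a `From<ParseIntError>` impl to turn the `str::parse` error into the function's own error type. A minimal illustration of the idiom, with hypothetical names rather than the PR's actual error type:

```rust
use std::num::ParseIntError;

// Illustrative error type: the `From` impl is what lets `?` convert a
// `ParseIntError` from `str::parse` into this error automatically.
#[derive(Debug)]
enum HintParseError {
    BadSampleCount(ParseIntError),
}

impl From<ParseIntError> for HintParseError {
    fn from(e: ParseIntError) -> Self {
        HintParseError::BadSampleCount(e)
    }
}

fn parse_count(raw: &str) -> Result<usize, HintParseError> {
    // `?` applies `From<ParseIntError>` here, so no explicit map_err is needed.
    let n = raw.parse::<usize>()?;
    Ok(n)
}

fn main() {
    assert_eq!(parse_count("5").unwrap(), 5);
    assert!(parse_count("five").is_err());
}
```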

Comment on lines 119 to 123
if let Some(ref pd) = metadata.plugin_data {
if let Some(hint) = &*plugin_sampling_hint {
if let Some(&num_samples) = hint.0.get(&pd.plugin_name) {
// TODO(psybuzz): if the hint prescribes 0 samples, the reservoir should ideally
// be unbounded. For now, it simply creates a reservoir with capacity 0.
wchargin (Contributor):

> Hm, if we extracted the logic into a method that accepted metadata, it would be able to compute the capacity entirely, so there would be less reason to keep the DataClass match around in StageTimeSeries::new. In that case, PluginSamplingHint::capacity(metadata) should contain all the capacity logic.

Agreed.

> If the concern is that the capacity logic is a bit much to have in the constructor, would it suffice to just move the code into a method StageTimeSeries::capacity?

I’m not too worried by the number of lines of code in the constructor. I was wondering whether it makes sense, from a conceptual standpoint, for the plugin sampling hint to know how to determine the capacity. For instance, that would provide a natural place to unit-test that logic if we so desired.

That said, I don’t feel strongly about it; the question was genuine. This is fine with me, and I do think that having it in a separate function (as you’ve done) is nice, because you can return early.

Comment on lines 196 to 198
data: RunLoaderData {
plugin_sampling_hint,
..RunLoaderData::default()
wchargin (Contributor):

Do you think that it would make sense to remove the Default
implementation and instead provide RunLoaderData::new that requires
an Arc<PluginSamplingHint>? No strong preference, and happy to
defer to you; just curious for your thoughts.

psybuzz (Author):

After thinking about this, I realize I have no strong preference, but weakly think dropping the Default is a good idea.

Since RunLoaderData passes the hint to StageTimeSeries::new, which requires it, it's a bit more obvious that a default Arc<PluginSamplingHint> might not be appropriate for the caller. Introducing a ::new() with only the hint parameter also looks more consistent with, say, LogdirLoader::new().

Done.
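
A rough sketch of the constructor shape being agreed on here, replacing the `Default` impl with a `new` that requires the hint. Apart from `plugin_sampling_hint` and the type names mentioned in the thread, the fields are hypothetical placeholders:

```rust
use std::collections::HashMap;
use std::sync::Arc;

struct PluginSamplingHint(HashMap<String, usize>);

struct RunLoaderData {
    plugin_sampling_hint: Arc<PluginSamplingHint>,
    // Hypothetical placeholder for other per-run state.
    start_time: Option<f64>,
}

impl RunLoaderData {
    /// Requiring the hint at construction makes it harder to accidentally
    /// build a loader with a default (empty) hint, mirroring constructors
    /// like `LogdirLoader::new`.
    fn new(plugin_sampling_hint: Arc<PluginSamplingHint>) -> Self {
        Self {
            plugin_sampling_hint,
            start_time: None,
        }
    }
}

fn main() {
    let hint = Arc::new(PluginSamplingHint(HashMap::new()));
    let data = RunLoaderData::new(Arc::clone(&hint));
    assert!(data.start_time.is_none());
    assert!(data.plugin_sampling_hint.0.is_empty());
}
```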

args = [
server_binary,
"--logdir=%s" % (self._logdir,),
"--reload=%s" % reload,
"--port=0",
"--port-file=%s" % (port_file_path,),
"--die-after-stdin",
"--samples-per-plugin=%s" % samples_per_plugin,
wchargin (Contributor):

I’m noticing that while this works fine with the data server at head, it
will fail for the v0.3.0 data server (no such flag), which is what’s
currently released. What do you think about either:

  • only including this flag when samples_per_plugin is non-empty, as
    you had previously? then --load_fast would still be broken with
    v0.3.0 with explicit --samples_per_plugin, but at least that’s not
    in the default path; or

  • explicitly gating the flag on tensorboard_data_server.__version__?
    you could use pkg_resources.parse_version to ensure that it’s at
    least 0.4.0a0 (cf. version_test.py)

I appreciate that this kind of dependency constraint is less than ideal,
and can think about ways to improve it in the general case.

psybuzz (Author):

> only including this flag when samples_per_plugin is non-empty, as you had previously?

Done.

Agreed that it's not ideal, but hopefully the --undefok Clap change may go through before we do the next release of tensorboard-data-server.

Explicitly gating the flag on version # would work for us, but I'm not sure what the story would be for users writing their own custom data server using it via TENSORBOARD_DATA_SERVER_BINARY.

Besides --undefok, the only thing I have thought of is trying to detect support for the flag by running <binary> --help and parsing the output to see whether --samples-per-plugin is there. Which is also very non-ideal!

wchargin (Contributor):

> Explicitly gating the flag on version # would work for us, but I'm not sure what the story would be for users writing their own custom data server using it via TENSORBOARD_DATA_SERVER_BINARY.

Ah, good point, yes. You could invoke `rustboard --version` and see what it prints, which is not too hard but certainly less convenient.

> Besides --undefok, the only thing I have thought of is trying to detect support for the flag by running <binary> --help and parsing the output to see whether --samples-per-plugin is there. Which is also very non-ideal!

Also a good point. Certainly not ideal, and only really works for feature additions (e.g., would be harder to detect if we add support for something like --samples-per-plugin=scalars=100,images=all). But I don’t mind it that much if we want to go that route.

psybuzz (Author) left a comment

Thanks for the review!



Comment on lines 119 to 123
if let Some(ref pd) = metadata.plugin_data {
if let Some(hint) = &*plugin_sampling_hint {
if let Some(&num_samples) = hint.0.get(&pd.plugin_name) {
// TODO(psybuzz): if the hint prescribes 0 samples, the reservoir should ideally
// be unbounded. For now, it simply creates a reservoir with capacity 0.
psybuzz (Author):

Gotcha. In that case, keeping the logic here in StageTimeSeries still seems preferable to moving it to PluginSamplingHint::capacity, IMO, so I'll keep it as-is for now.

wchargin (Contributor) left a comment

LGTM; thanks!


@psybuzz psybuzz merged commit 319db49 into tensorflow:master Feb 22, 2021
wchargin added a commit that referenced this pull request Feb 25, 2021
Summary:
Nothing critical: some unnecessary `.into()`s from #4677 and some
unnecessary `&Box<T>` double-indirection from #4689.

wchargin-branch: rust-clippy-fixes-20210225
wchargin-source: 27bd0fee8a91ed809d78b07bbcaf60dd0c276fef
wchargin added a commit that referenced this pull request Feb 26, 2021
Summary:
Nothing critical: some unnecessary `.into()`s from #4677 and some
unnecessary `&Box<T>` double-indirection from #4689.

wchargin-branch: rust-clippy-fixes-20210225
wchargin added a commit that referenced this pull request Mar 18, 2021
Summary:
In #4689, we decided to only pass `--samples-per-plugin` to the data
server conditionally, because at the time the latest public release of
the data server did not support that flag. Since then, v0.4.0 has been
published, so we can pass the flag unconditionally.

Test Plan:
Verified that with latest `tb-nightly`’s dependencies, the server
behaves as expected both with and without `--samples_per_plugin`.

wchargin-branch: data-always-pass-samples
wchargin-source: 04f7394f218d395db4366dceeed9b5dfb0d1dcef
wchargin added a commit that referenced this pull request Mar 18, 2021
Summary:
In #4689, we decided to only pass `--samples-per-plugin` to the data
server conditionally, because at the time the latest public release of
the data server did not support that flag. Since then, v0.4.0 has been
published, so we can pass the flag unconditionally.

Test Plan:
Verified that with latest `tb-nightly`’s dependencies, the server
behaves as expected both with and without `--samples_per_plugin`.

wchargin-branch: data-always-pass-samples
wchargin added a commit that referenced this pull request Mar 18, 2021
Summary:
Under `--load_fast=auto`, if the data server exits without writing a
port file, we now gracefully degrade to the multiplexer reading path.
This subsumes and expands upon the checks from #4786, with semantics now
defined by the data server as in #PARENT.

This depends on the `--error-file` data server flag added in #PARENT,
which is not yet released. To handle this, we add a mechanism for gating
flags against data server versions. As [discussed in #4689][1], this
raises a concern of what to do when the data server is chosen by a
mechanism other than the Python package. We resolve this by simply
treating such data servers as bleeding-edge. If you link in a data
server yourself, you should use it with a copy of `tb-nightly` built
from the same Git commit, or your mileage may vary.

See also #4786, which first proposed this functionality and offered some
alternative mechanisms.

[1]: #4689 (comment)

Test Plan:
Set your `GOOGLE_APPLICATION_CREDENTIALS` to an invalid JSON file, and
try running with `--load_fast=auto` with release v0.4.0 of the data
server and also with latest head (`//tensorboard/data/server:install`).
Note with `--verbosity 0` that with the old server, `--error-file` is
not passed and the data server has to fall back to anonymous creds,
whereas with the new server, `--error-file` is passed and TensorBoard
correctly falls back to the multiplexer.

wchargin-branch: data-server-fallback
wchargin-source: ac77fa64fc6b10b6dc80f9962ee291e8084dba80
wchargin added a commit that referenced this pull request Mar 19, 2021
Summary:
Under `--load_fast=auto`, if the data server exits without writing a
port file, we now gracefully degrade to the multiplexer reading path.
This subsumes and expands upon the checks from #4786, with semantics now
defined by the data server as in #4793.

This depends on the `--error-file` data server flag added in #4793,
which is not yet released. To handle this, we add a mechanism for gating
flags against data server versions. As [discussed in #4689][1], this
raises a concern of what to do when the data server is chosen by a
mechanism other than the Python package. We resolve this by simply
treating such data servers as bleeding-edge. If you link in a data
server yourself, you should use it with a copy of `tb-nightly` built
from the same Git commit, or your mileage may vary.

See also #4786, which first proposed this functionality and offered some
alternative mechanisms.

[1]: #4689 (comment)

Test Plan:
Set your `GOOGLE_APPLICATION_CREDENTIALS` to an invalid JSON file, and
try running with `--load_fast=auto` with release v0.4.0 of the data
server and also with latest head (`//tensorboard/data/server:install`).
Note with `--verbosity 0` that with the old server, `--error-file` is
not passed and the data server has to fall back to anonymous creds,
whereas with the new server, `--error-file` is passed and TensorBoard
correctly falls back to the multiplexer.

wchargin-branch: data-server-fallback