Plumb `--samples_per_plugin` to Rust data server #4689
Conversation
This looks excellent. Apart from the error handling for the parser, I’m
quite happy with this; nice work.
Ready for another look
tensorboard/data/server/run.rs (outdated)
```rust
if let Some(ref pd) = metadata.plugin_data {
    if let Some(hint) = &*plugin_sampling_hint {
        if let Some(&num_samples) = hint.0.get(&pd.plugin_name) {
            // TODO(psybuzz): if the hint prescribes 0 samples, the reservoir should ideally
            // be unbounded. For now, it simply creates a reservoir with capacity 0.
```
What do you think about making the plugin sampling hint required to remove one layer of conditionals?

Sounds good!

What do you think about encapsulating this logic into a method on `PluginSamplingHint`?

Hm, if we extracted the logic into a method that accepted `metadata`, it would be able to compute the capacity entirely, so there would be less reason to keep the `DataClass` match around in `StageTimeSeries::new`. In that case, `PluginSamplingHint::capacity(metadata)` should contain all the capacity logic.

If the concern is that the capacity logic is a bit much to have in the constructor, would it suffice to just move the code into a method `StageTimeSeries::capacity`?

Aside: also switched the `if capacity > 0` to `if data_class != Unknown`, since even if a known data class had 0 default capacity, the user hint should still override.
```rust
            part: pair_str.to_string(),
        });
    }
    let num_samples = pair[1].parse::<usize>()?;
```
Nice: good use of the implicit error type conversion.
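For reference, the pattern being praised: `str::parse` yields a `std::num::ParseIntError`, and the `?` operator converts it into the function's own error type through a `From` impl, so no explicit `map_err` is needed. A small self-contained sketch (the error and function names here are hypothetical, not the actual parser):

```rust
use std::num::ParseIntError;

#[derive(Debug)]
enum HintParseError {
    /// A key=value pair was malformed (hypothetical variant).
    BadPair { part: String },
    /// The sample count was not a valid integer.
    BadCount(ParseIntError),
}

// This `From` impl is what lets `?` convert a `ParseIntError` implicitly.
impl From<ParseIntError> for HintParseError {
    fn from(e: ParseIntError) -> Self {
        HintParseError::BadCount(e)
    }
}

fn parse_pair(pair_str: &str) -> Result<(String, usize), HintParseError> {
    let pair: Vec<&str> = pair_str.splitn(2, '=').collect();
    if pair.len() != 2 {
        return Err(HintParseError::BadPair {
            part: pair_str.to_string(),
        });
    }
    // `?` turns a `ParseIntError` into `HintParseError::BadCount` automatically.
    let num_samples = pair[1].parse::<usize>()?;
    Ok((pair[0].to_string(), num_samples))
}
```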
tensorboard/data/server/run.rs (outdated)
> Hm, if we extracted the logic into a method that accepted `metadata`, it would be able to compute the capacity entirely, so there would be less reason to keep the `DataClass` match around in `StageTimeSeries::new`. In that case, `PluginSamplingHint::capacity(metadata)` should contain all the capacity logic.

Agreed.

> If the concern is that the capacity logic is a bit much to have in the constructor, would it suffice to just move the code into a method `StageTimeSeries::capacity`?

I'm not too worried by the number of lines of code in the constructor. I was wondering whether it makes sense from a conceptual standpoint that the plugin sampling hint should know how to determine the capacity. For instance, that would provide a natural place to unit-test that logic if we so desired.

That said, I don't feel strongly about it; the question was genuine. This is fine with me, and I do think that having it in a separate function (as you've done) is nice, because you can return early.
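For contrast, the alternative floated here (not what the PR adopted) would put that same logic on the hint itself, which would also give it a natural home for unit tests. A sketch, reusing the illustrative `DataClass`, `SummaryMetadata`, and `PluginSamplingHint` stubs from the snippet above:

```rust
impl PluginSamplingHint {
    /// Hypothetical alternative: the hint owns all capacity logic, so
    /// `StageTimeSeries::new` would not need its own `DataClass` match.
    fn capacity(&self, data_class: DataClass, metadata: &SummaryMetadata) -> usize {
        if data_class == DataClass::Unknown {
            return 0;
        }
        metadata
            .plugin_data
            .as_ref()
            .and_then(|pd| self.0.get(&pd.plugin_name).copied())
            .unwrap_or(match data_class {
                DataClass::Scalar => 1000,
                DataClass::Tensor => 100,
                DataClass::BlobSequence => 10,
                DataClass::Unknown => 0,
            })
    }
}
```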
tensorboard/data/server/run.rs (outdated)
```rust
data: RunLoaderData {
    plugin_sampling_hint,
    ..RunLoaderData::default()
```
Do you think that it would make sense to remove the `Default` implementation and instead provide `RunLoaderData::new` that requires an `Arc<PluginSamplingHint>`? No strong preference, and happy to defer to you; just curious for your thoughts.
After thinking about this, I realize I have no strong preference, but weakly think dropping the `Default` is a good idea.

Since `RunLoaderData` passes the hint to `StageTimeSeries::new`, which requires it, it's a bit more obvious that a default `Arc<PluginSamplingHint>` might not be appropriate for the caller. Introducing a `::new()` with only the hint parameter also looks more consistent with, say, `LogdirLoader::new()`.

Done.
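A minimal sketch of what dropping the `Default` in favor of a constructor could look like; only `plugin_sampling_hint` is known from the diff above, and the other field and types are stand-ins:

```rust
use std::collections::HashMap;
use std::sync::Arc;

struct PluginSamplingHint(HashMap<String, usize>);
struct StageTimeSeries; // stand-in for the real per-tag time series state

struct RunLoaderData {
    plugin_sampling_hint: Arc<PluginSamplingHint>,
    time_series: HashMap<String, StageTimeSeries>, // stand-in field
}

impl RunLoaderData {
    /// Callers must now supply the hint explicitly, mirroring `LogdirLoader::new()`;
    /// there is no `Default` to accidentally construct with an empty hint.
    fn new(plugin_sampling_hint: Arc<PluginSamplingHint>) -> Self {
        Self {
            plugin_sampling_hint,
            time_series: HashMap::new(),
        }
    }
}
```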
tensorboard/data/server_ingester.py (outdated)
```python
args = [
    server_binary,
    "--logdir=%s" % (self._logdir,),
    "--reload=%s" % reload,
    "--port=0",
    "--port-file=%s" % (port_file_path,),
    "--die-after-stdin",
    "--samples-per-plugin=%s" % samples_per_plugin,
```
I’m noticing that while this works fine with the data server at head, it will fail for the v0.3.0 data server (no such flag), which is what’s currently released. What do you think about either:

- only including this flag when `samples_per_plugin` is non-empty, as you had previously? then `--load_fast` would still be broken with v0.3.0 with explicit `--samples_per_plugin`, but at least that’s not in the default path; or
- explicitly gating the flag on `tensorboard_data_server.__version__`? you could use `pkg_resources.parse_version` to ensure that it’s at least `0.4.0a0` (cf. `version_test.py`)

I appreciate that this kind of dependency constraint is less than ideal, and can think about ways to improve it in the general case.
> only including this flag when `samples_per_plugin` is non-empty, as you had previously?

Done.

Agreed that it's non-ideal, but hopefully the `--undefok` Clap change may go through before we do the next release of `tensorboard-data-server`.

Explicitly gating the flag on version # would work for us, but I'm not sure what the story would be for users writing their own custom data server and using it via `TENSORBOARD_DATA_SERVER_BINARY`.

Besides `--undefok`, the only thing I have thought of is trying to detect support for the flag by running `<binary> --help` and parsing the output to see whether `--samples-per-plugin` is there. Which is also very non-ideal!
> Explicitly gating the flag on version # would work for us, but I'm not sure what the story would be for users writing their own custom data server and using it via `TENSORBOARD_DATA_SERVER_BINARY`.

Ah, good point, yes. You could invoke `rustboard --version` and see what it prints, which is not too hard but certainly less convenient.

> Besides `--undefok`, the only thing I have thought of is trying to detect support for the flag by running `<binary> --help` and parsing the output to see whether `--samples-per-plugin` is there. Which is also very non-ideal!

Also a good point. Certainly not ideal, and only really works for feature additions (e.g., would be harder to detect if we add support for something like `--samples-per-plugin=scalars=100,images=all`).

But I don’t mind it that much if we want to go that route.
Thanks for the review!
tensorboard/data/server/run.rs (outdated)
Gotcha. In that case, keeping the logic here in `StageTimeSeries` still seems preferable to moving it to `PluginSamplingHint::capacity` imo, so I'll keep it as-is for now.
LGTM; thanks!
Summary: In #4689, we decided to only pass `--samples-per-plugin` to the data server conditionally, because at the time the latest public release of the data server did not support that flag. Since then, v0.4.0 has been published, so we can pass the flag unconditionally.

Test Plan: Verified that with latest `tb-nightly`’s dependencies, the server behaves as expected both with and without `--samples_per_plugin`.

wchargin-branch: data-always-pass-samples
Summary: Under `--load_fast=auto`, if the data server exits without writing a port file, we now gracefully degrade to the multiplexer reading path. This subsumes and expands upon the checks from #4786, with semantics now defined by the data server as in #4793.

This depends on the `--error-file` data server flag added in #4793, which is not yet released. To handle this, we add a mechanism for gating flags against data server versions. As [discussed in #4689][1], this raises a concern of what to do when the data server is chosen by a mechanism other than the Python package. We resolve this by simply treating such data servers as bleeding-edge. If you link in a data server yourself, you should use it with a copy of `tb-nightly` built from the same Git commit, or your mileage may vary.

See also #4786, which first proposed this functionality and offered some alternative mechanisms.

[1]: #4689 (comment)

Test Plan: Set your `GOOGLE_APPLICATION_CREDENTIALS` to an invalid JSON file, and try running with `--load_fast=auto` with release v0.4.0 of the data server and also with latest head (`//tensorboard/data/server:install`). Note with `--verbosity 0` that with the old server, `--error-file` is not passed and the data server has to fall back to anonymous creds, whereas with the new server, `--error-file` is passed and TensorBoard correctly falls back to the multiplexer.

wchargin-branch: data-server-fallback
Introduces a `--samples-per-plugin` flag on the Rust data server, which takes its value from standard TensorBoard's `--samples_per_plugin` CLI flag.

At the moment, this change interprets a value of `0` as "create a reservoir with capacity 0", whereas traditionally it would be interpreted as "create a reservoir with unbounded capacity". This discrepancy will be fixed in a separate PR.
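As a rough sketch of the flag's shape, assuming it mirrors the Python flag's `plugin=count,plugin=count` syntax (the struct and function names below are illustrative, not the actual implementation): parsing `scalars=5,images=0` would map `scalars` to 5 and `images` to 0, and a count of 0 currently yields an empty reservoir rather than an unbounded one, per the note above.

```rust
use std::collections::HashMap;

/// Illustrative: plugin name -> requested number of samples.
struct PluginSamplingHint(HashMap<String, usize>);

fn parse_samples_per_plugin(flag: &str) -> Option<PluginSamplingHint> {
    let mut map = HashMap::new();
    for pair in flag.split(',').filter(|s| !s.is_empty()) {
        // Each pair is "plugin_name=count"; bail out on malformed input.
        let (name, count) = pair.split_once('=')?;
        map.insert(name.to_string(), count.parse::<usize>().ok()?);
    }
    Some(PluginSamplingHint(map))
}
```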
Test plan:

- Added unit tests and ran them with `cargo test`.
- Ran `bazel run tensorboard:dev --define=link_data_server=true -- --load_fast --logdir <logdir> --bind_all --samples_per_plugin=scalars=5,images=0` and observed that scalar charts that normally show lots of points now only show 5, while images do not appear at all.

Associated issue: "samples per plugin" item in the Rustboard task list #4422