[air] remove fully_executed from Tune. #25750

Merged
merged 6 commits into ray-project:master from xwjiang2010:fully_executed on Jun 15, 2022

Conversation

xwjiang2010
Contributor

@xwjiang2010 xwjiang2010 commented Jun 14, 2022

Why are these changes needed?

Remove fully_executed from the Tune layer.

I tried this on my test bench by adding the following to ProgressReporter (printed every 5 seconds):

from ray.worker import _global_node
if _global_node.address_info:
    from ray.internal.internal_api import memory_summary
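    # Summarize object store (plasma) memory usage; stats_only limits output to the high-level stats.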
    meminfo = memory_summary(
        _global_node.address_info["address"], stats_only=True
    )
    print(meminfo)

Then I ran a Tune job with 10 trials (they should all share the same Dataset object).
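
For reference, here is a minimal sketch of the kind of job described; the dataset, trainable, and trial count are illustrative rather than the actual test bench:

import ray
from ray import tune

# Materialize the Dataset's blocks once, up front, so trials can share them.
ds = ray.data.range(10_000_000).fully_executed()

def train_fn(config, data=None):
    # Every trial receives a reference to the same materialized Dataset.
    tune.report(num_rows=data.count())

# tune.with_parameters puts the Dataset into the object store once and hands
# the same reference to each of the 10 trials.
tune.run(tune.with_parameters(train_fn, data=ds), num_samples=10)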

The console output starts as a few lines of

Plasma memory usage 152 MiB, 1 objects

and then changes to

Plasma memory usage 305 MiB, 2 objects

I cannot quite explain why the usage doubles, but at least it does not grow linearly with the number of trials (which is 10). I take that as evidence that the Dataset is being shared rather than copied per trial.

For check-ingest, I intentionally removed the session around preprocessing, for two reasons:

  1. I would like to focus on the dataset-sharing topic for now, without further complicating the subject.
  2. I am not convinced that the current behavior is what we want or what we should advertise. For example, even if I do

trainer = Trainer(datasets=xxx, preprocessor=yyy)
tuner = Tuner(
    trainer,
    param_space={"learning_rate": tune.grid_search([0.001, 0.005])},
)
tuner.fit()

I still end up with two copies of the preprocessed dataset, which seems very inefficient.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Contributor

@ericl ericl left a comment

Code changes look good, but what is the reason for changing the docs to such a large extent? I had in mind only a slight tweak to the doc:

  1. Say that for an experiment-wide dataset, you need to call ds.fully_executed() so the blocks are materialized before the Tuner is created. This enables dataset sharing. Note that if a preprocessor is used, only the input blocks are shared; preprocessing still happens separately (see the sketch after this list).
  2. Otherwise, the dataset will be read separately in each trial.
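
A minimal sketch of the two cases above, assuming the Dataset API as of this PR (the dataset itself is illustrative):

import ray

# Case 1: experiment-wide dataset. Calling fully_executed() materializes the
# blocks before the Tuner is created, so the blocks can be shared across trials.
shared_ds = ray.data.range(1000).fully_executed()

# Case 2: without fully_executed(), the dataset will be read separately in each trial.
per_trial_ds = ray.data.range(1000)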

@ericl ericl added the @author-action-required label Jun 14, 2022
@xwjiang2010 xwjiang2010 removed the @author-action-required label Jun 14, 2022
@ericl ericl merged commit 88d824d into ray-project:master Jun 15, 2022
@xwjiang2010 xwjiang2010 deleted the fully_executed branch July 26, 2023 19:51