[train] New persistence mode: Remove some legacy `air.Checkpoint` dependencies #39049
Merged: matthewdeng merged 14 commits into ray-project:master from justinvyu:test_experiment_restore on Aug 31, 2023
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
matthewdeng approved these changes on Aug 31, 2023
arvind-chandra pushed a commit to lmco/ray that referenced this pull request on Aug 31, 2023
…endencies (ray-project#39049) Signed-off-by: Justin Yu <justinvyu@anyscale.com> Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
matthewdeng pushed a commit to matthewdeng/ray that referenced this pull request on Sep 1, 2023
…endencies (ray-project#39049) Signed-off-by: Justin Yu <justinvyu@anyscale.com>
GeneDer pushed a commit that referenced this pull request on Sep 1, 2023
…39195)

* [CI] Remove tags for AIR and AIR smoke test (#39075)
  Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
* [train] New persistence mode: Finish migrating `Tune tests + examples (small)` (#39047)
  Signed-off-by: Justin Yu <justinvyu@anyscale.com>
* [train] Rename `train(config)` to `train_fn(config)` (#39065)
  Our API change from `tune.report()` to `train.report()` can lead to namespace clashes when the training function is called `train`. This has been an issue in many test migrations, including #39050 and the current failure of #38493. This PR does a global replace of all training function defined as `def train(config)` with `def train_fn(config)` to avoid future clashes.
  Signed-off-by: Kai Fricke <kai@anyscale.com>
* [Data/Train] [Docs] Re-organize data loading performance tips (#39096)
  Re-organize data loading performance tips. We want the caching and the CPU nodes sections to be together since they are both addressing the same problems of optimizing performance when you have expensive CPU preprocessing, and the latter references the former.
  Signed-off-by: Amog Kamsetty <amogkam@users.noreply.github.com>
* [air] Hard deprecate PredictorDeployment and PredictorWrapper (#39108)
  Signed-off-by: Justin Yu <justinvyu@anyscale.com>
* Update the DeepSpeed and Accelerate doc example with new Checkpoint API. (#39014)
  Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
* [train] New persistence mode: Remove some legacy `air.Checkpoint` dependencies (#39049)
  Signed-off-by: Justin Yu <justinvyu@anyscale.com>
* [train] Fix wandb/comet integration API calls (#38978)
  Removes remaining calls to checkpoint.dir_or_data in the wandb/comet integrations
  Signed-off-by: Kai Fricke <kai@anyscale.com>
* [tune] Deprecate `tune.report`, `tune.checkpoint_dir`, `checkpoint_dir`, and `reporter` (#39093)
  Signed-off-by: Justin Yu <justinvyu@anyscale.com>
* [2.7][Example] Enable new APIs for Lightning `dolly-v2-7b` Fine-tuning Example (#39117)
  Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com>
  Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* [train] New persistence mode: Re-enable py37 compatibility tests (#39121)
  Signed-off-by: Justin Yu <justinvyu@anyscale.com>
* [Ray 2.7 Examples][1/n] Revamp the LightningTrainer CoLA Example (#38009)
  Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com>
  Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* [train] New persistence mode: Support `chdir_to_trial_dir` functionality with `RAY_CHDIR_TO_TRIAL_DIR` env var (#39107)
  Signed-off-by: Justin Yu <justinvyu@anyscale.com>
* [train] New persistence mode: Minimal `BackendExecutor` cleanup (#39187)
  Signed-off-by: Justin Yu <justinvyu@anyscale.com>
* [train/rllib] RLlib GPU storage context tests (#39166)
  Signed-off-by: Kai Fricke <kai@anyscale.com>
  Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* [docs][train] Update Train landing and Overview pages (#38808)
  Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
  Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

---------

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com>
Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Co-authored-by: Yunxuan Xiao <yunxuanx@anyscale.com>
Co-authored-by: Justin Yu <justinvyu@anyscale.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
LeonLuttenberger pushed a commit to jaidisido/ray that referenced this pull request on Sep 5, 2023
…endencies (ray-project#39049) Signed-off-by: Justin Yu <justinvyu@anyscale.com>
jimthompson5802 pushed a commit to jimthompson5802/ray that referenced this pull request on Sep 12, 2023
…endencies (ray-project#39049) Signed-off-by: Justin Yu <justinvyu@anyscale.com> Signed-off-by: Jim Thompson <jimthompson5802@gmail.com>
vymao pushed a commit to vymao/ray that referenced this pull request on Oct 11, 2023
…endencies (ray-project#39049) Signed-off-by: Justin Yu <justinvyu@anyscale.com> Signed-off-by: Victor <vctr.y.m@example.com>
Why are these changes needed?

This PR removes the `encode_data` and `decode_data` logic in `Backend`. These are not needed with the new `train.Checkpoint`, because there's no dict data to encode/decode. The use case before was primarily to cast generic checkpoints to framework-specific checkpoints, but we don't need that anymore.

`test_experiment_restore` is passing for some reason, even though the script uses `Checkpoint.from_dict`. This PR adds an explicit progress assertion and migrates the train script.

This PR also updates some leftover `Checkpoint.from_dict` usage in examples.

Related issue number
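Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.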
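
For readers following the description above, here is a rough, stdlib-only sketch of the two ideas it mentions: keeping checkpoint state in a directory instead of a dict (the direction the `Checkpoint.from_dict` migration takes), and asserting that a restored run actually makes progress instead of silently restarting. This is not the code from this PR. The helpers `save_state` and `load_state`, the `state.json` file name, the `latest` subdirectory, and the `stop_after` parameter are all hypothetical names chosen for illustration. In a real Ray Train script the save step would roughly correspond to `train.report(metrics, checkpoint=Checkpoint.from_directory(path))` and the restore step to `train.get_checkpoint()` followed by `checkpoint.as_directory()`.

```python
import json
import os
import tempfile


def save_state(state: dict, checkpoint_dir: str) -> None:
    """Hypothetical helper: write dict-style state as a file in a checkpoint directory."""
    with open(os.path.join(checkpoint_dir, "state.json"), "w") as f:
        json.dump(state, f)


def load_state(checkpoint_dir: str) -> dict:
    """Hypothetical helper: read the state back from a checkpoint directory."""
    with open(os.path.join(checkpoint_dir, "state.json")) as f:
        return json.load(f)


def train_fn(num_iters: int, storage_dir: str, stop_after=None) -> list:
    """Toy training loop that resumes from the latest directory checkpoint.

    Returns the list of iteration indices executed in this run.
    """
    latest = os.path.join(storage_dir, "latest")
    start = 0
    if os.path.isdir(latest):
        # Restore: continue one step past the last saved iteration.
        # (With Ray Train: train.get_checkpoint() + checkpoint.as_directory().)
        start = load_state(latest)["iteration"] + 1

    seen = []
    for i in range(start, num_iters):
        os.makedirs(latest, exist_ok=True)
        # Save: state lives in files inside a directory, not in a dict.
        # (With Ray Train: train.report(..., checkpoint=Checkpoint.from_directory(latest)).)
        save_state({"iteration": i}, latest)
        seen.append(i)
        if stop_after is not None and len(seen) == stop_after:
            break  # simulate an interruption partway through training
    return seen


with tempfile.TemporaryDirectory() as storage:
    first_run = train_fn(5, storage, stop_after=3)  # interrupted run
    second_run = train_fn(5, storage)               # restored run

# Explicit progress assertion: the restored run must pick up where the
# first run stopped. A run that restarted from 0 would fail this check.
assert second_run[0] == first_run[-1] + 1
```

The final assertion is the point of the sketch: a test that only checks that the restored run finishes can pass even when restoration is broken, which is why an explicit progress check is useful.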