[train] Fix wandb/comet integration API calls #38978
Signed-off-by: Kai Fricke <kai@anyscale.com>
Thanks, this makes sense to me! Had one nit to avoid needing to update the `ray.train._checkpoint` import.
Signed-off-by: Kai Fricke <kai@anyscale.com>
Should be good after this fix:
Thanks!
Removes remaining calls to checkpoint.dir_or_data in the wandb/comet integrations Signed-off-by: Kai Fricke <kai@anyscale.com>
…39195)
* [CI] Remove tags for AIR and AIR smoke test (#39075) Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
* [train] New persistence mode: Finish migrating `Tune tests + examples (small)` (#39047) Signed-off-by: Justin Yu <justinvyu@anyscale.com>
* [train] Rename `train(config)` to `train_fn(config)` (#39065) Our API change from `tune.report()` to `train.report()` can lead to namespace clashes when the training function is called `train`. This has been an issue in many test migrations, including #39050 and the current failure of #38493. This PR does a global replace of all training function defined as `def train(config)` with `def train_fn(config)` to avoid future clashes. Signed-off-by: Kai Fricke <kai@anyscale.com>
* [Data/Train] [Docs] Re-organize data loading performance tips (#39096) Re-organize data loading performance tips. We want the caching and the CPU nodes sections to be together since they are both addressing the same problems of optimizing performance when you have expensive CPU preprocessing, and the latter references the former. Signed-off-by: Amog Kamsetty <amogkam@users.noreply.github.com>
* [air] Hard deprecate PredictorDeployment and PredictorWrapper (#39108) Signed-off-by: Justin Yu <justinvyu@anyscale.com>
* Update the DeepSpeed and Accelerate doc example with new Checkpoint API. (#39014) Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
* [train] New persistence mode: Remove some legacy `air.Checkpoint` dependencies (#39049) Signed-off-by: Justin Yu <justinvyu@anyscale.com>
* [train] Fix wandb/comet integration API calls (#38978) Removes remaining calls to checkpoint.dir_or_data in the wandb/comet integrations Signed-off-by: Kai Fricke <kai@anyscale.com>
* [tune] Deprecate `tune.report`, `tune.checkpoint_dir`, `checkpoint_dir`, and `reporter` (#39093) Signed-off-by: Justin Yu <justinvyu@anyscale.com>
* [2.7][Example] Enable new APIs for Lightning `dolly-v2-7b` Fine-tuning Example (#39117) Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* [train] New persistence mode: Re-enable py37 compatibility tests (#39121) Signed-off-by: Justin Yu <justinvyu@anyscale.com>
* [Ray 2.7 Examples][1/n] Revamp the LightningTrainer CoLA Example (#38009) Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* [train] New persistence mode: Support `chdir_to_trial_dir` functionality with `RAY_CHDIR_TO_TRIAL_DIR` env var (#39107) Signed-off-by: Justin Yu <justinvyu@anyscale.com>
* [train] New persistence mode: Minimal `BackendExecutor` cleanup (#39187) Signed-off-by: Justin Yu <justinvyu@anyscale.com>
* [train/rllib] RLlib GPU storage context tests (#39166) Signed-off-by: Kai Fricke <kai@anyscale.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* [docs][train] Update Train landing and Overview pages (#38808) Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
---------
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com>
Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Co-authored-by: Yunxuan Xiao <yunxuanx@anyscale.com>
Co-authored-by: Justin Yu <justinvyu@anyscale.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Removes remaining calls to checkpoint.dir_or_data in the wandb/comet integrations Signed-off-by: Kai Fricke <kai@anyscale.com> Signed-off-by: Jim Thompson <jimthompson5802@gmail.com>
Removes remaining calls to checkpoint.dir_or_data in the wandb/comet integrations Signed-off-by: Kai Fricke <kai@anyscale.com> Signed-off-by: Victor <vctr.y.m@example.com>
Why are these changes needed?
Removes remaining calls to `checkpoint.dir_or_data` in the wandb/comet integrations.
Related issue number
Closes #38960
Checks
[…] `git commit -s`) in this PR.
[…] `scripts/format.sh` to lint the changes in this PR.
[…] method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
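For readers unfamiliar with the kind of change described above (dropping `checkpoint.dir_or_data` from a logger integration), here is a minimal sketch of the general pattern. It is not the code from this PR. `LegacyCheckpoint`, `PathCheckpoint`, `checkpoint_upload_path`, and `log_trial_save` are hypothetical stand-ins written for illustration, and the sketch assumes the replacement checkpoint object exposes a public string `path` attribute.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class LegacyCheckpoint:
    """Hypothetical stand-in for a legacy checkpoint object."""

    # May hold a directory path or an in-memory dict, so callers cannot
    # rely on it being something they can upload as a directory.
    dir_or_data: Any


@dataclass
class PathCheckpoint:
    """Hypothetical stand-in for a directory-backed checkpoint."""

    path: str


def checkpoint_upload_path(checkpoint: Any) -> str:
    """Return the directory a logger integration should upload.

    Reads only the public `path` attribute (an assumption of this sketch)
    and fails loudly for objects that do not provide it.
    """
    path = getattr(checkpoint, "path", None)
    if not isinstance(path, str):
        raise TypeError(
            f"{type(checkpoint).__name__} has no string `path` attribute"
        )
    return path


def log_trial_save(queue: List[Tuple[str, str]], checkpoint: Any) -> None:
    """Enqueue a checkpoint directory for upload (hypothetical callback)."""
    queue.append(("checkpoint", checkpoint_upload_path(checkpoint)))


if __name__ == "__main__":
    pending: List[Tuple[str, str]] = []
    log_trial_save(pending, PathCheckpoint(path="/tmp/checkpoint_000001"))
    print(pending)
```

Routing every access through one helper keeps the integration independent of the checkpoint's internal attributes, so a later rename only touches a single function.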