diff --git a/doc/source/tune/api_docs/trainable.rst b/doc/source/tune/api_docs/trainable.rst
index 68404e6fd346..ef57fccad064 100644
--- a/doc/source/tune/api_docs/trainable.rst
+++ b/doc/source/tune/api_docs/trainable.rst
@@ -99,7 +99,7 @@ You can save and load checkpoint in Ray Tune in the following manner:
     tuner = tune.Tuner(train_func)
     results = tuner.fit()
 
-.. note:: ``checkpoint_freq`` and ``checkpoint_at_end`` will not work with Function API checkpointing.
+.. note:: ``checkpoint_frequency`` and ``checkpoint_at_end`` will not work with Function API checkpointing.
 
 In this example, checkpoints will be saved by training iteration to ``local_dir/exp_name/trial_name/checkpoint_``.
 
@@ -177,7 +177,7 @@ You can also implement checkpoint/restore using the Trainable Class API:
             checkpoint_path = os.path.join(tmp_checkpoint_dir, "model.pth")
             self.model.load_state_dict(torch.load(checkpoint_path))
 
-    tuner = tune.Tuner(MyTrainableClass, run_config=air.RunConfig(checkpoint_config=air.CheckpointConfig(checkpoint_freq=2)))
+    tuner = tune.Tuner(MyTrainableClass, run_config=air.RunConfig(checkpoint_config=air.CheckpointConfig(checkpoint_frequency=2)))
     results = tuner.fit()
 
 You can checkpoint with three different mechanisms: manually, periodically, and at termination.
@@ -197,7 +197,7 @@ This can be especially helpful in spot instances:
 
 
 **Periodic Checkpointing**: periodic checkpointing can be used to provide fault-tolerance for experiments.
-This can be enabled by setting ``checkpoint_freq=`` and ``max_failures=`` to checkpoint trials
+This can be enabled by setting ``checkpoint_frequency=`` and ``max_failures=`` to checkpoint trials
 every *N* iterations and recover from up to *M* crashes per trial, e.g.:
 
 .. code-block:: python
@@ -205,12 +205,12 @@ every *N* iterations and recover from up to *M* crashes per trial, e.g.:
     tuner = tune.Tuner(
         my_trainable,
         run_config=air.RunConfig(
-            checkpoint_config=air.CheckpointConfig(checkpoint_freq=10),
+            checkpoint_config=air.CheckpointConfig(checkpoint_frequency=10),
             failure_config=air.FailureConfig(max_failures=5))
     )
     results = tuner.fit()
 
-**Checkpointing at Termination**: The checkpoint_freq may not coincide with the exact end of an experiment.
+**Checkpointing at Termination**: The ``checkpoint_frequency`` may not coincide with the exact end of an experiment.
 If you want a checkpoint to be created at the end of a trial, you can additionally set the ``checkpoint_at_end=True``:
 
 .. code-block:: python
@@ -219,7 +219,7 @@ If you want a checkpoint to be created at the end of a trial, you can additional
     tuner = tune.Tuner(
         my_trainable,
         run_config=air.RunConfig(
-            checkpoint_config=air.CheckpointConfig(checkpoint_freq=10, checkpoint_at_end=True),
+            checkpoint_config=air.CheckpointConfig(checkpoint_frequency=10, checkpoint_at_end=True),
             failure_config=air.FailureConfig(max_failures=5))
     )
     results = tuner.fit()
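
For context on the renamed parameter, here is a minimal end-to-end sketch (not taken from the docs page itself) of the Trainable Class API driven by ``air.CheckpointConfig(checkpoint_frequency=..., checkpoint_at_end=True)``. The ``ToyTrainable`` class, its pickle-based state, and the ``stop`` criterion are illustrative assumptions; only ``tune.Tuner``, ``air.RunConfig``, ``air.CheckpointConfig``, and ``air.FailureConfig`` come from the documentation above.

.. code-block:: python

    import os
    import pickle

    from ray import air, tune


    class ToyTrainable(tune.Trainable):
        """Illustrative trainable; any class implementing the checkpoint hooks works."""

        def setup(self, config):
            self.lr = config.get("lr", 0.01)
            self.score = 0.0

        def step(self):
            # One training iteration; with checkpoint_frequency=10, Tune saves a
            # checkpoint every 10th call to step().
            self.score += self.lr
            return {"score": self.score}

        def save_checkpoint(self, tmp_checkpoint_dir):
            # Persist trial state into the directory Tune provides.
            with open(os.path.join(tmp_checkpoint_dir, "state.pkl"), "wb") as f:
                pickle.dump({"score": self.score}, f)
            return tmp_checkpoint_dir

        def load_checkpoint(self, tmp_checkpoint_dir):
            # Restore trial state after a crash or when resuming.
            with open(os.path.join(tmp_checkpoint_dir, "state.pkl"), "rb") as f:
                self.score = pickle.load(f)["score"]


    tuner = tune.Tuner(
        ToyTrainable,
        param_space={"lr": tune.grid_search([0.01, 0.1])},
        run_config=air.RunConfig(
            stop={"training_iteration": 20},
            checkpoint_config=air.CheckpointConfig(
                checkpoint_frequency=10,  # periodic checkpoint every 10 iterations
                checkpoint_at_end=True,   # plus one final checkpoint per trial
            ),
            failure_config=air.FailureConfig(max_failures=5),
        ),
    )
    results = tuner.fit()

Because this sketch uses the Class API, ``checkpoint_frequency`` and ``checkpoint_at_end`` take effect; per the note in the diff, neither applies to Function API checkpointing.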