[tune] Added possibility to execute infinite recovery retries for a trial (#3901)

Allows a trial to attempt recovery indefinitely by setting _max_failures_ to a negative number.
SieversLeon authored and richardliaw committed Jan 31, 2019
1 parent beb7519 commit d3551dd
Showing 2 changed files with 4 additions and 2 deletions.
3 changes: 2 additions & 1 deletion python/ray/tune/experiment.py
@@ -74,7 +74,8 @@ class Experiment(object):
             experiment regardless of the checkpoint_freq. Default is False.
         max_failures (int): Try to recover a trial from its last
             checkpoint at least this many times. Only applies if
-            checkpointing is enabled. Defaults to 3.
+            checkpointing is enabled. Setting to -1 will lead to infinite
+            recovery retries. Defaults to 3.
         restore (str): Path to checkpoint. Only makes sense to set if
             running 1 trial. Defaults to None.
         repeat: Deprecated and will be removed in future versions of
3 changes: 2 additions & 1 deletion python/ray/tune/trial.py
@@ -356,7 +356,8 @@ def should_recover(self):
         be a checkpoint.
         """
         return (self.checkpoint_freq > 0
-                and self.num_failures < self.max_failures)
+                and (self.num_failures < self.max_failures
+                     or self.max_failures < 0))
 
     def update_last_result(self, result, terminate=False):
         if terminate:
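
To illustrate the effect of the changed condition in should_recover, here is a minimal standalone sketch. TrialSketch and count_recoveries are hypothetical names used only for this illustration and are not part of Ray. The loop assumes that each permitted recovery increments num_failures by one, which may not match the exact bookkeeping in the trial runner.

```python
class TrialSketch:
    """Hypothetical stand-in for a trial; models only the recovery fields."""

    def __init__(self, checkpoint_freq, max_failures):
        self.checkpoint_freq = checkpoint_freq
        self.max_failures = max_failures
        self.num_failures = 0

    def should_recover(self):
        # Same predicate as the patched method: recovery requires
        # checkpointing, and either failures remain in the budget or
        # the budget is negative (unlimited).
        return (self.checkpoint_freq > 0
                and (self.num_failures < self.max_failures
                     or self.max_failures < 0))


def count_recoveries(trial, simulated_failures):
    """Simulate repeated failures and count the recoveries that are allowed.

    Assumption: every allowed recovery counts as one failure.
    """
    recoveries = 0
    for _ in range(simulated_failures):
        if not trial.should_recover():
            break
        trial.num_failures += 1
        recoveries += 1
    return recoveries


# A finite budget stops after max_failures recoveries.
finite = count_recoveries(TrialSketch(checkpoint_freq=1, max_failures=3), 10)
# A negative budget never stops the trial from recovering.
unlimited = count_recoveries(TrialSketch(checkpoint_freq=1, max_failures=-1), 10)
# Without checkpointing, a negative budget has no effect.
no_checkpoint = count_recoveries(TrialSketch(checkpoint_freq=0, max_failures=-1), 10)
```

Note that the condition tests max_failures < 0, so any negative value behaves like the -1 mentioned in the docstring, and checkpointing must still be enabled for recovery to happen at all.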
