[tune] Cannot restore checkpoint for experiment #4714

@pengzhenghao

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • Ray installed from (source or binary):
  • Ray version: 0.7.0.dev2
  • Python version:
  • Exact command to reproduce:

Describe the problem

When restore=$HOME/exp/.../checkpoint-100 is passed to tune.run(), the checkpoint is not loaded and Ray starts a brand-new experiment instead.

Source code / logs

In the tune.run() function, an Experiment is automatically initialized with positional arguments:

        experiment = Experiment(
            name, run_or_experiment, stop, config, resources_per_trial,
            num_samples, local_dir, upload_dir, trial_name_creator, loggers,
            sync_function, checkpoint_freq, checkpoint_at_end, export_formats,
            max_failures, restore)

However, these arguments are passed in an order that no longer matches the signature of Experiment.__init__(), which is:

    def __init__(self,
                 name,
                 run,
                 stop=None,
                 config=None,
                 resources_per_trial=None,
                 num_samples=1,
                 local_dir=None,
                 upload_dir=None,
                 trial_name_creator=None,
                 loggers=None,
                 sync_function=None,
                 checkpoint_freq=0,
                 checkpoint_at_end=False,
                 keep_checkpoints_num=None,
                 checkpoint_score_attr=None,
                 export_formats=None,
                 max_failures=3,
                 restore=None,
                 repeat=None,
                 trial_resources=None,
                 custom_loggers=None):

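The mismatch comes from two new parameters that were inserted between checkpoint_at_end and export_formats:

keep_checkpoints_num=None,
checkpoint_score_attr=None,

As a result, every positional argument after checkpoint_at_end is shifted by two slots: export_formats is bound to keep_checkpoints_num, max_failures to checkpoint_score_attr, and the restore path to export_formats. The restore parameter itself keeps its default of None, so nothing is restored.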

I believe this was introduced by #4490. I will try to fix this problem.
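To illustrate the failure mode outside of Ray, here is a minimal sketch. `make_experiment` and `restore_path` are hypothetical stand-ins, not Ray code; the function only mirrors the tail of the parameter list quoted above. It shows how a positional call written before the two parameters were added silently misroutes the restore path, and how passing keyword arguments avoids the problem.

```python
# Hypothetical stand-in for the tail of Experiment.__init__; not Ray's code.
def make_experiment(name,
                    run,
                    checkpoint_freq=0,
                    checkpoint_at_end=False,
                    keep_checkpoints_num=None,   # newly inserted parameter
                    checkpoint_score_attr=None,  # newly inserted parameter
                    export_formats=None,
                    max_failures=3,
                    restore=None):
    # Return the bound values so the binding can be inspected.
    return {
        "keep_checkpoints_num": keep_checkpoints_num,
        "checkpoint_score_attr": checkpoint_score_attr,
        "export_formats": export_formats,
        "max_failures": max_failures,
        "restore": restore,
    }


restore_path = "/tmp/checkpoint-100"  # hypothetical path, for illustration only

# Positional call in the old order:
# (..., checkpoint_at_end, export_formats, max_failures, restore)
broken = make_experiment("exp", "PPO", 0, False, None, 3, restore_path)

# Keyword call: unaffected by parameters inserted in the middle.
fixed = make_experiment(
    "exp",
    "PPO",
    checkpoint_freq=0,
    checkpoint_at_end=False,
    export_formats=None,
    max_failures=3,
    restore=restore_path,
)

print(broken["restore"])         # None: the path never reached `restore`
print(broken["export_formats"])  # the restore path landed here instead
print(fixed["restore"])          # the restore path, as intended
```

Passing the trailing arguments by keyword in tune.run() would be one way to make the call robust to future additions to the signature; whether the actual fix takes this form is up to the patch.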
