[tune] Hyperband Max Iter Fix #1620

richardliaw · 2018-02-27T06:40:14Z

What do these changes do?

Changes semantics of max_iter to be more intuitive.

TODO:

Fix tests

Related issue number

AmplabJenkins · 2018-02-27T07:43:26Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3993/

AmplabJenkins · 2018-02-27T07:50:08Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/3994/

ericl · 2018-02-28T17:24:46Z

python/ray/tune/hyperband.py

        self._r *= self._eta
        self._r = int((self._r))
+        if self._cumul_r + self._r > self._max_t_attr:


AmplabJenkins · 2018-02-28T22:02:19Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4048/

ericl · 2018-03-03T21:00:54Z

Test hang looks unrelated, merging.

* 'master' of https://github.com/royf/ray-private: [rllib] Basic regression tests on CartPole (ray-project#1608) [autoscaler] [tune] More doc fixes (ray-project#1560) [tune] Hyperband Max Iter Fix (ray-project#1620)

@richardliaw

This PR fixes two flaws in the current hyperband implementation.

#### 1. Bug in the `r` calculation.

#1620 introduced a minimum constraint in the `r` calculation during successive halving with `r = min(r, max_t - prev_r)`. It's unclear to me where this is coming from (cc @richardliaw), but in my opinion this is flawed.

E.g. for `s=1`, `max_t=8` and `eta=2`, we get `r0 = 4`. Then `r = r0 * 2 = 8`. With the current formula, we then get `r1 = min(r, max_t - r0) = min(8, 8-4) = 4`. Thus, `r0 = r1` and the bracket already "finished" after 4 iterations and should terminate all trials. Or in other words, none of the trials in this bracket will ever proceed.

I believe the correct fix here is to set `r = min(r, max_t)`. I couldn't find a reference implementation for comparison, but it logically makes sense and seems to match the formula behavior described in the paper.

#### 2. Stopping of "overstepped" trials.

The first bug revealed a second shortcoming in the current implementation. When a trial reports a timestep that is higher than `r_i` and `r_(i+1)`, it can hang forever.

This is because "good" trials are only continued if `bracket.continue_trial(t)` returns `True`. However, if the trial already overstepped `r_(i+1)`, this can return `False` (specifically when `stop_last_trials=True`). In that case, a paused or running trial will not be terminated nor continued, and instead hang forever.

This second case is fixed in this PR by introducing another clause in the processing of "good" trials that checks for this condition.

Signed-off-by: Kai Fricke <kai@anyscale.com>
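
The arithmetic in the `r` calculation example can be checked with a short sketch. It is only an illustration of the two formulas as quoted in the commit message, not the code in `python/ray/tune/hyperband.py`; the function names `next_r_clamped_by_remaining` and `next_r_clamped_by_max_t` are hypothetical.

```python
def next_r_clamped_by_remaining(prev_r: int, eta: int, max_t: int) -> int:
    """Rule quoted as introduced by #1620: r = min(r, max_t - prev_r)."""
    r = int(prev_r * eta)
    return min(r, max_t - prev_r)


def next_r_clamped_by_max_t(prev_r: int, eta: int, max_t: int) -> int:
    """Rule proposed in the commit message: r = min(r, max_t)."""
    r = int(prev_r * eta)
    return min(r, max_t)


# Values from the example in the commit message: s=1, max_t=8, eta=2, r0=4.
r0, eta, max_t = 4, 2, 8

r1_old = next_r_clamped_by_remaining(r0, eta, max_t)
r1_new = next_r_clamped_by_max_t(r0, eta, max_t)

print(f"r0={r0}, old r1={r1_old}, proposed r1={r1_new}")
# The old rule yields r1 == r0, so the milestone never advances.
print("old rule stalls:", r1_old == r0)
```

With these inputs the old rule returns 4 and the proposed rule returns 8, matching the numbers in the message.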


richardliaw added 3 commits February 26, 2018 22:17

nits

1123f67

cumul r

f9d9d08

docs

8b85e7b

richardliaw requested a review from ericl February 27, 2018 08:41

richardliaw assigned ericl Feb 27, 2018

richardliaw mentioned this pull request Feb 27, 2018

[tune] A few differences between the Hyperband implementation and the original paper #1624

Closed

ericl reviewed Feb 28, 2018

View reviewed changes

ericl approved these changes Feb 28, 2018

View reviewed changes

min

ec85408

ericl merged commit 96d7938 into ray-project:master Mar 3, 2018

krfricke mentioned this pull request Aug 31, 2023

[tune] Fix hyperband r calculation and stopping #39157

Merged

8 tasks

krfricke mentioned this pull request Sep 8, 2023

[2.7.0/tune] Fix hyperband r calculation and stopping (#39157) #39451

Merged

8 tasks
