-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[tune] Fix trial result fetching #4219
[tune] Fix trial result fetching #4219
Conversation
# the first available result, and we want to guarantee that slower | ||
# trials (i.e. trials that run remotely) also get fairly reported. | ||
# See https://github.com/ray-project/ray/issues/4211 for details. | ||
[result_id], _ = ray.wait(shuffled_results) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're probably right. The timeout was only relevant due to the backend bug. Since that is now fixed, the cause of the wait unfairness is likely just the ordering.
@robertnishihara I think one potential fix for this gotcha would be to return results in priority order by completion time, instead of order of ids passed in. I can see a lot of users running into this by accident.
@@ -216,7 +217,13 @@ def get_running_trials(self): | |||
return list(self._running.values()) | |||
|
|||
def get_next_available_trial(self): | |||
[result_id], _ = ray.wait(list(self._running)) | |||
shuffled_results = random.sample( | |||
self._running.keys(), len(self._running)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use random.shuffle() for clarity
Test FAILed. |
facdf64
to
f15ee40
Compare
f15ee40
to
a425faa
Compare
Test FAILed. |
Test FAILed. |
What do these changes do?
Fixes trial result fetching in
RayTrialExecutor
by shuffling the results fetch objects.Related issue number
#4211