[webui] Scalability fixes for the task timeline and visualizations #935

ericl · 2017-09-06T02:21:54Z

This PR keeps the web UI responsive (#933) by adding a hard cap (10000) on the number of tasks returned.

You can test with the following snippet, which launches one million tasks locally. The UI should respond fluidly in all the visualizations except the timeline, which should still update within about 10 seconds.

import time
import ray

@ray.remote
def f(x):
    return x

print("Launching tasks")

ray.init()
for x in range(1000000):
    y = f.remote(x)
ray.get(y)
print("Done")

time.sleep(99999999)

cc @alanamarzoev @robertnishihara
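
A minimal sketch of what a hard cap of this kind can look like, assuming task records are plain dictionaries with a numeric start timestamp. The names `cap_tasks` and `MAX_TASKS` and the `"start"` key are hypothetical and are not taken from this PR; only the value 10000 comes from the description above.

```python
import heapq

# Hypothetical constant; 10000 is the cap quoted in the PR description.
MAX_TASKS = 10000


def cap_tasks(tasks, limit=MAX_TASKS):
    """Return at most `limit` tasks, keeping those with the latest start time.

    `tasks` is any iterable of dicts with a numeric "start" key (an assumed
    layout used only for illustration). The result is ordered oldest to newest.
    """
    if limit <= 0:
        return []
    # nlargest consumes the iterable while holding only `limit` items.
    newest = heapq.nlargest(limit, tasks, key=lambda t: t["start"])
    newest.reverse()
    return newest


if __name__ == "__main__":
    demo = ({"id": i, "start": float(i)} for i in range(100000))
    print([t["id"] for t in cap_tasks(demo, limit=5)])
```

Because the input is consumed as a stream, memory use depends on the limit and not on the total number of tasks, although every record is still visited once.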

ericl · 2017-09-06T02:22:33Z

python/ray/experimental/state.py

@@ -562,8 +571,10 @@ def micros(ts):
        def micros_rel(ts):
            return micros(ts - start_time)

-        task_profiles = self.task_profiles(start=0, end=time.time())


This was performing an unnecessary full table scan.

ericl · 2017-09-06T02:22:41Z

python/ray/experimental/state.py

@@ -562,8 +571,10 @@ def micros(ts):
        def micros_rel(ts):
            return micros(ts - start_time)

-        task_profiles = self.task_profiles(start=0, end=time.time())
-        task_table = self.task_table()


This was also a full table scan. The replacement is slower when the number of tasks is small, but it has a bounded worst-case latency.

There is actually a way to get the best of both worlds with the SCAN [1] class of functions from Redis. I guess that's what you mean with the TODO above; do you want to make it a little more precise?

[1] https://redis.io/commands/scan

SCAN is still O(n), but MGET is bounded by the limit. I think the refactoring is a bit nontrivial, but I updated the TODO.

You are right, sounds good!
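
The trade-off discussed in this thread can be sketched as follows: iterate keys with SCAN but stop after a fixed number, then fetch the values with a single MGET whose size is bounded by the same limit. This is an illustration only, not the code in this PR. `fetch_bounded` and `FakeRedis` are hypothetical names; `FakeRedis` is a small in-memory stand-in that mimics just the `scan_iter` and `mget` methods of a redis-py client so the sketch runs without a server.

```python
import fnmatch
from itertools import islice


class FakeRedis:
    """In-memory stand-in exposing the two client methods used below."""

    def __init__(self, data):
        self._data = dict(data)

    def scan_iter(self, match=None):
        # Real SCAN walks the keyspace incrementally; here we just iterate.
        for key in self._data:
            if match is None or fnmatch.fnmatchcase(key, match):
                yield key

    def mget(self, keys):
        # Like MGET, return None for keys that do not exist.
        return [self._data.get(k) for k in keys]


def fetch_bounded(client, pattern, limit):
    """Fetch values for at most `limit` keys matching `pattern`."""
    # Stop pulling keys from the scan once the limit is reached.
    keys = list(islice(client.scan_iter(match=pattern), limit))
    if not keys:
        # MGET needs at least one key, so skip the call entirely.
        return {}
    return dict(zip(keys, client.mget(keys)))
```

SCAN returns keys in no particular order, so a sketch like this bounds the work but does not by itself select the most recent tasks. A real redis-py client also returns keys as bytes unless it was created with `decode_responses=True`.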

ericl · 2017-09-06T02:23:17Z

python/ray/experimental/state.py

-                                                       ["ParentTaskID"]]
-                                          ["get_arguments_start"]),
+                        "ts": micros_rel(
+                            parent_profile and


With truncation, some parent profiles may be missing. We just ignore them for now.
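
A small sketch of the guard described here, assuming profiles are kept in a dictionary keyed by task ID. `parent_start_time` and its `default` parameter are hypothetical; the `"get_arguments_start"` key is the one that appears in the removed lines of the diff above.

```python
def parent_start_time(task_profiles, parent_task_id, default=None):
    """Return the parent's start timestamp, or `default` if it was truncated.

    `task_profiles` is assumed to map task IDs to profile dicts.
    """
    parent_profile = task_profiles.get(parent_task_id)
    if parent_profile is None:
        # The parent fell outside the truncated window; ignore it.
        return default
    return parent_profile["get_arguments_start"]
```

An explicit `is None` check is used here so that a legitimate timestamp of 0 is not mistaken for a missing profile.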

pcmoritz · 2017-09-06T02:39:05Z

python/ray/experimental/state.py

@@ -419,17 +422,25 @@ def task_profiles(self, start=None, end=None, num_tasks=None, fwd=True):
        task_info = dict()
        event_log_sets = self.redis_client.keys("event_log*")

+        if num_tasks is None:


This code should not be part of the GlobalState API; rather, num_tasks should be passed in from the webui.

AmplabJenkins · 2017-09-06T02:41:31Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-09-06T02:41:32Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1782/
Test PASSed.

AmplabJenkins · 2017-09-06T03:10:34Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-09-06T03:10:34Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1783/
Test PASSed.

ericl · 2017-09-08T08:41:26Z

Looks like these are broken tests, I'll take a look tomorrow.

AmplabJenkins · 2017-09-08T17:40:44Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-09-08T17:40:44Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1796/
Test PASSed.

AmplabJenkins · 2017-09-09T04:16:25Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-09-09T04:16:26Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1800/
Test PASSed.

ericl · 2017-09-10T05:20:53Z

@pcmoritz looks like tests are passing. One of the travis builds was cancelled but I don't see failures.

robertnishihara · 2017-09-10T07:28:18Z

Nice! I'm trying it out now.

robertnishihara · 2017-09-10T07:37:49Z

I think I can generate key errors when I use timelines with task dependencies (the line numbers are slightly off because I rebased the PR locally).

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/Workspace/ray/python/ray/experimental/ui.py in handle_submit(sender)
    442                                              breakdowns=breakdown,
    443                                              obj_dep=obj_dep.value,
--> 444                                              task_dep=task_dep.value)
    445         print("Opening html file in browser...")
    446 

~/Workspace/ray/python/ray/experimental/state.py in dump_catapult_trace(self, path, task_info, breakdowns, task_dep, obj_dep)
    748                             owner_task = self._object_table(arg)["TaskID"]
    749                             owner_worker = (workers[
--> 750                                 task_info[owner_task]["worker_id"]])
    751                             # Adding/subtracting 2 to the time associated with
    752                             # the beginning/ending of the flow event is

KeyError: 'be338b34b87c7497d6fd82768b5f3e3d5f4442c3'

E.g.,

import ray
ray.init()

@ray.remote
def f(x):
    return 1

x = 1
for _ in range(10 ** 5):
    x = f.remote(x)

Then visualize the last 10000 tasks.

Presumably this arises from omitting the initial scan of the entire task table.
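
One way to avoid this kind of KeyError, sketched under the assumption that `task_info` holds only the truncated set of tasks, is to skip dependency edges whose owner task was not loaded. This is not necessarily the fix that was later pushed; `dependency_edges`, `task_args`, and `object_owner` are hypothetical names.

```python
def dependency_edges(task_info, task_args, object_owner):
    """Yield (owner_task_id, task_id) pairs where both tasks are loaded.

    task_info:    dict of task ID -> info, possibly truncated
    task_args:    dict of task ID -> list of argument object IDs
    object_owner: dict of object ID -> ID of the task that created it
    """
    for task_id, args in task_args.items():
        if task_id not in task_info:
            continue
        for obj_id in args:
            owner = object_owner.get(obj_id)
            # Skip arguments whose producer is unknown or was truncated away.
            if owner is None or owner not in task_info:
                continue
            yield owner, task_id
```

With a chain of tasks like the one above, the oldest visible task depends on an object produced by a task outside the window, and that edge is dropped instead of raising.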

ericl · 2017-09-10T19:37:48Z

Alright that should be fixed.

robertnishihara · 2017-09-10T19:58:29Z

I don't see any new commits, did you push something?

ericl · 2017-09-10T20:04:15Z

Pushed


robertnishihara · 2017-09-10T20:17:44Z

Looks good to me, I'll merge it once the tests run.

AmplabJenkins · 2017-09-10T20:25:36Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-09-10T20:25:37Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1813/
Test PASSed.

AmplabJenkins · 2017-09-10T20:34:47Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-09-10T20:34:48Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1814/
Test PASSed.

fixes

fc53f97

comments

f441220

fix test

b55756d

Update ui.py

31cf9fb

upd

a13d921

ericl and others added 2 commits September 10, 2017 13:04

Merge branch 'ui-scalability-fixes' of github.com:ericl/ray into ui-s…

875f28c

…calability-fixes

Fix linting.

9dff5d0

robertnishihara merged commit d8aa826 into ray-project:master Sep 10, 2017

robertnishihara deleted the ui-scalability-fixes branch September 10, 2017 22:47

robertnishihara mentioned this pull request Sep 10, 2017

[webui] Consider putting a hard cap on the number of tasks that the UI can visualize simultaneously. #933

Closed
