-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[webui] Scalability fixes for the task timeline and visualizations #935
[webui] Scalability fixes for the task timeline and visualizations #935
Conversation
@@ -562,8 +571,10 @@ def micros(ts): | |||
def micros_rel(ts): | |||
return micros(ts - start_time) | |||
|
|||
task_profiles = self.task_profiles(start=0, end=time.time()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was performing an unnecessary full table scan.
@@ -562,8 +571,10 @@ def micros(ts): | |||
def micros_rel(ts): | |||
return micros(ts - start_time) | |||
|
|||
task_profiles = self.task_profiles(start=0, end=time.time()) | |||
task_table = self.task_table() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was also a full table scan. The replacement is slower when the number of tasks is small, but has a bounded worst case latency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is actually a way to get the best of both worlds with the SCAN [1] class of functions from redis (I guess that's what you mean with the TODO above, do you want to make it a little more precise?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SCAN is still O(n), but MGET is bounded by the limit. I think the refactoring is a bit nontrivial but I updated the TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, sounds good!
["ParentTaskID"]] | ||
["get_arguments_start"]), | ||
"ts": micros_rel( | ||
parent_profile and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With truncation it's possible we don't have some parent profiles. We are just ignoring them for now.
python/ray/experimental/state.py
Outdated
@@ -419,17 +422,25 @@ def task_profiles(self, start=None, end=None, num_tasks=None, fwd=True): | |||
task_info = dict() | |||
event_log_sets = self.redis_client.keys("event_log*") | |||
|
|||
if num_tasks is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code should not be part of the GlobalState API and rather num_tasks should be passed in from the webui
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
Looks like these are broken tests, I'll take a look tomorrow. |
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
@pcmoritz looks like tests are passing. One of the travis builds was cancelled but I don't see failures. |
Nice! I'm trying it out now. |
I think I can generate key errors when I use timelines with task dependencies (the line numbers are sligtly off because I rebased the PR locally). ---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/Workspace/ray/python/ray/experimental/ui.py in handle_submit(sender)
442 breakdowns=breakdown,
443 obj_dep=obj_dep.value,
--> 444 task_dep=task_dep.value)
445 print("Opening html file in browser...")
446
~/Workspace/ray/python/ray/experimental/state.py in dump_catapult_trace(self, path, task_info, breakdowns, task_dep, obj_dep)
748 owner_task = self._object_table(arg)["TaskID"]
749 owner_worker = (workers[
--> 750 task_info[owner_task]["worker_id"]])
751 # Adding/subtracting 2 to the time associated with
752 # the beginning/ending of the flow event is
KeyError: 'be338b34b87c7497d6fd82768b5f3e3d5f4442c3' E.g., import ray
ray.init()
@ray.remote
def f(x):
return 1
x = 1
for _ in range(10 ** 5):
x = f.remote(x) Then visualize the last 10000 tasks. Presumably this arises from omitting the initial scan of the entire task table. |
Alright that should be fixed. |
I don't see any new commits, did you push something? |
Pushed
…On Sun, Sep 10, 2017, 12:58 PM Robert Nishihara ***@***.***> wrote:
I don't see any new commits, did you push something?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#935 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAA6SmbjUvbYjjhP6Sf-lO472lNPY65Lks5shD9mgaJpZM4PNwN0>
.
|
Looks good to me, I'll merge it once the tests run. |
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
This PR ensures web ui responsiveness (#933) by adding a hard cap on the number of tasks returned (10000).
You can test with the following snippet which launches one million tasks locally. The UI should respond fluidly in all the visualizations except the timeline, which still should update within about 10 seconds.
cc @alanamarzoev @robertnishihara