Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ray debugger stepping between tasks #12075

Merged
merged 43 commits into from
Dec 7, 2020

Conversation

pcmoritz
Copy link
Contributor

@pcmoritz pcmoritz commented Nov 17, 2020

Why are these changes needed?

This PR adds the capability to step between Ray tasks in the debugger. It adds two new commands to the debugger,
"remote", which will fast-forward to the next .remote call and then go inside it and attach the debugger there so users can inspect the status/stack/etc. It also adds a new command "get", which will finish the current task and enter the debugger when ray.get is called on the result of the task.

I will add an example and more documentation in a separate PR.

This functionality is implemented by tracing through the .remote codepath and the object return codepath. This allows us to provide the functionality without an extra flag and no additional costs during normal runtime. The metadata for objects had to be slightly altered to make it possible to store the additional information required to implement this.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@pcmoritz pcmoritz changed the title [WIP] Ray debugger stepping between tasks Ray debugger stepping between tasks Nov 24, 2020
Copy link
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  • This implementation seems to heavily depend on piggybacking on object metadata. Is this necessary or are there alternative designs that don't require this kind of deep integration?
  • If we change the object metadata format, we should do that in another PR first before this one

python/ray/ray_constants.py Outdated Show resolved Hide resolved
OBJECT_METADATA_TYPE_ACTOR_HANDLE = b"A"

# A constant indicating the debugging part of the metadata (see above).
OBJECT_METADATA_DEBUG_PREFIX = b"D"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
OBJECT_METADATA_DEBUG_PREFIX = b"D"
OBJECT_METADATA_DEBUG_PREFIX = b"DEBUG:"

python/ray/ray_constants.py Outdated Show resolved Hide resolved
python/ray/serialization.py Outdated Show resolved Hide resolved
@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 25, 2020
@pcmoritz
Copy link
Contributor Author

@ericl: I first tried to do it only by passing things around through the internal_kv store, but that approach is unfortunately not workable since it would always add a redis lookup into the critical path of task execution (to check if we need to drop into the debugger).

I will make another PR to refactor the metadata handling first and do it using a comma separated list!

@pcmoritz pcmoritz removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 30, 2020
@pcmoritz pcmoritz requested a review from ericl November 30, 2020 05:55
@ericl
Copy link
Contributor

ericl commented Nov 30, 2020

@pcmoritz I'm trying this out with

import time
import ray

ray.init()

@ray.remote
def f(x):
    print("Fact", x)
    if x < 1:
        return 1
    ray.util.pdb.set_trace()
    x_id = f.remote(x - 1)
    return x * ray.get(x_id)


print(ray.get(f.remote(10)))

And it seems like after I call remote twice, it hangs, any ideas?

2020-11-30 12:43:48,108	INFO scripts.py:164 -- Connecting to Ray instance at 192.168.5.121:6379.
2020-11-30 12:43:48,109	INFO worker.py:675 -- Connecting to existing Ray cluster at address: 192.168.5.121:6379
Active breakpoints:
0: ray::f() | test.py:11
NoneType: None

Enter breakpoint index or press enter to refresh: 0
> /home/eric/Desktop/test.py(12)f()
-> x_id = f.remote(x - 1)
(Pdb) list
  7  	def f(x):
  8  	    print("Fact", x)
  9  	    if x < 1:
 10  	        return 1
 11  	    ray.util.pdb.set_trace()
 12  ->	    x_id = f.remote(x - 1)
 13  	    return x * ray.get(x_id)
 14  	
 15  	
 16  	print(ray.get(f.remote(10)))
[EOF]
(Pdb) remote
*** Connection closed by remote host ***
Continuing pdb session in different process...
--Call--
> /home/eric/Desktop/test.py(6)f()
-> @ray.remote
(Pdb) list
  1  	import time
  2  	import ray
  3  	
  4  	ray.init()
  5  	
  6  ->	@ray.remote
  7  	def f(x):
  8  	    print("Fact", x)
  9  	    if x < 1:
 10  	        return 1
 11  	    ray.util.pdb.set_trace()
(Pdb) remote
*** Connection closed by remote host ***



@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 30, 2020
@pcmoritz
Copy link
Contributor Author

pcmoritz commented Dec 3, 2020

Thanks Eric, I fixed this now and added it as a test case!

@pcmoritz pcmoritz removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Dec 3, 2020
@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Dec 6, 2020
@pcmoritz pcmoritz merged commit 73a1a23 into ray-project:master Dec 7, 2020
@pcmoritz pcmoritz deleted the rpdb-task-stepping branch December 7, 2020 05:50
@pcmoritz pcmoritz removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Dec 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants