Documentation- Basic Profiling for Ray Users #2326

Merged: 18 commits merged on Jul 12, 2018

crystalzyan (Contributor)

What do these changes do?

Added a documentation tutorial (with graphics) on profiling the performance of a basic Ray example using Python's time module, Python's cProfile, line_profiler (a third-party profiler), and the Ray timeline web UI. There is an existing documentation page, profiling.rst, for Ray developers, so I put mine in a separate page, user-profiling.rst, under the same Help section.

Related issue number

No issue number. Requested by @pcmoritz

…posed to current Profiling section for Ray developers. Completed three sections 'A Basic Profiling Example', 'Timing Performance Using Python's Timestamps', and 'Profiling Using An External Profiler (Line_Profiler).' Left to-do two sections on CProfile and Ray Timeline Visualization.'
@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6395/

This document is intended for users of Ray who want to know how to evaluate
the performance of their code while running on Ray. Profiling the
performance of your code can be very helpful to determine performance
bottlenecks or where your code may not be parallelizing properly. If you

Contributor

"or to find out where your code may not be parallelized properly"

performance of your code can be very helpful to determine performance
bottlenecks or where your code may not be parallelizing properly. If you
are interested in pinpointing why your Ray application may not be
achieving speedups as expected, then do read on!

Contributor

"the expected speedup, read on!"

@robertnishihara (Collaborator) left a comment:

Some comments:

  • Let's remove the images. They're really nice, but I'm a bit concerned about their size. As the codebase gets larger, it becomes more difficult to download and work with.
  • Maybe show people how to use cProfile inside of an actor.
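Following up on the cProfile-inside-an-actor suggestion, here is a minimal sketch of the pattern using only the standard library. `ProfiledWorker` and `slow_step` are hypothetical names, and the class is a plain Python class standing in for a Ray actor: under Ray it would carry `@ray.remote` and its methods would be called via `.remote()`, which is left out here (an assumption) so the snippet runs without a cluster. The actor owns its own `cProfile.Profile`, enables it only around the method body, and returns the formatted stats as a string.

```python
import cProfile
import io
import pstats
import time


def slow_step():
    """Hypothetical stand-in for expensive work done inside the actor."""
    time.sleep(0.01)
    return 1


class ProfiledWorker:
    """Plain class standing in for a Ray actor (no @ray.remote, so it runs without Ray)."""

    def __init__(self):
        # One profiler per actor instance; stats accumulate across method calls.
        self.profiler = cProfile.Profile()
        self.total = 0

    def work(self, n):
        # Only code executed between enable() and disable() is recorded.
        self.profiler.enable()
        try:
            for _ in range(n):
                self.total += slow_step()
        finally:
            self.profiler.disable()
        return self.total

    def profile_report(self, limit=5):
        # Render the accumulated stats into a string instead of printing them.
        stream = io.StringIO()
        stats = pstats.Stats(self.profiler, stream=stream)
        stats.sort_stats("cumulative").print_stats(limit)
        return stream.getvalue()


if __name__ == "__main__":
    worker = ProfiledWorker()
    worker.work(5)
    print(worker.profile_report())
```

Returning the report as a string, rather than printing inside the actor, is a deliberate choice: under Ray a worker's stdout may not appear on the driver, whereas a returned string can be fetched like any other result.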

A Basic Example to Profile
--------------------------

Let's try to profile a simple example, and compare how different looping

Contributor

"and compare how different ways to write a simple loop can affect performance"

Let's try to profile a simple example, and compare how different looping
call structures of the same remote function affects performance.

As a stand-in for a computationally intensive and possibly slower function,

Contributor

"As a proxy"? stand-in sounds a little strange to me :)

list2.append(func.remote())
ray.get(list2)

Finally, as a demonstration of Ray's parallelism abilities, let's create a

Contributor

Oh, this example should be a bit different. The idea is to call another function (which is not remote) as other_func; this is going to slow down the loop since that part is not parallelized (so just remove the @ray.remote from other_func and also the .remote). This shows that if there is still a large serial component to the loop, parallelism speedup might not be as large as expected (this is called https://en.wikipedia.org/wiki/Amdahl%27s_law). Does it make sense? Please modify the description accordingly.
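To make the Amdahl's law point concrete, here is a small sketch of the formula: if a fraction p of the loop's running time can be parallelized across N workers and the rest stays serial, the best-case speedup is 1 / ((1 - p) + p / N). The function name and the 50% parallel fraction in the demo are hypothetical, chosen only to mirror a loop that alternates between a remote call and a local one.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Best-case speedup when only parallel_fraction of the work runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical loop: half the time in remote tasks, half in a local serial function.
    for n in (1, 2, 4, 8, 1000):
        print(n, round(amdahl_speedup(0.5, n), 3))
```

With p = 0.5 the bound stays below 2 no matter how many workers are added, which is why a loop that still calls a local (non-remote) function speeds up far less than one where all the work is remote.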

time how long it takes to complete each loop version. We can do this using
python's built-in ``time`` `module`_.

.. _`module`: https://docs.python.org/2/library/time.html

Contributor

Let's link to the Python 3 version here.
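For reference, a minimal sketch of timing a function with the `time` module. It uses `time.perf_counter()`, which lives in the same module and is meant for measuring short intervals; `time.time()` also works but can jump if the system clock is adjusted. The helper `timed` and the workload `placeholder_workload` are hypothetical stand-ins for `ex1()` and the other examples.

```python
import time


def timed(label, fn, *args, **kwargs):
    """Call fn once, print the elapsed wall-clock time, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print("{} took {:.3f} seconds".format(label, elapsed))
    return result, elapsed


def placeholder_workload():
    # Hypothetical stand-in for ex1(), ex2(), or ex3(); swap in the loop you want to time.
    time.sleep(0.2)


if __name__ == "__main__":
    timed("placeholder_workload", placeholder_workload)
```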

if __name__ == "__main__":
    main()

Alternatively, to print out the timer on **selective** calls to ``ex1()``,

@pcmoritz (Contributor), Jul 8, 2018

Why don't we get rid of this paragraph and the code after? It doesn't really add much I think and is not needed for profiling.


Let's interpret these results.

Most pertinently, ``ex1()`` took substantially more time than ``ex2()``,

Contributor

Here, ex1() took

Let's interpret these results.

Most pertinently, ``ex1()`` took substantially more time than ``ex2()``,
despite their only difference being that ``ex1()`` calls ``ray.get`` on the

Contributor

Get rid of "despite" and just say what the difference is.


By calling ``ray.get`` after each call to the remote function, ``ex1()``
removes all ability to parallelize work, by forcing the driver to wait for
each ``func()``'s result in succession. We are completely sabotaging any

Contributor

Instead of sabotaging, let's say "we are not taking advantage of Ray parallelization here" :)

Ray and multiple CPUs, this loop would take at least 3.5 seconds to finish.
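The cost of calling `ray.get` inside the loop, versus once at the end, can be reproduced without Ray. The sketch below swaps in Python's `concurrent.futures.ThreadPoolExecutor` purely as an analogy (it is not Ray): `future.result()` blocks the way `ray.get` does, and `func` is a hypothetical stand-in that only sleeps for 0.1 s. `blocking_loop`, `batched_loop`, and `time_loop` are likewise hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def func():
    # Hypothetical stand-in for the tutorial's remote function: it only sleeps.
    time.sleep(0.1)
    return 1


def blocking_loop(pool, n):
    # Like calling ray.get right after each func.remote(): the loop waits for
    # every result before submitting the next task, so nothing overlaps.
    return [pool.submit(func).result() for _ in range(n)]


def batched_loop(pool, n):
    # Like collecting object IDs in a list and calling ray.get once at the end:
    # all tasks are submitted first, so they can run concurrently.
    futures = [pool.submit(func) for _ in range(n)]
    return [f.result() for f in futures]


def time_loop(loop, n=4, workers=4):
    # Run one loop variant on a fresh pool and return (results, elapsed seconds).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        start = time.perf_counter()
        results = loop(pool, n)
        return results, time.perf_counter() - start


if __name__ == "__main__":
    for loop in (blocking_loop, batched_loop):
        _, elapsed = time_loop(loop)
        print("{}: {:.2f} s".format(loop.__name__, elapsed))
```

Threads suffice for this illustration only because `time.sleep` releases the GIL; CPU-bound work would need separate processes, which is part of what Ray provides.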


Profiling Using An External Profiler (Line_Profiler)

Contributor

in this caption, let's remove the _ from Line_Profiler

@pcmoritz (Contributor) commented Jul 8, 2018

Let's try to keep the images first and compress them with JPEG. If we can get them under 50KB, let's keep them!

Profiling the performance of your Ray application doesn't need to be
an eye-straining endeavor of interpreting numbers among hundreds of
lines of text. Ray comes with its own visual web UI to visualize the
parallelization (or lack thereof) of user tasks submitted to Ray!

Contributor

Point out the limitation somewhere, i.e. this only shows timing info about Ray tasks but doesn't break timing down to normal Python functions. This can be a problem especially on a driver; that's why we explain the other profiling techniques above.

parallelization (or lack thereof) of user tasks submitted to Ray!

Currently, whenever initializing Ray, a URL is automatically generated and
printed to terminal on where to view Ray's web UI as a Jupyter notebook:

Contributor

generated and printed in the terminal. This URL can be used to view...

example, it has opened on port 8897.

Because this web UI is only available as long as your Ray application
is currently running, you may need to add a user prompt to stall

Contributor

to prevent your Ray application

ex3()

# Require user input confirmation before exiting
hang = int(input('Examples finished executing. Enter any integer to exit:'))

Contributor

hang = input('Examples finished executing. Press enter to exit:')

the first batch of ``func()`` calls is finished.

**For more on Ray's Web UI,** such as how to access the UI on a remote
node over ssh, or for troubleshooting installation, **please see our**

Contributor

remove bolding from "please see our"

@pcmoritz (Contributor) left a comment:

This is really cool! There is a little bit of reworking needed before we can merge it, please see the comments.

…ed to address (1) compressing the image files, (2) correcting ex 3 to not be remote, and (3) using cProfile on an actor
… a semi-parallelized example. Compressed timeline example image to be under 50 KB, removed view timeline GUI image. Updated timeline example image to reflect revised example 3. cProfile actor example left
@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6614/

@crystalzyan (Contributor, Author)

I updated the following:

  1. Created new example ex4 of profiling Ray actors under the cProfile section. I decided to create an example of another possible user mistake, where they use one actor instead of several to parallelize code. Code will not speed up even if ray.wait() is called at the very end (the fix that would work on ex1).
  2. Corrected ex3 to alternate to a local function instead of to a remote function. Consequently updated profiling interpretation/explanation of ex3, ex3 profiling output results for all profiling technique sections, and the Ray timeline image.
  3. Removed one of the images and compressed the other image to be <50 KB. GIF actually compresses better than JPEG in this case, because JPEG compression is designed for continuous-tone (photographic) images, not sharp-contrast typefaces and GUIs. Additionally, I further compressed the image by lowering its dimensions/resolution and reducing the color palette to 128 colors.
    I'll point out though that there are multiple other existing images in the documentation folder as well, some >500 KB in size (such as ray-tune-parcoords.png), which is why I did not realize that image files might be considered too big to keep in the codebase. Additionally, the current documentation folder is a little messy as the image files are all mixed in with the documentation .rst files, and there is actually also an "images" directory that's not being completely used within the documentation folder. This may need to be cleaned up?
  4. Minor wording changes
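The one-actor-versus-several mistake from item 1 can also be illustrated with a standard-library sketch. A Ray actor, by default, executes its method calls one at a time, so each hypothetical `SleepyActor` below wraps a single-worker `ThreadPoolExecutor` to reproduce that ordering; this is an analogy, not Ray's scheduler, and `SleepyActor` and `run` are illustrative names. Sending every call to one actor takes about as long as running them serially, while spreading them round-robin over several actors lets them overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class SleepyActor:
    """Hypothetical stand-in for a Ray actor: method calls run one at a time."""

    def __init__(self):
        # A single worker thread processes submitted calls in order.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def _work(self):
        time.sleep(0.1)  # stand-in for an expensive method body
        return 1

    def work(self):
        # Returns a future, loosely like an actor method's .remote() returning an object ID.
        return self._executor.submit(self._work)

    def shutdown(self):
        self._executor.shutdown()


def run(num_actors, num_calls):
    actors = [SleepyActor() for _ in range(num_actors)]
    start = time.perf_counter()
    # Round-robin the calls over the actors, then wait for every result.
    futures = [actors[i % num_actors].work() for i in range(num_calls)]
    results = [f.result() for f in futures]
    elapsed = time.perf_counter() - start
    for actor in actors:
        actor.shutdown()
    return results, elapsed


if __name__ == "__main__":
    for n in (1, 4):
        _, elapsed = run(n, 4)
        print("{} actor(s): {:.2f} s".format(n, elapsed))
```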

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6615/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6620/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6619/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6622/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6625/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6624/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6626/

@pcmoritz dismissed robertnishihara’s stale review July 12, 2018 23:57

changes have been incorporated

@pcmoritz merged commit ebf4070 into ray-project:master Jul 12, 2018

4 participants