[DataFrame] readthedocs page for Pandas on Ray (ray-project#1714)
devin-petersohn authored and robertnishihara committed Mar 14, 2018
1 parent adffc7b commit c19c2a4
Showing 2 changed files with 77 additions and 0 deletions.
6 changes: 6 additions & 0 deletions doc/source/index.rst
@@ -80,6 +80,12 @@ Ray comes with libraries that accelerate deep learning and reinforcement learnin
   rllib.rst
   rllib-dev.rst

.. toctree::
   :maxdepth: 1
   :caption: Pandas on Ray

   pandas_on_ray.rst

.. toctree::
   :maxdepth: 1
   :caption: Examples
71 changes: 71 additions & 0 deletions doc/source/pandas_on_ray.rst
@@ -0,0 +1,71 @@
Pandas on Ray
=============

Pandas on Ray is an early-stage DataFrame library that wraps Pandas and
transparently distributes the data and computation. Users do not need to know
how many cores their system has, nor do they need to specify how to distribute
the data. In fact, users can keep using their existing Pandas notebooks and
still see a considerable speedup from Pandas on Ray, even on a single machine.
Only the import statement needs to change, as shown below. Once you've changed
your import statement, you're ready to use Pandas on Ray just as you would
Pandas.

.. code-block:: python

  # import pandas as pd
  import ray.dataframe as pd

Currently, we have part of the Pandas API implemented and are working toward
full functional parity with Pandas.
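
Because only the import line changes, a script can also pick its DataFrame
backend at run time. The following is a minimal sketch, not part of Pandas on
Ray: ``import_first_available`` is a hypothetical helper built on the standard
``importlib`` module that returns the first module in a list that imports
successfully.

.. code-block:: python

  import importlib


  def import_first_available(*module_names):
      """Return the first module in module_names that can be imported."""
      for name in module_names:
          try:
              return importlib.import_module(name)
          except ImportError:
              # This backend is not installed; try the next candidate.
              continue
      raise ImportError("none of these modules could be imported: "
                        + ", ".join(module_names))

For example, ``pd = import_first_available("ray.dataframe", "pandas")`` would
bind ``pd`` to Pandas on Ray when it is importable and to plain Pandas
otherwise.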

Using Pandas on Ray on a Single Node
------------------------------------

To use the most up-to-date version of Pandas on Ray, please follow the
instructions on the `installation page`_.

Once you import the library, you should see something similar to the following
output:

.. code-block:: text

  >>> import ray.dataframe as pd
  Waiting for redis server at 127.0.0.1:14618 to respond...
  Waiting for redis server at 127.0.0.1:31410 to respond...
  Starting local scheduler with the following resources: {'CPU': 4, 'GPU': 0}.
  ======================================================================
  View the web UI at http://localhost:8889/notebooks/ray_ui36796.ipynb?token=ac25867d62c4ae87941bc5a0ecd5f517dbf80bd8e9b04218
  ======================================================================

If you do not see output similar to the above, please make sure that you have
built Ray using the instructions on the `installation page`_.

Once you have executed ``import ray.dataframe as pd``, you're ready to begin
running your Pandas pipeline as you were before. Please note that the API is
not yet complete. For some methods, you may see the following:

.. code-block:: text

  NotImplementedError: To contribute to Pandas on Ray, please visit github.com/ray-project/ray.

If you would like to request that a particular method be implemented, feel
free to `open an issue`_. Before you open an issue, please make sure that
someone else has not already requested that functionality.
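
While the API is incomplete, a pipeline can guard individual calls against
this error. The sketch below is an assumption about how one might do that, not
a Pandas on Ray feature: ``call_with_fallback`` is a hypothetical helper that
runs one callable and, if it raises ``NotImplementedError``, runs a second
callable with the same arguments.

.. code-block:: python

  def call_with_fallback(primary, fallback, *args, **kwargs):
      """Call primary; if it raises NotImplementedError, call fallback."""
      try:
          return primary(*args, **kwargs)
      except NotImplementedError:
          # The primary implementation is missing; use the alternative.
          return fallback(*args, **kwargs)

Here ``primary`` might wrap a Pandas on Ray method and ``fallback`` an
equivalent plain Pandas computation on the same data; both names are
placeholders.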

Using Pandas on Ray on a Cluster
--------------------------------

Running Pandas on Ray on a cluster is not yet supported. Coming soon!

Examples
--------
You can find an example in our recent `blog post`_ or in the
`Jupyter Notebook`_ that we used to create the blog post.

.. _`installation page`: http://ray.readthedocs.io/en/latest/installation.html
.. _`open an issue`: http://github.com/ray-project/ray/issues
.. _`blog post`: http://rise.cs.berkeley.edu/blog/pandas-on-ray
.. _`Jupyter Notebook`: http://gist.github.com/devin-petersohn/f424d9fb5579a96507c709a36d487f24#file-pandas_on_ray_blog_post_0-ipynb
