Remove register_class from API. (ray-project#550)
* Perform ray.register_class under the hood.

* Fix bug.

* Release worker lock when waiting for imports to arrive in get.

* Remove calls to register_class from examples and tests.

* Clear serialization state between tests.

* Fix bug and add test for multiple custom classes with same name.

* Fix failure test.

* Fix linting and cleanups to python code.

* Fixes to documentation.

* Implement recursion depth for recursively registering classes.

* Fix linting.

* Push warning to user if waiting for class for too long.

* Fix typos.

* Don't export FunctionToRun if pickling the function fails.

* Don't broadcast class definition when pickling class.
robertnishihara authored and pcmoritz committed May 17, 2017
1 parent 3ebfd85 commit ec25344
Showing 11 changed files with 377 additions and 303 deletions.
2 changes: 0 additions & 2 deletions doc/source/actors.rst
@@ -220,8 +220,6 @@ We can put this all together as follows.
# Load the MNIST dataset and tell Ray how to serialize the custom classes.
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
ray.register_class(type(mnist))
ray.register_class(type(mnist.train))
# Create the actor.
nn = NeuralNetOnGPU.remote(mnist)
94 changes: 11 additions & 83 deletions doc/source/serialization.rst
@@ -9,16 +9,15 @@ store.

1. The return values of a remote function.
2. The value ``x`` in a call to ``ray.put(x)``.
3. Large objects or objects other than simple primitive types that are passed
as arguments into remote functions.
3. Arguments to remote functions (except for simple arguments like ints or
floats).

A Python object may have an arbitrary number of pointers with arbitrarily deep
nesting. To place an object in the object store or send it between processes,
it must first be converted to a contiguous string of bytes. This process is
known as serialization. The process of converting the string of bytes back into a
Python object is known as deserialization. Serialization and deserialization
are often bottlenecks in distributed computing if the time needed to compute
on the data is relatively low.
are often bottlenecks in distributed computing.

Pickle is one example of a library for serialization and deserialization in
Python.
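
As a minimal illustration of the round trip described above, the following sketch uses only the standard-library ``pickle`` module. It is not Ray's Arrow-based path, and the nested value is a made-up example.

```python
import pickle

# A nested Python object: the dict and lists hold references to other objects.
nested = {"weights": [1.0, 2.5, [3, 4]], "name": "layer0"}

# Serialization: flatten the object graph into one contiguous string of bytes.
payload = pickle.dumps(nested)

# Deserialization: rebuild an equal, but distinct, Python object.
restored = pickle.loads(payload)

print(type(payload).__name__, restored == nested, restored is nested)
# prints: bytes True False
```

The restored object is a full copy, which is the per-process overhead that the shared-memory approach discussed next is meant to avoid.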
@@ -40,9 +39,8 @@ overheads, even when all processes are read-only and could easily share memory.
In Ray, we optimize for numpy arrays by using the `Apache Arrow`_ data format.
When we deserialize a list of numpy arrays from the object store, we still
create a Python list of numpy array objects. However, rather than copy each
numpy array over again, each numpy array object holds a pointer to the relevant
array held in shared memory. There are some advantages to this form of
serialization.
numpy array, each numpy array object holds a pointer to the relevant array held
in shared memory. There are some advantages to this form of serialization.

- Deserialization can be very fast.
- Memory is shared between processes so worker processes can all read the same
@@ -54,84 +52,17 @@ What Objects Does Ray Handle
----------------------------

Ray does not currently support serialization of arbitrary Python objects. The
set of Python objects that Ray can serialize includes the following.
set of Python objects that Ray can serialize using Arrow includes the following.

1. Primitive types: ints, floats, longs, bools, strings, unicode, and numpy
arrays.
2. Any list, dictionary, or tuple whose elements can be serialized by Ray.
3. Objects whose classes can be registered with ``ray.register_class``. This
point is described below.

Registering Custom Classes
--------------------------

We currently support serializing a limited subset of custom classes. For
example, suppose you define a new class ``Foo`` as follows.

.. code-block:: python

  class Foo(object):
    def __init__(self, a, b):
      self.a = a
      self.b = b

Simply calling ``ray.put(Foo(1, 2))`` will fail with a message like

.. code-block:: python

  Ray does not know how to serialize the object <__main__.Foo object at 0x1077d7c50>.

This can be addressed by calling ``ray.register_class(Foo)``.

.. code-block:: python

  import ray
  ray.init()

  # Define a custom class.
  class Foo(object):
    def __init__(self, a, b):
      self.a = a
      self.b = b

  # Calling ray.register_class(Foo) ships the class definition to all of the
  # workers so that workers know how to construct new Foo objects.
  ray.register_class(Foo)

  # Create a Foo object, place it in the object store, and retrieve it.
  f = Foo(1, 2)
  f_id = ray.put(f)
  ray.get(f_id)  # prints <__main__.Foo at 0x1078128d0>

Under the hood, ``ray.put`` places ``f.__dict__``, the dictionary of attributes
of ``f``, into the object store instead of ``f`` itself. In this case, this is
the dictionary, ``{"a": 1, "b": 2}``. Then during deserialization, ``ray.get``
constructs a new ``Foo`` object from the dictionary of fields.

This naive substitution won't work in all cases. For example, this scheme does
not support Python objects of type ``function`` (e.g., ``f = lambda x: x +
1``). In these cases, the call to ``ray.register_class`` will give an error
message, and you should fall back to pickle.

.. code-block:: python

  # This call tells Ray to fall back to using pickle when it encounters objects
  # of type function (we actually already do this under the hood).
  f = lambda x: x + 1
  ray.register_class(type(f), pickle=True)
  f_new = ray.get(ray.put(f))
  f_new(0)  # prints 1

However, it's best to avoid using pickle for the efficiency reasons described
above. If you find yourself needing to pickle certain objects, consider trying
to use more efficient data structures like arrays.

**Note:** Another setting where the naive replacement of an object with its
``__dict__`` attribute fails is recursion, e.g., an object contains itself or
multiple objects contain each other. To see more examples of this, see the
section `Notes and Limitations`_.
For a more general object, Ray will first attempt to serialize the object by
unpacking the object as a dictionary of its fields. This behavior is not
correct in all cases. If Ray cannot serialize the object as a dictionary of its
fields, Ray will fall back to using pickle. However, using pickle will likely
be inefficient.
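
The fallback order described above can be sketched in plain Python. This is a minimal sketch and not Ray's implementation: ``serialize`` and ``deserialize`` are hypothetical helpers, and the check for "simple" field values is an assumed stand-in for whatever the efficient path can actually handle.

```python
import pickle


def serialize(obj):
    """Hypothetical helper: try the dictionary-of-fields path, else pickle."""
    try:
        fields = dict(obj.__dict__)  # raises AttributeError if no __dict__
        # Assumption: only simple field values fit the efficient path.
        if not all(isinstance(v, (int, float, str, bool))
                   for v in fields.values()):
            raise TypeError("field is not a simple value")
        return ("fields", type(obj), fields)
    except (AttributeError, TypeError):
        # Fall back to pickle, which is more general but likely slower.
        return ("pickle", None, pickle.dumps(obj))


def deserialize(record):
    """Hypothetical helper: invert serialize()."""
    kind, cls, payload = record
    if kind == "fields":
        new_obj = cls.__new__(cls)  # construct without calling __init__
        new_obj.__dict__.update(payload)
        return new_obj
    return pickle.loads(payload)


class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b


record = serialize(Foo(1, 2))
restored = deserialize(record)
print(record[0], restored.a, restored.b)  # prints: fields 1 2

# A set has no __dict__, so this sketch falls back to pickle for it.
fallback = serialize({1, 2, 3})
print(fallback[0])  # prints: pickle
```

Rebuilding the object from its fields without calling ``__init__`` is one reason the dictionary-of-fields behavior is not correct in all cases: any setup work done in the constructor is skipped.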

Notes and limitations
---------------------
@@ -167,9 +98,6 @@ Notes and limitations
This object exceeds the maximum recursion depth. It may contain itself recursively.
- If you need to pass a custom class into a remote function, you should call
``ray.register_class`` on the class **before defining the remote function**.

- Whenever possible, use numpy arrays for maximum performance.

Last Resort Workaround
4 changes: 0 additions & 4 deletions examples/evolution_strategies/evolution_strategies.py
@@ -154,10 +154,6 @@ def do_rollouts(self, params, ob_mean, ob_std, timestep_limit=None):
ray.init(redis_address=args.redis_address,
         num_workers=(0 if args.redis_address is None else None))

# Tell Ray to serialize Config and Result objects.
ray.register_class(Config)
ray.register_class(Result)

config = Config(l2coeff=0.005,
                noise_stdev=0.02,
                episodes_per_batch=10000,
4 changes: 0 additions & 4 deletions examples/policy_gradient/examples/example.py
@@ -33,10 +33,6 @@

ray.init(redis_address=args.redis_address)

ray.register_class(AtariRamPreprocessor)
ray.register_class(AtariPixelPreprocessor)
ray.register_class(NoPreprocessor)

mdp_name = args.environment
if args.environment == "Pong-v0":
  preprocessor = AtariPixelPreprocessor()
4 changes: 0 additions & 4 deletions python/ray/experimental/array/distributed/core.py
@@ -72,10 +72,6 @@ def __getitem__(self, sliced):
    return a[sliced]


# Register the DistArray class with Ray so that it knows how to serialize it.
ray.register_class(DistArray)


@ray.remote
def assemble(a):
  return a.assemble()