Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Python's ObjectID to ObjectRef #9353

Merged
merged 32 commits into from
Jul 10, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
ef7f31d
ObjectID -> ObjectRef
raulchen Jul 8, 2020
5c92f9d
object_id -> object_ref
raulchen Jul 8, 2020
ad6d1c5
revert c_object_id and c_outer_object_id
raulchen Jul 8, 2020
a3a7d80
obj_id -> obj_ref
raulchen Jul 8, 2020
20bdc2c
objectid -> object_ref
raulchen Jul 8, 2020
7c6d0b0
object id
raulchen Jul 9, 2020
e4f5c7a
Move ObjectRef to object_ref.pxi and add alias
raulchen Jul 9, 2020
2ff9de8
add backward compatibility test
raulchen Jul 9, 2020
266e835
.rst: ObjectID -> ObjectRef
raulchen Jul 9, 2020
abb5eb1
.rst: object_id -> object_ref
raulchen Jul 9, 2020
13a68e8
.rst: obj_id -> obj_ref
raulchen Jul 9, 2020
cc5e6ff
.rst: object ID -> object ref
raulchen Jul 9, 2020
2b693eb
.rst: Object ID -> Object ref
raulchen Jul 9, 2020
6654310
Revert "object id"
raulchen Jul 9, 2020
461e4c6
object id -> object ref
raulchen Jul 9, 2020
b6bc9f7
lint
raulchen Jul 9, 2020
d0ba761
lint
raulchen Jul 9, 2020
972816a
Fix missing cases
raulchen Jul 9, 2020
b68d45d
Fix missing cases
raulchen Jul 9, 2020
9b6f2ab
revert some unneeded changes
raulchen Jul 9, 2020
a85c81b
lint
raulchen Jul 9, 2020
62f9bfe
Fix rst lint
raulchen Jul 10, 2020
7ccb4d4
Merge branch 'master' into object_ref
raulchen Jul 10, 2020
79cf8e7
small fixes
raulchen Jul 10, 2020
fdaaee0
oid -> obj_ref
raulchen Jul 10, 2020
999ffad
lint
raulchen Jul 10, 2020
5ebe88f
Fix test_memstat.py
raulchen Jul 10, 2020
c8ae208
lint
raulchen Jul 10, 2020
8f36d8c
Fix dashboard
raulchen Jul 10, 2020
97b8f8f
Fix CoreWorkerStats PB
raulchen Jul 10, 2020
219bb19
comment
raulchen Jul 10, 2020
3b41e68
fix CoreWorkerStats
raulchen Jul 10, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions ci/long_running_tests/workloads/many_tasks_serialized_ids.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# This workload stresses distributed reference counting by passing and
# returning serialized ObjectIDs.
# returning serialized ObjectRefs.
Comment on lines 1 to +2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure to change all of the documentation (ray/doc/source/*.rst) too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Will ping you again when I've changed everything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All python/cython/rst files should have been fixed


import time
import json
Expand Down Expand Up @@ -50,8 +50,8 @@ def churn():

@ray.remote(max_retries=0)
def child(*xs):
oid = ray.put(np.zeros(1024 * 1024, dtype=np.uint8))
return oid
obj_ref = ray.put(np.zeros(1024 * 1024, dtype=np.uint8))
return obj_ref


@ray.remote(max_retries=0)
Expand Down
10 changes: 5 additions & 5 deletions ci/performance_tests/test_performance.py
Original file line number Diff line number Diff line change
Expand Up @@ -110,7 +110,7 @@ def create_array(size):
@ray.remote
def no_op(*values):
# The reason that this function takes *values is so that we can pass in
# an arbitrary number of object IDs to create task dependencies.
# an arbitrary number of object refs to create task dependencies.
return 1


Expand All @@ -134,16 +134,16 @@ def warm_up_cluster(num_nodes, object_store_memory):
size = object_store_memory * 2 // 5
num_objects = 2
while size > 0:
object_ids = []
object_refs = []
for i in range(num_nodes):
for _ in range(num_objects):
object_ids += [
object_refs += [
create_array._remote(args=[size], resources={str(i): 1})
]
size = size // 2
num_objects = min(num_objects * 2, 1000)
for object_id in object_ids:
ray.get(object_id)
for object_ref in object_refs:
ray.get(object_ref)
logger.warning("Finished warming up the object store.")

# Invoke all of the remote functions once so that the definitions are
Expand Down
4 changes: 2 additions & 2 deletions doc/examples/doc_code/tf_example.py
Original file line number Diff line number Diff line change
Expand Up @@ -70,8 +70,8 @@ def set_weights(self, weights):
# yapf: disable
# __actor_start__
NetworkActor = Network.remote()
result_object_id = NetworkActor.train.remote()
ray.get(result_object_id)
result_object_ref = NetworkActor.train.remote()
ray.get(result_object_ref)
# __actor_end__
# yapf: enable

Expand Down
2 changes: 1 addition & 1 deletion doc/examples/lbfgs/driver.py
Original file line number Diff line number Diff line change
Expand Up @@ -123,7 +123,7 @@ def full_grad(theta):
# Similarly, full_grad is a function which takes some parameters theta, and
# computes the gradient of the loss. Internally, these functions use Ray to
# distribute the computation of the loss and the gradient over the data
# that is represented by the remote object IDs x_batches and y_batches and
# that is represented by the remote object refs x_batches and y_batches and
# which is potentially distributed over a cluster. However, these details
# are hidden from scipy.optimize.fmin_l_bfgs_b, which simply uses it to run
# the L-BFGS algorithm.
Expand Down
8 changes: 4 additions & 4 deletions doc/examples/plot_hyperparameter.py
Original file line number Diff line number Diff line change
Expand Up @@ -169,16 +169,16 @@ def evaluate_hyperparameters(config):
# Keep track of the best hyperparameters and the best accuracy.
best_hyperparameters = None
best_accuracy = 0
# A list holding the object IDs for all of the experiments that we have
# A list holding the object refs for all of the experiments that we have
# launched but have not yet been processed.
remaining_ids = []
# A dictionary mapping an experiment's object ID to its hyperparameters.
# A dictionary mapping an experiment's object ref to its hyperparameters.
# hyerparameters used for that experiment.
hyperparameters_mapping = {}

###########################################################################
# Launch asynchronous parallel tasks for evaluating different
# hyperparameters. ``accuracy_id`` is an ObjectID that acts as a handle to
# hyperparameters. ``accuracy_id`` is an ObjectRef that acts as a handle to
# the remote task. It is used later to fetch the result of the task
# when the task finishes.

Expand All @@ -195,7 +195,7 @@ def evaluate_hyperparameters(config):

# Fetch and print the results of the tasks in the order that they complete.
while remaining_ids:
# Use ray.wait to get the object ID of the first task that completes.
# Use ray.wait to get the object ref of the first task that completes.
done_ids, remaining_ids = ray.wait(remaining_ids)
# There is only one return result by default.
result_id = done_ids[0]
Expand Down
2 changes: 1 addition & 1 deletion doc/examples/plot_lbfgs.rst
Original file line number Diff line number Diff line change
Expand Up @@ -145,7 +145,7 @@ instead of
then each task that got sent to the scheduler (one for every element of
``batch_ids``) would have had a copy of ``theta`` serialized inside of it. Since
``theta`` here consists of the parameters of a potentially large model, this is
inefficient. *Large objects should be passed by object ID to remote functions
inefficient. *Large objects should be passed by object ref to remote functions
and not by value*.

We use remote actors and remote objects internally in the implementation of
Expand Down
10 changes: 5 additions & 5 deletions doc/source/actors.rst
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ When the above actor is instantiated, the following events happen.
Actor Methods
-------------

Any method of the actor can return multiple object IDs with the ``ray.method`` decorator:
Any method of the actor can return multiple object refs with the ``ray.method`` decorator:

.. code-block:: python

Expand All @@ -60,9 +60,9 @@ Any method of the actor can return multiple object IDs with the ``ray.method`` d

f = Foo.remote()

obj_id1, obj_id2 = f.bar.remote()
assert ray.get(obj_id1) == 1
assert ray.get(obj_id2) == 2
obj_ref1, obj_ref2 = f.bar.remote()
assert ray.get(obj_ref1) == 1
assert ray.get(obj_ref2) == 2

.. _actor-resource-guide:

Expand Down Expand Up @@ -146,7 +146,7 @@ If necessary, you can manually terminate an actor by calling
``ray.actor.exit_actor()`` from within one of the actor methods. This will kill
the actor process and release resources associated/assigned to the actor. This
approach should generally not be necessary as actors are automatically garbage
collected. The ``ObjectID`` resulting from the task can be waited on to wait
collected. The ``ObjectRef`` resulting from the task can be waited on to wait
for the actor to exit (calling ``ray.get()`` on it will raise a ``RayActorError``).
Note that this method of termination will wait until any previously submitted
tasks finish executing. If you want to terminate an actor immediately, you can
Expand Down
10 changes: 5 additions & 5 deletions doc/source/advanced.rst
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,7 @@ For example, consider the following.

@ray.remote
def g():
# Call f 4 times and return the resulting object IDs.
# Call f 4 times and return the resulting object refs.
return [f.remote() for _ in range(4)]

@ray.remote
Expand All @@ -111,10 +111,10 @@ Then calling ``g`` and ``h`` produces the following behavior.
.. code:: python

>>> ray.get(g.remote())
[ObjectID(b1457ba0911ae84989aae86f89409e953dd9a80e),
ObjectID(7c14a1d13a56d8dc01e800761a66f09201104275),
ObjectID(99763728ffc1a2c0766a2000ebabded52514e9a6),
ObjectID(9c2f372e1933b04b2936bb6f58161285829b9914)]
[ObjectRef(b1457ba0911ae84989aae86f89409e953dd9a80e),
ObjectRef(7c14a1d13a56d8dc01e800761a66f09201104275),
ObjectRef(99763728ffc1a2c0766a2000ebabded52514e9a6),
ObjectRef(9c2f372e1933b04b2936bb6f58161285829b9914)]

>>> ray.get(h.remote())
[1, 1, 1, 1]
Expand Down
6 changes: 3 additions & 3 deletions doc/source/async_api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -33,9 +33,9 @@ that supports top level ``await``:
await actor.run_concurrent.remote()


ObjectIDs as asyncio.Futures
----------------------------
ObjectIDs can be translated to asyncio.Future. This feature
ObjectRefs as asyncio.Futures
-----------------------------
ObjectRefs can be translated to asyncio.Future. This feature
make it possible to ``await`` on ray futures in existing concurrent
applications.

Expand Down
48 changes: 24 additions & 24 deletions doc/source/memory-management.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,43 +3,43 @@ Memory Management

This page describes how memory management works in Ray and how you can set memory quotas to ensure memory-intensive applications run predictably and reliably.

ObjectID Reference Counting
---------------------------
ObjectRef Reference Counting
----------------------------

Ray implements distributed reference counting so that any ``ObjectID`` in scope in the cluster is pinned in the object store. This includes local python references, arguments to pending tasks, and IDs serialized inside of other objects.
Ray implements distributed reference counting so that any ``ObjectRef`` in scope in the cluster is pinned in the object store. This includes local python references, arguments to pending tasks, and IDs serialized inside of other objects.

Frequently Asked Questions (FAQ)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**My application failed with ObjectStoreFullError. What happened?**

Ensure that you're removing ``ObjectID`` references when they're no longer needed. See `Debugging using 'ray memory'`_ for information on how to identify what objects are in scope in your application.
Ensure that you're removing ``ObjectRef`` references when they're no longer needed. See `Debugging using 'ray memory'`_ for information on how to identify what objects are in scope in your application.

This exception is raised when the object store on a node was full of pinned objects when the application tried to create a new object (either by calling ``ray.put()`` or returning an object from a task). If you're sure that the configured object store size was large enough for your application to run, ensure that you're removing ``ObjectID`` references when they're no longer in use so their objects can be evicted from the object store.
This exception is raised when the object store on a node was full of pinned objects when the application tried to create a new object (either by calling ``ray.put()`` or returning an object from a task). If you're sure that the configured object store size was large enough for your application to run, ensure that you're removing ``ObjectRef`` references when they're no longer in use so their objects can be evicted from the object store.

**I'm running Ray inside IPython or a Jupyter Notebook and there are ObjectID references causing problems even though I'm not storing them anywhere.**
**I'm running Ray inside IPython or a Jupyter Notebook and there are ObjectRef references causing problems even though I'm not storing them anywhere.**

Try `Enabling LRU Fallback`_, which will cause unused objects referenced by IPython to be LRU evicted when the object store is full instead of erroring.

IPython stores the output of every cell in a local Python variable indefinitely. This causes Ray to pin the objects even though your application may not actually be using them.
IPython stores the output of every cell in a local Python variable indefinitely. This causes Ray to pin the objects even though your application may not actually be using them.

**My application used to run on previous versions of Ray but now I'm getting ObjectStoreFullError.**

Either modify your application to remove ``ObjectID`` references when they're no longer needed or try `Enabling LRU Fallback`_ to revert to the old behavior.
Either modify your application to remove ``ObjectRef`` references when they're no longer needed or try `Enabling LRU Fallback`_ to revert to the old behavior.

In previous versions of Ray, there was no reference counting and instead objects in the object store were LRU evicted once the object store ran out of space. Some applications (e.g., applications that keep references to all objects ever created) may have worked with LRU eviction but do not with reference counting.
In previous versions of Ray, there was no reference counting and instead objects in the object store were LRU evicted once the object store ran out of space. Some applications (e.g., applications that keep references to all objects ever created) may have worked with LRU eviction but do not with reference counting.

Debugging using 'ray memory'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``ray memory`` command can be used to help track down what ``ObjectID`` references are in scope and may be causing an ``ObjectStoreFullError``.
The ``ray memory`` command can be used to help track down what ``ObjectRef`` references are in scope and may be causing an ``ObjectStoreFullError``.

Running ``ray memory`` from the command line while a Ray application is running will give you a dump of all of the ``ObjectID`` references that are currently held by the driver, actors, and tasks in the cluster.
Running ``ray memory`` from the command line while a Ray application is running will give you a dump of all of the ``ObjectRef`` references that are currently held by the driver, actors, and tasks in the cluster.

.. code-block::

-----------------------------------------------------------------------------------------------------
Object ID Reference Type Object Size Reference Creation Site
Object Ref Reference Type Object Size Reference Creation Site
=====================================================================================================
; worker pid=18301
45b95b1c8bd3a9c4ffffffff010000c801000000 LOCAL_REFERENCE ? (deserialize task arg) __main__..f
Expand All @@ -50,11 +50,11 @@ Running ``ray memory`` from the command line while a Ray application is running
ffffffffffffffffffffffff0100008801000000 LOCAL_REFERENCE 77 (put object) test.py:<module>:9
-----------------------------------------------------------------------------------------------------

Each entry in this output corresponds to an ``ObjectID`` that's currently pinning an object in the object store along with where the reference is (in the driver, in a worker, etc.), what type of reference it is (see below for details on the types of references), the size of the object in bytes, and where in the application the reference was created.
Each entry in this output corresponds to an ``ObjectRef`` that's currently pinning an object in the object store along with where the reference is (in the driver, in a worker, etc.), what type of reference it is (see below for details on the types of references), the size of the object in bytes, and where in the application the reference was created.

There are five types of references that can keep an object pinned:

**1. Local ObjectID references**
**1. Local ObjectRef references**

.. code-block:: python

Expand All @@ -70,7 +70,7 @@ In this example, we create references to two objects: one that is ``ray.put()``
.. code-block::

-----------------------------------------------------------------------------------------------------
Object ID Reference Type Object Size Reference Creation Site
Object Ref Reference Type Object Size Reference Creation Site
=====================================================================================================
; driver pid=18867
ffffffffffffffffffffffff0100008801000000 LOCAL_REFERENCE 77 (put object) ../test.py:<module>:9
Expand All @@ -89,12 +89,12 @@ In the output from ``ray memory``, we can see that each of these is marked as a
b = ray.get(a)
del a

In this example, we create a ``numpy`` array and then store it in the object store. Then, we fetch the same numpy array from the object store and delete its ``ObjectID``. In this case, the object is still pinned in the object store because the deserialized copy (stored in ``b``) points directly to the memory in the object store.
In this example, we create a ``numpy`` array and then store it in the object store. Then, we fetch the same numpy array from the object store and delete its ``ObjectRef``. In this case, the object is still pinned in the object store because the deserialized copy (stored in ``b``) points directly to the memory in the object store.

.. code-block::

-----------------------------------------------------------------------------------------------------
Object ID Reference Type Object Size Reference Creation Site
Object Ref Reference Type Object Size Reference Creation Site
=====================================================================================================
; driver pid=25090
ffffffffffffffffffffffff0100008801000000 PINNED_IN_MEMORY 229 test.py:<module>:7
Expand All @@ -119,7 +119,7 @@ In this example, we first create an object via ``ray.put()`` and then submit a t
.. code-block::

-----------------------------------------------------------------------------------------------------
Object ID Reference Type Object Size Reference Creation Site
Object Ref Reference Type Object Size Reference Creation Site
=====================================================================================================
; worker pid=18971
ffffffffffffffffffffffff0100008801000000 PINNED_IN_MEMORY 77 (deserialize task arg) __main__..f
Expand All @@ -130,7 +130,7 @@ In this example, we first create an object via ``ray.put()`` and then submit a t

While the task is running, we see that ``ray memory`` shows both a ``LOCAL_REFERENCE`` and a ``USED_BY_PENDING_TASK`` reference for the object in the driver process. The worker process also holds a reference to the object because it is ``PINNED_IN_MEMORY``, because the Python ``arg`` is directly referencing the memory in the plasma, so it can't be evicted.

**4. Serialized ObjectID references**
**4. Serialized ObjectRef references**

.. code-block:: python

Expand All @@ -147,7 +147,7 @@ In this example, we again create an object via ``ray.put()``, but then pass it t
.. code-block::

-----------------------------------------------------------------------------------------------------
Object ID Reference Type Object Size Reference Creation Site
Object Ref Reference Type Object Size Reference Creation Site
=====================================================================================================
; worker pid=19002
ffffffffffffffffffffffff0100008801000000 LOCAL_REFERENCE 77 (deserialize task arg) __main__..f
Expand All @@ -156,22 +156,22 @@ In this example, we again create an object via ``ray.put()``, but then pass it t
45b95b1c8bd3a9c4ffffffff010000c801000000 LOCAL_REFERENCE ? (task call) ../test.py:<module>:10
-----------------------------------------------------------------------------------------------------

Now, both the driver and the worker process running the task hold a ``LOCAL_REFERENCE`` to the object in addition to it being ``USED_BY_PENDING_TASK`` on the driver. If this was an actor task, the actor could even hold a ``LOCAL_REFERENCE`` after the task completes by storing the ``ObjectID`` in a member variable.
Now, both the driver and the worker process running the task hold a ``LOCAL_REFERENCE`` to the object in addition to it being ``USED_BY_PENDING_TASK`` on the driver. If this was an actor task, the actor could even hold a ``LOCAL_REFERENCE`` after the task completes by storing the ``ObjectRef`` in a member variable.

**5. Captured ObjectID references**
**5. Captured ObjectRef references**

.. code-block:: python

a = ray.put(None)
b = ray.put([a])
del a

In this example, we first create an object via ``ray.put()``, then capture its ``ObjectID`` inside of another ``ray.put()`` object, and delete the first ``ObjectID``. In this case, both objects are still pinned.
In this example, we first create an object via ``ray.put()``, then capture its ``ObjectRef`` inside of another ``ray.put()`` object, and delete the first ``ObjectRef``. In this case, both objects are still pinned.

.. code-block::

-----------------------------------------------------------------------------------------------------
Object ID Reference Type Object Size Reference Creation Site
Object Ref Reference Type Object Size Reference Creation Site
=====================================================================================================
; driver pid=19047
ffffffffffffffffffffffff0100008802000000 LOCAL_REFERENCE 1551 (put object) ../test.py:<module>:10
Expand Down
Loading