Add 0.4 release blog post. (ray-project#1794)
1 parent 71829a2 · commit 5c86f34 · 1 changed file with 87 additions and 0 deletions.
---
layout: post
title: "Ray: 0.4 Release"
excerpt: "This post announces the release of Ray 0.4."
date: 2018-03-27 14:00:00
---

We are pleased to announce the 0.4 release of [Ray][1]. This release introduces
improvements to Ray's scheduling, substantial backend improvements, and the
start of [Pandas on Ray][2], as well as many improvements to [RLlib][3] and
[Tune][4] (you can read more about the improvements in RLlib in [this blog
post][5]).

To upgrade to the latest version, run

```
pip install -U ray
```

## Scheduling

This release includes two major changes to the scheduling behavior: spillback
scheduling and support for custom resources.

### Spillback Scheduling

Because Ray takes a bottom-up hierarchical approach to scheduling, in which
scheduling decisions are made by local schedulers on each machine (in order to
avoid a centralized scheduling bottleneck), scheduling decisions are often made
with a slightly stale view of the system state. As a consequence, race
conditions can occur, and too many tasks can be assigned to a single machine.
For example, consider the case in which there are 100 GPUs scattered around
the cluster and workers on the various machines submit a total of exactly 100
GPU tasks. If these tasks take a while to execute, then the desired behavior
is for one task to be assigned to each GPU. However, without a single
scheduling bottleneck like a centralized scheduler, too many tasks may be
assigned to one machine, resulting in delays.

Spillback scheduling provides a mechanism for correcting bad scheduling
decisions. At a high level, if a local scheduler decides that it does not want
to execute a task, it can spill the task back to the global scheduler (or, in
principle, to another local scheduler). This mechanism allows us to achieve
perfect load balancing along with high task throughput.
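
For intuition, here is a minimal, self-contained sketch of the spillback idea.
It is not Ray's implementation (Ray's schedulers live in the backend, and the
class names, data structures, and capacity policy here are invented for
illustration): a local scheduler runs a task when it has capacity, and
otherwise spills it back to the global scheduler.

```python
# A toy model of the spillback mechanism described above. Purely
# illustrative; all names and the capacity policy are made up.
from collections import deque


class GlobalScheduler:
    """Fallback scheduler that receives tasks spilled by local schedulers."""

    def __init__(self):
        self.pending = deque()

    def submit(self, task, local_schedulers):
        # Forward the spilled task to any machine with spare capacity.
        for scheduler in local_schedulers:
            if scheduler.has_capacity(task):
                scheduler.run(task)
                return
        self.pending.append(task)  # No capacity anywhere yet; queue it.


class LocalScheduler:
    """Per-machine scheduler that makes fast, possibly stale decisions."""

    def __init__(self, num_gpus, global_scheduler):
        self.free_gpus = num_gpus
        self.global_scheduler = global_scheduler

    def has_capacity(self, task):
        return self.free_gpus >= task["num_gpus"]

    def run(self, task):
        self.free_gpus -= task["num_gpus"]

    def submit(self, task, peers):
        if self.has_capacity(task):
            # Fast path: schedule locally without any coordination.
            self.run(task)
        else:
            # Spillback: rather than queueing behind an overloaded machine,
            # hand the task back to the global scheduler.
            self.global_scheduler.submit(task, peers)
```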

### Custom Resources

Remote functions and actors now support scheduling with [arbitrary custom
resource requirements][6]. We can specify that a remote function requires 1
CPU, 2 GPUs, and 3 units of some custom resource with syntax like the
following.

```python
@ray.remote(num_cpus=1, num_gpus=2, resources={'Custom': 3})
def f():
    pass
```

To tell Ray that a node has 6 units of the custom resource, start Ray on that
machine with a command like the following.

```
ray start ... --resources='{"Custom": 6}'
```
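
As a hedged usage sketch, assuming the node above and the `f` defined earlier
(in the 0.4-era API you connect to a running cluster by passing the head
node's Redis address to `ray.init`; the address below is a placeholder):

```python
import ray

# Connect to the existing cluster (placeholder address).
ray.init(redis_address="<head-node-ip>:6379")

# Each call holds 1 CPU, 2 GPUs, and 3 units of "Custom" while it runs,
# so a node advertising 6 units of "Custom" executes at most two copies
# of f concurrently; the remaining calls wait for resources to free up.
results = ray.get([f.remote() for _ in range(4)])
```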

Custom resources can be used for a variety of different purposes. For example,
they can be used to do bookkeeping of a concrete resource like memory, or to
indicate that a particular dataset lives on a particular machine, or to give
machines certain roles (such as a "parameter server" machine or a "worker"
machine).
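
As a small illustration of the dataset use case (the resource name
`Dataset1`, the amounts, and `process_shard` are invented for this sketch):
advertise the resource on the machine that holds the data, and require one
unit of it in the task so that Ray schedules the task onto that machine.

```python
# On the machine that stores the dataset (100 units chosen arbitrarily
# to allow up to 100 concurrent tasks):
#
#     ray start ... --resources='{"Dataset1": 100}'

@ray.remote(resources={'Dataset1': 1})
def process_shard(shard_index):
    # Requiring one unit of "Dataset1" means this task can only be
    # scheduled on a node that advertises that resource, i.e. the
    # machine where the data lives.
    return "processed shard %d" % shard_index
```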

## Libraries

This release also includes the start of [Pandas on Ray][2], a project aimed at
speeding up [Pandas][7] DataFrames. It also includes substantial improvements
to [RLlib][3], such as high-quality implementations of algorithms like
[Ape-X][8], and substantial improvements to [Tune][4], such as implementations
of state-of-the-art algorithms like [Population Based Training][9].

[1]: https://github.com/ray-project/ray
[2]: https://rise.cs.berkeley.edu/blog/pandas-on-ray/
[3]: http://ray.readthedocs.io/en/latest/rllib.html
[4]: http://ray.readthedocs.io/en/latest/tune.html
[5]: https://rise.cs.berkeley.edu/blog/distributed-policy-optimizers-for-scalable-and-reproducible-deep-rl/
[6]: http://ray.readthedocs.io/en/latest/resources.html
[7]: https://pandas.pydata.org/
[8]: https://arxiv.org/abs/1803.00933
[9]: http://ray.readthedocs.io/en/latest/pbt.html