Add 0.4 release blog post. (ray-project#1794)
1 parent 71829a2 · commit 5c86f34 · 1 changed file with 87 additions and 0 deletions.
---
layout: post
title: "Ray: 0.4 Release"
excerpt: "This post announces the release of Ray 0.4."
date: 2018-03-27 14:00:00
---

We are pleased to announce the 0.4 release of [Ray][1]. This release introduces
improvements to Ray's scheduling, substantial backend improvements, and the
start of [Pandas on Ray][2], as well as many improvements to [RLlib][3] and
[Tune][4] (you can read more about the improvements in RLlib in [this blog
post][5]).

To upgrade to the latest version, run

```
pip install -U ray
```

## Scheduling

This release includes two major changes to the scheduling behavior: spillback
scheduling and support for custom resources.

### Spillback Scheduling

Because Ray takes a bottom-up hierarchical approach to scheduling, in which
scheduling decisions are made by local schedulers on each machine (in order to
avoid a centralized scheduling bottleneck), scheduling decisions are often made
with a slightly stale view of the system state. As a consequence, race
conditions can occur, and too many tasks can be assigned to a single machine.
For example, consider the case in which there are 100 GPUs scattered around
the cluster and workers on the various machines submit a total of exactly 100
GPU tasks. If these tasks take a while to execute, then the desired behavior
is for one task to be assigned to each GPU. However, without a single
scheduling bottleneck like a centralized scheduler, too many tasks may be
assigned to one machine, resulting in delays.

Spillback scheduling provides a mechanism for correcting bad scheduling
decisions. At a high level, if a local scheduler decides that it does not want
to execute a task, it can spill the task back to the global scheduler (or, in
principle, to another local scheduler). This mechanism allows us to achieve
perfect load balancing along with high task throughput.
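
For intuition, here is a minimal, self-contained sketch of the spillback idea.
It is not Ray's implementation (Ray's schedulers live in the backend, and the
class names, data structures, and capacity policy here are invented for
illustration): a local scheduler runs a task when it has capacity, and
otherwise spills it back to the global scheduler.

```python
# A toy model of the spillback mechanism described above. Purely
# illustrative; all names and the capacity policy are made up.
from collections import deque


class GlobalScheduler:
    """Fallback scheduler that receives tasks spilled by local schedulers."""

    def __init__(self):
        self.pending = deque()

    def submit(self, task, local_schedulers):
        # Forward the spilled task to any machine with spare capacity.
        for scheduler in local_schedulers:
            if scheduler.has_capacity(task):
                scheduler.run(task)
                return
        self.pending.append(task)  # No capacity anywhere yet; queue it.


class LocalScheduler:
    """Per-machine scheduler that makes fast, possibly stale decisions."""

    def __init__(self, num_gpus, global_scheduler):
        self.free_gpus = num_gpus
        self.global_scheduler = global_scheduler

    def has_capacity(self, task):
        return self.free_gpus >= task["num_gpus"]

    def run(self, task):
        self.free_gpus -= task["num_gpus"]

    def submit(self, task, peers):
        if self.has_capacity(task):
            # Fast path: schedule locally without any coordination.
            self.run(task)
        else:
            # Spillback: rather than queueing behind an overloaded machine,
            # hand the task back to the global scheduler.
            self.global_scheduler.submit(task, peers)
```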

### Custom Resources

Remote functions and actors now support scheduling with [arbitrary custom
resource requirements][6]. We can specify that a remote function requires 1
CPU, 2 GPUs, and 3 units of some custom resource with syntax like the
following.

```python
@ray.remote(num_cpus=1, num_gpus=2, resources={'Custom': 3})
def f():
    pass
```

To tell Ray that a node has 6 units of the custom resource, start Ray on that
machine with a command like the following.

```
ray start ... --resources='{"Custom": 6}'
```
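
As a hedged usage sketch, assuming the node above and the `f` defined earlier
(in the 0.4-era API you connect to a running cluster by passing the head
node's Redis address to `ray.init`; the address below is a placeholder):

```python
import ray

# Connect to the existing cluster (placeholder address).
ray.init(redis_address="<head-node-ip>:6379")

# Each call holds 1 CPU, 2 GPUs, and 3 units of "Custom" while it runs,
# so a node advertising 6 units of "Custom" executes at most two copies
# of f concurrently; the remaining calls wait for resources to free up.
results = ray.get([f.remote() for _ in range(4)])
```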

Custom resources can be used for a variety of different purposes. For example,
they can be used to do bookkeeping of a concrete resource like memory, or to
indicate that a particular dataset lives on a particular machine, or to give
machines certain roles (such as a "parameter server" machine or a "worker"
machine).
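
As a small illustration of the dataset use case (the resource name
`Dataset1`, the amounts, and `process_shard` are invented for this sketch):
advertise the resource on the machine that holds the data, and require one
unit of it in the task so that Ray schedules the task onto that machine.

```python
# On the machine that stores the dataset (100 units chosen arbitrarily
# to allow up to 100 concurrent tasks):
#
#     ray start ... --resources='{"Dataset1": 100}'

@ray.remote(resources={'Dataset1': 1})
def process_shard(shard_index):
    # Requiring one unit of "Dataset1" means this task can only be
    # scheduled on a node that advertises that resource, i.e. the
    # machine where the data lives.
    return "processed shard %d" % shard_index
```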

## Libraries

This release also includes the start of [Pandas on Ray][2], a project aimed at
speeding up [Pandas][7] DataFrames. It also includes substantial improvements
to [RLlib][3], such as high-quality implementations of algorithms like
[Ape-X][8], and substantial improvements to [Tune][4], such as implementations
of state-of-the-art algorithms like [Population Based Training][9].

[1]: https://github.com/ray-project/ray
[2]: https://rise.cs.berkeley.edu/blog/pandas-on-ray/
[3]: http://ray.readthedocs.io/en/latest/rllib.html
[4]: http://ray.readthedocs.io/en/latest/tune.html
[5]: https://rise.cs.berkeley.edu/blog/distributed-policy-optimizers-for-scalable-and-reproducible-deep-rl/
[6]: http://ray.readthedocs.io/en/latest/resources.html
[7]: https://pandas.pydata.org/
[8]: https://arxiv.org/abs/1803.00933
[9]: http://ray.readthedocs.io/en/latest/pbt.html