Merged

32 commits
260b7ed
Added small section on installation when using Anaconda. Also fixed a…
Dec 11, 2019
03cf0d5
merge from upstream
Dec 11, 2019
ce5b07e
Upstream merge
Dec 13, 2019
b091027
Merge remote-tracking branch 'upstream/master'
Dec 17, 2019
66f22d3
Merge remote-tracking branch 'upstream/master'
Dec 18, 2019
3a19f54
Delete more temporary directories when running the doc "make clean".
Dec 18, 2019
580ca6e
Merge remote-tracking branch 'upstream/master'
Dec 27, 2019
1112a7b
Merge remote-tracking branch 'upstream/master'
Jan 23, 2020
bb7a25c
Merge remote-tracking branch 'upstream/master'
Jan 24, 2020
62af423
Merge remote-tracking branch 'upstream/master'
Jan 30, 2020
98ac29a
Merge remote-tracking branch 'upstream/master'
Feb 2, 2020
bbc78c7
Merge remote-tracking branch 'upstream/master'
Feb 3, 2020
95f86fc
Merge remote-tracking branch 'upstream/master'
Feb 10, 2020
7852fb5
Merge remote-tracking branch 'upstream/master'
Mar 3, 2020
c4eefb4
Merge remote-tracking branch 'upstream/master'
Mar 26, 2020
bbd96c1
Merge remote-tracking branch 'upstream/master'
Apr 10, 2020
cdd567c
Merge remote-tracking branch 'upstream/master'
May 4, 2020
bb2e570
Merge remote-tracking branch 'upstream/master'
May 8, 2020
0e1156a
Merge remote-tracking branch 'upstream/master'
May 10, 2020
e5e7eaf
Merge remote-tracking branch 'upstream/master'
May 18, 2020
ce5e2a9
Merge remote-tracking branch 'upstream/master'
May 19, 2020
e5fecfb
Merge remote-tracking branch 'upstream/master'
May 22, 2020
1c54d4c
Merge remote-tracking branch 'upstream/master'
Jun 2, 2020
7528f19
Merge remote-tracking branch 'upstream/master'
Jun 10, 2020
8d7d086
Fixed typo in LinearDiscreteEnv docs
Jun 10, 2020
080d143
merge
Jun 22, 2020
739c4e9
Merge remote-tracking branch 'upstream/master'
Jun 26, 2020
31a3540
Removed the "detached=True" from the description of detachted actors,…
Jun 26, 2020
62a1067
Merge remote-tracking branch 'upstream/master'
Jul 11, 2020
c2129eb
Merge remote-tracking branch 'upstream/master'
Jul 18, 2020
ee164fd
Merge remote-tracking branch 'upstream/master'
Jul 20, 2020
e418c29
Mostly minor refinements to the Serve docs
Jul 20, 2020
43 changes: 23 additions & 20 deletions doc/source/serve/advanced.rst
@@ -2,13 +2,13 @@
Advanced Topics, Configurations, & FAQ
======================================

Ray Serve has a number of knobs and tools for you to tune for your particular workload.
All Ray Serve advanced options and topics are covered on this page aside from the
fundamentals of :doc:`deployment`. For a more hands-on take, please check out the :ref:`serve-tutorials`.

There are a number of things you'll likely want to do with your serving application, including
scaling out, splitting traffic, or batching input for better performance. To do all of this,
you will create a ``BackendConfig``, a configuration object that you'll use to set
the properties of a particular backend.

.. contents::
@@ -107,7 +107,7 @@ When calling :mod:`set_traffic <ray.serve.set_traffic>`, you provide a dictionar
For example, here we split traffic 50/50 between two backends:

.. code-block:: python

serve.create_backend("backend1", MyClass1)
serve.create_backend("backend2", MyClass2)

@@ -117,28 +117,31 @@ For example, here we split traffic 50/50 between two backends:
Each request is routed randomly between the backends in the traffic dictionary according to the provided weights.
Please see :ref:`session-affinity` for details on how to ensure that clients or users are consistently mapped to the same backend.

A/B Testing
-----------
Canary Deployments
------------------

:mod:`set_traffic <ray.serve.set_traffic>` can be used to implement A/B testing by having one backend serve the majority of traffic while a fraction is routed to a second model:
:mod:`set_traffic <ray.serve.set_traffic>` can be used to implement canary deployments, where one backend serves the majority of traffic, while a small fraction is routed to a second backend. This is especially useful for "canary testing" a new model on a small percentage of users, while the tried and true old model serves the majority. Once you are satisfied with the new model, you can reroute all traffic to it and remove the old model:

.. code-block:: python

serve.create_backend("default_backend", MyClass)

# Initially, set all traffic to be served by the "default" backend.
serve.create_endpoint("ab_endpoint", backend="default_backend", route="/a-b-test")
serve.create_endpoint("canary_endpoint", backend="default_backend", route="/canary-test")

# Add a second backend and route 1% of the traffic to it.
serve.create_backend("new_backend", MyNewClass)
serve.set_traffic("ab_endpoint", {"default_backend": 0.99, "new_backend": 0.01})
serve.set_traffic("canary_endpoint", {"default_backend": 0.99, "new_backend": 0.01})

# Add a third backend that serves another 1% of the traffic.
serve.create_backend("new_backend2", MyNewClass2)
serve.set_traffic("ab_endpoint", {"default_backend": 0.98, "new_backend": 0.01, "new_backend2": 0.01})
serve.set_traffic("canary_endpoint", {"default_backend": 0.98, "new_backend": 0.01, "new_backend2": 0.01})

# Route all traffic to the new, better backend.
serve.set_traffic("canary_endpoint", {"new_backend": 1.0})

# Revert to the "default" backend serving all traffic.
serve.set_traffic("ab_endpoint", {"default_backend": 1.0})
# Or, if not so successful, revert to the "default" backend for all traffic.
serve.set_traffic("canary_endpoint", {"default_backend": 1.0})

Incremental Rollout
-------------------
@@ -150,7 +153,7 @@ In the example below, we do this repeatedly in one script, but in practice this
.. code-block:: python

serve.create_backend("existing_backend", MyClass)

# Initially, all traffic is served by the existing backend.
serve.create_endpoint("incremental_endpoint", backend="existing_backend", route="/incremental")

@@ -160,7 +163,7 @@ In the example below, we do this repeatedly in one script, but in practice this
serve.set_traffic("incremental_endpoint", {"existing_backend": 0.8, "new_backend": 0.2})
serve.set_traffic("incremental_endpoint", {"existing_backend": 0.5, "new_backend": 0.5})
serve.set_traffic("incremental_endpoint", {"new_backend": 1.0})

# At any time, we can roll back to the existing backend.
serve.set_traffic("incremental_endpoint", {"existing_backend": 1.0})

@@ -196,7 +199,7 @@ This is demonstrated in the example below, where we create an endpoint serviced
.. code-block:: python

serve.create_backend("existing_backend", MyClass)

# All traffic is served by the existing backend.
serve.create_endpoint("shadowed_endpoint", backend="existing_backend", route="/shadow")

@@ -219,15 +222,15 @@ This is demonstrated in the example below, where we create an endpoint serviced

Composing Multiple Models
=========================
Ray Serve supports composing individually scalable models into a single model
out of the box. For instance, you can combine multiple models to perform
stacking or ensembles.

To define a higher-level composed model you need to do three things:

1. Define your underlying models (the ones that you will compose together) as
Ray Serve backends
2. Define your composed model, using the handles of the underlying models
(see the example below).
3. Define an endpoint representing this composed model and query it!
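
Since the full example is collapsed in this diff, the following is only a minimal, hypothetical sketch of the pattern, assuming a ``serve.get_handle`` API for obtaining handles to endpoints (all names below are illustrative):

.. code-block:: python

    import ray
    from ray import serve

    # Two trivial stand-in models, purely for illustration.
    class ModelA:
        def __call__(self, flask_request):
            return 1.0

    class ModelB:
        def __call__(self, flask_request):
            return 2.0

    # 1. Register the underlying models as backends behind their own endpoints.
    serve.create_backend("model_a", ModelA)
    serve.create_endpoint("endpoint_a", backend="model_a")
    serve.create_backend("model_b", ModelB)
    serve.create_endpoint("endpoint_b", backend="model_b")

    # 2. The composed model queries the underlying models through their handles.
    class ComposedModel:
        def __init__(self):
            self.handle_a = serve.get_handle("endpoint_a")
            self.handle_b = serve.get_handle("endpoint_b")

        def __call__(self, flask_request):
            data = flask_request.data
            # A simple ensemble: average the two model outputs.
            result_a = ray.get(self.handle_a.remote(data))
            result_b = ray.get(self.handle_b.remote(data))
            return (result_a + result_b) / 2

    # 3. Expose the composed model behind its own endpoint and query it.
    serve.create_backend("composed_model", ComposedModel)
    serve.create_endpoint("composed_endpoint", backend="composed_model", route="/composed")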

26 changes: 13 additions & 13 deletions doc/source/serve/index.rst
@@ -15,22 +15,22 @@ Ray Serve is a scalable model-serving library built on Ray.

For users, Ray Serve is:

- **Framework Agnostic**:Use the same toolkit to serve everything from deep learning models
built with frameworks like :ref:`PyTorch <serve-pytorch-tutorial>` or
:ref:`Tensorflow & Keras <serve-tensorflow-tutorial>` to :ref:`Scikit-Learn <serve-sklearn-tutorial>` models or arbitrary business logic.
- **Python First**: Configure your model serving with pure Python code - no more YAMLs or
- **Framework Agnostic**: Use the same toolkit to serve everything from deep learning models
built with frameworks like :ref:`PyTorch <serve-pytorch-tutorial>`,
:ref:`Tensorflow, and Keras <serve-tensorflow-tutorial>`, to :ref:`Scikit-Learn <serve-sklearn-tutorial>` models, to arbitrary business logic.
- **Python First**: Configure your model serving with pure Python code - no more YAML or
JSON configs.

As a library, Ray Serve enables:

- :ref:`serve-split-traffic` with zero downtime by decoupling routing logic from response handling logic.
- :ref:`serve-batching` built-in to help you meet your performance objectives or use your model for batch and online processing.
- :ref:`serve-split-traffic` with zero downtime, by decoupling routing logic from response handling logic.
- :ref:`serve-batching` is built in to help you meet your performance objectives. You can also use your model for batch and online processing.
- Because Serve is a library, it's easy to integrate it with other tools in your environment, such as CI/CD.

Since Ray is built on Ray, Ray Serve also allows you to **scale to many machines**
and allows you to leverage all of the other Ray frameworks so you can deploy and scale on any cloud.
Since Serve is built on Ray, it also allows you to scale to many machines, in your datacenter or in cloud environments, and it allows you to leverage all of the other Ray frameworks.

.. note::
If you want to try out Serve, join our `community slack <https://forms.gle/9TSdDYUgxYs8SA9e8>`_
and discuss in the #serve channel.


@@ -46,7 +46,7 @@ Ray Serve supports Python versions 3.5 and higher. To install Ray Serve:
Ray Serve in 90 Seconds
=======================

Serve a function by defining a function, an endpoint, and a backend (in this case a stateless function), then
connecting the two by setting traffic from the endpoint to the backend.

.. literalinclude:: ../../../python/ray/serve/examples/doc/quickstart_function.py
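
The included quickstart file isn't rendered in this diff, so the following is only a rough, hypothetical sketch of such a quickstart, using the ``create_backend``/``create_endpoint`` API shown elsewhere in these docs (``serve.init()`` and the exact names and route are assumptions):

.. code-block:: python

    import requests
    from ray import serve

    serve.init()

    # The backend: a stateless function that handles each request.
    def hello(flask_request):
        name = flask_request.args.get("name", "serve")
        return "hello " + name

    # The endpoint exposes the backend over HTTP.
    serve.create_backend("hello_backend", hello)
    serve.create_endpoint("hello_endpoint", backend="hello_backend", route="/hello")

    # Query it.
    print(requests.get("http://127.0.0.1:8000/hello?name=ray").text)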
@@ -79,7 +79,7 @@ When should I use Ray Serve?

Ray Serve is a simple (but flexible) tool for deploying, operating, and monitoring Python-based machine learning models.
Ray Serve excels when scaling out to serve models in production is a necessity. This might be because of large-scale batch processing
requirements or because you're going to serve a number of models behind different endpoints and may need to run A/B tests or control
traffic between different models.

If you plan on running on multiple machines, Ray Serve will serve you well.
10 changes: 5 additions & 5 deletions doc/source/serve/key-concepts.rst
@@ -17,16 +17,16 @@ Backends
========

Backends define the implementation of your business logic or models that will handle requests when queries come in to :ref:`serve-endpoint`.
To define a backend, first you must define the "handler" or the business logic you'd like to respond with.
The handler should take as input a `Flask Request object <https://flask.palletsprojects.com/en/1.1.x/api/?highlight=request#flask.Request>`_ and return any JSON-serializable object as output.
A backend is defined using :mod:`serve.create_backend <ray.serve.create_backend>`, and the implementation can be defined as either a function or a class.
Use a function when your response is stateless and a class when you might need to maintain some state (like a model).
When using a class, you can specify arguments to be passed to the constructor in :mod:`serve.create_backend <ray.serve.create_backend>`, shown below.

A backend consists of a number of *replicas*, which are individual copies of the function or class that are started in separate worker processes.

.. code-block:: python

def handle_request(flask_request):
return "hello world"

@@ -65,7 +65,7 @@ Endpoints
=========

While backends define the implementation of your request handling logic, endpoints allow you to expose them via HTTP.
Endpoints are "logical" and can have one or multiple backends that serve requests to them
Endpoints are "logical" and can have one or multiple backends that serve requests to them.
To create an endpoint, we simply need to specify a name for the endpoint, the name of a backend to handle requests to the endpoint, and the route and methods where it will be accessible.
By default, endpoints are serviced only by the backend provided to :mod:`serve.create_endpoint <ray.serve.create_endpoint>`, but in some cases you may want to specify multiple backends for an endpoint, e.g., for A/B testing or incremental rollout.
For information on how to do this, please see :ref:`serve-split-traffic`.
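
As a minimal sketch (the endpoint name and the ``methods`` argument below are assumptions, and a backend named ``"my_backend"`` is presumed to already exist):

.. code-block:: python

    serve.create_endpoint(
        "simple_endpoint",
        backend="my_backend",
        route="/simple",
        methods=["GET"])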
@@ -78,7 +78,7 @@ After creating the endpoint, it is now exposed by the HTTP server and handles re
We can query the model to verify that it's working.

.. code-block:: python

import requests
print(requests.get("http://127.0.0.1:8000/simple").text)
