Merged

32 commits
260b7ed
Added small section on installation when using Anaconda. Also fixed a…
Dec 11, 2019
03cf0d5
merge from upstream
Dec 11, 2019
ce5b07e
Upstream merge
Dec 13, 2019
b091027
Merge remote-tracking branch 'upstream/master'
Dec 17, 2019
66f22d3
Merge remote-tracking branch 'upstream/master'
Dec 18, 2019
3a19f54
Delete more temporary directories when running the doc "make clean".
Dec 18, 2019
580ca6e
Merge remote-tracking branch 'upstream/master'
Dec 27, 2019
1112a7b
Merge remote-tracking branch 'upstream/master'
Jan 23, 2020
bb7a25c
Merge remote-tracking branch 'upstream/master'
Jan 24, 2020
62af423
Merge remote-tracking branch 'upstream/master'
Jan 30, 2020
98ac29a
Merge remote-tracking branch 'upstream/master'
Feb 2, 2020
bbc78c7
Merge remote-tracking branch 'upstream/master'
Feb 3, 2020
95f86fc
Merge remote-tracking branch 'upstream/master'
Feb 10, 2020
7852fb5
Merge remote-tracking branch 'upstream/master'
Mar 3, 2020
c4eefb4
Merge remote-tracking branch 'upstream/master'
Mar 26, 2020
bbd96c1
Merge remote-tracking branch 'upstream/master'
Apr 10, 2020
cdd567c
Merge remote-tracking branch 'upstream/master'
May 4, 2020
bb2e570
Merge remote-tracking branch 'upstream/master'
May 8, 2020
0e1156a
Merge remote-tracking branch 'upstream/master'
May 10, 2020
e5e7eaf
Merge remote-tracking branch 'upstream/master'
May 18, 2020
ce5e2a9
Merge remote-tracking branch 'upstream/master'
May 19, 2020
e5fecfb
Merge remote-tracking branch 'upstream/master'
May 22, 2020
1c54d4c
Merge remote-tracking branch 'upstream/master'
Jun 2, 2020
7528f19
Merge remote-tracking branch 'upstream/master'
Jun 10, 2020
8d7d086
Fixed typo in LinearDiscreteEnv docs
Jun 10, 2020
080d143
merge
Jun 22, 2020
739c4e9
Merge remote-tracking branch 'upstream/master'
Jun 26, 2020
31a3540
Removed the "detached=True" from the description of detachted actors,…
Jun 26, 2020
62a1067
Merge remote-tracking branch 'upstream/master'
Jul 11, 2020
c2129eb
Merge remote-tracking branch 'upstream/master'
Jul 18, 2020
ee164fd
Merge remote-tracking branch 'upstream/master'
Jul 20, 2020
e418c29
Mostly minor refinements to the Serve docs
Jul 20, 2020
43 changes: 23 additions & 20 deletions doc/source/serve/advanced.rst
@@ -2,13 +2,13 @@
Advanced Topics, Configurations, & FAQ
======================================

Ray Serve has a number of knobs and tools for you to tune for your particular workload.
All Ray Serve advanced options and topics are covered on this page aside from the
fundamentals of :doc:`deployment`. For a more hands-on take, please check out the :ref:`serve-tutorials`.

There are a number of things you'll likely want to do with your serving application, including
scaling out, splitting traffic, or batching input for better performance. To do all of this,
you will create a ``BackendConfig``, a configuration object that you'll use to set
the properties of a particular backend.

.. contents::
@@ -107,7 +107,7 @@ When calling :mod:`set_traffic <ray.serve.set_traffic>`, you provide a dictionar
For example, here we split traffic 50/50 between two backends:

.. code-block:: python

serve.create_backend("backend1", MyClass1)
serve.create_backend("backend2", MyClass2)

@@ -117,28 +117,31 @@ For example, here we split traffic 50/50 between two backends:
Each request is routed randomly between the backends in the traffic dictionary according to the provided weights.
Please see :ref:`session-affinity` for details on how to ensure that clients or users are consistently mapped to the same backend.

A/B Testing
-----------
Canary Deployments
------------------

:mod:`set_traffic <ray.serve.set_traffic>` can be used to implement A/B testing by having one backend serve the majority of traffic while a fraction is routed to a second model:
:mod:`set_traffic <ray.serve.set_traffic>` can be used to implement canary deployments, where one backend serves the majority of traffic, while a small fraction is routed to a second backend. This is especially useful for "canary testing" a new model on a small percentage of users, while the tried and true old model serves the majority. Once you are satisfied with the new model, you can reroute all traffic to it and remove the old model:

.. code-block:: python

serve.create_backend("default_backend", MyClass)

# Initially, set all traffic to be served by the "default" backend.
serve.create_endpoint("ab_endpoint", backend="default_backend", route="/a-b-test")
serve.create_endpoint("canary_endpoint", backend="default_backend", route="/canary-test")

# Add a second backend and route 1% of the traffic to it.
serve.create_backend("new_backend", MyNewClass)
serve.set_traffic("ab_endpoint", {"default_backend": 0.99, "new_backend": 0.01})
serve.set_traffic("canary_endpoint", {"default_backend": 0.99, "new_backend": 0.01})

# Add a third backend that serves another 1% of the traffic.
serve.create_backend("new_backend2", MyNewClass2)
serve.set_traffic("ab_endpoint", {"default_backend": 0.98, "new_backend": 0.01, "new_backend2": 0.01})
serve.set_traffic("canary_endpoint", {"default_backend": 0.98, "new_backend": 0.01, "new_backend2": 0.01})

# Route all traffic to the new, better backend.
serve.set_traffic("canary_endpoint", {"new_backend": 1.0})

# Revert to the "default" backend serving all traffic.
serve.set_traffic("ab_endpoint", {"default_backend": 1.0})
# Or, if not so successful, revert to the "default" backend for all traffic.
serve.set_traffic("canary_endpoint", {"default_backend": 1.0})

Incremental Rollout
-------------------
@@ -150,7 +153,7 @@ In the example below, we do this repeatedly in one script, but in practice this
.. code-block:: python

serve.create_backend("existing_backend", MyClass)

# Initially, all traffic is served by the existing backend.
serve.create_endpoint("incremental_endpoint", backend="existing_backend", route="/incremental")

@@ -160,7 +163,7 @@ In the example below, we do this repeatedly in one script, but in practice this
serve.set_traffic("incremental_endpoint", {"existing_backend": 0.8, "new_backend": 0.2})
serve.set_traffic("incremental_endpoint", {"existing_backend": 0.5, "new_backend": 0.5})
serve.set_traffic("incremental_endpoint", {"new_backend": 1.0})

# At any time, we can roll back to the existing backend.
serve.set_traffic("incremental_endpoint", {"existing_backend": 1.0})

@@ -196,7 +199,7 @@ This is demonstrated in the example below, where we create an endpoint serviced
.. code-block:: python

serve.create_backend("existing_backend", MyClass)

# All traffic is served by the existing backend.
serve.create_endpoint("shadowed_endpoint", backend="existing_backend", route="/shadow")

@@ -219,15 +222,15 @@ This is demonstrated in the example below, where we create an endpoint serviced

Composing Multiple Models
=========================
Ray Serve supports composing individually scalable models into a single model
out of the box. For instance, you can combine multiple models to perform
stacking or ensembles.

To define a higher-level composed model you need to do three things:

1. Define your underlying models (the ones that you will compose together) as
Ray Serve backends
2. Define your composed model, using the handles of the underlying models
(see the example below).
3. Define an endpoint representing this composed model and query it!
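
Since the full example is collapsed in this diff, the following is only a minimal, hypothetical sketch of the pattern, assuming a ``serve.get_handle`` API for obtaining handles to endpoints (all names below are illustrative):

.. code-block:: python

    import ray
    from ray import serve

    # Two trivial stand-in models, purely for illustration.
    class ModelA:
        def __call__(self, flask_request):
            return 1.0

    class ModelB:
        def __call__(self, flask_request):
            return 2.0

    # 1. Register the underlying models as backends behind their own endpoints.
    serve.create_backend("model_a", ModelA)
    serve.create_endpoint("endpoint_a", backend="model_a")
    serve.create_backend("model_b", ModelB)
    serve.create_endpoint("endpoint_b", backend="model_b")

    # 2. The composed model queries the underlying models through their handles.
    class ComposedModel:
        def __init__(self):
            self.handle_a = serve.get_handle("endpoint_a")
            self.handle_b = serve.get_handle("endpoint_b")

        def __call__(self, flask_request):
            data = flask_request.data
            # A simple ensemble: average the two model outputs.
            result_a = ray.get(self.handle_a.remote(data))
            result_b = ray.get(self.handle_b.remote(data))
            return (result_a + result_b) / 2

    # 3. Expose the composed model behind its own endpoint and query it.
    serve.create_backend("composed_model", ComposedModel)
    serve.create_endpoint("composed_endpoint", backend="composed_model", route="/composed")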

26 changes: 13 additions & 13 deletions doc/source/serve/index.rst
@@ -15,22 +15,22 @@ Ray Serve is a scalable model-serving library built on Ray.

For users, Ray Serve is:

- **Framework Agnostic**:Use the same toolkit to serve everything from deep learning models
built with frameworks like :ref:`PyTorch <serve-pytorch-tutorial>` or
:ref:`Tensorflow & Keras <serve-tensorflow-tutorial>` to :ref:`Scikit-Learn <serve-sklearn-tutorial>` models or arbitrary business logic.
- **Python First**: Configure your model serving with pure Python code - no more YAMLs or
- **Framework Agnostic**: Use the same toolkit to serve everything from deep learning models
built with frameworks like :ref:`PyTorch <serve-pytorch-tutorial>`,
:ref:`Tensorflow, and Keras <serve-tensorflow-tutorial>`, to :ref:`Scikit-Learn <serve-sklearn-tutorial>` models, to arbitrary business logic.
- **Python First**: Configure your model serving with pure Python code - no more YAML or
JSON configs.

As a library, Ray Serve enables:

- :ref:`serve-split-traffic` with zero downtime by decoupling routing logic from response handling logic.
- :ref:`serve-batching` built-in to help you meet your performance objectives or use your model for batch and online processing.
- :ref:`serve-split-traffic` with zero downtime, by decoupling routing logic from response handling logic.
- :ref:`serve-batching` is built in to help you meet your performance objectives. You can also use your model for batch and online processing.
- Because Serve is a library, it's easy to integrate it with other tools in your environment, such as CI/CD.

Since Ray is built on Ray, Ray Serve also allows you to **scale to many machines**
and allows you to leverage all of the other Ray frameworks so you can deploy and scale on any cloud.
Since Serve is built on Ray, it also allows you to scale to many machines, in your datacenter or in cloud environments, and it allows you to leverage all of the other Ray frameworks.

.. note::
If you want to try out Serve, join our `community slack <https://forms.gle/9TSdDYUgxYs8SA9e8>`_
and discuss in the #serve channel.


@@ -46,7 +46,7 @@ Ray Serve supports Python versions 3.5 and higher. To install Ray Serve:
Ray Serve in 90 Seconds
=======================

Serve a function by defining a function, an endpoint, and a backend (in this case a stateless function), then
connecting the two by setting traffic from the endpoint to the backend.

.. literalinclude:: ../../../python/ray/serve/examples/doc/quickstart_function.py
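
The included quickstart file isn't rendered in this diff, so the following is only a rough, hypothetical sketch of such a quickstart, using the ``create_backend``/``create_endpoint`` API shown elsewhere in these docs (``serve.init()`` and the exact names and route are assumptions):

.. code-block:: python

    import requests
    from ray import serve

    serve.init()

    # The backend: a stateless function that handles each request.
    def hello(flask_request):
        name = flask_request.args.get("name", "serve")
        return "hello " + name

    # The endpoint exposes the backend over HTTP.
    serve.create_backend("hello_backend", hello)
    serve.create_endpoint("hello_endpoint", backend="hello_backend", route="/hello")

    # Query it.
    print(requests.get("http://127.0.0.1:8000/hello?name=ray").text)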
@@ -79,7 +79,7 @@ When should I use Ray Serve?

Ray Serve is a simple (but flexible) tool for deploying, operating, and monitoring Python-based machine learning models.
Ray Serve excels when scaling out to serve models in production is a necessity. This might be because of large-scale batch processing
requirements or because you're going to serve a number of models behind different endpoints and may need to run A/B tests or control
traffic between different models.

If you plan on running on multiple machines, Ray Serve will serve you well.
10 changes: 5 additions & 5 deletions doc/source/serve/key-concepts.rst
@@ -17,16 +17,16 @@ Backends
========

Backends define the implementation of your business logic or models that will handle requests when queries come in to :ref:`serve-endpoint`.
To define a backend, first you must define the "handler" or the business logic you'd like to respond with.
The handler should take as input a `Flask Request object <https://flask.palletsprojects.com/en/1.1.x/api/?highlight=request#flask.Request>`_ and return any JSON-serializable object as output.
A backend is defined using :mod:`serve.create_backend <ray.serve.create_backend>`, and the implementation can be defined as either a function or a class.
Use a function when your response is stateless and a class when you might need to maintain some state (like a model).
When using a class, you can specify arguments to be passed to the constructor in :mod:`serve.create_backend <ray.serve.create_backend>`, shown below.

A backend consists of a number of *replicas*, which are individual copies of the function or class that are started in separate worker processes.

.. code-block:: python

def handle_request(flask_request):
return "hello world"

@@ -65,7 +65,7 @@ Endpoints
=========

While backends define the implementation of your request handling logic, endpoints allow you to expose them via HTTP.
Endpoints are "logical" and can have one or multiple backends that serve requests to them
Endpoints are "logical" and can have one or multiple backends that serve requests to them.
To create an endpoint, we simply need to specify a name for the endpoint, the name of a backend to handle requests to the endpoint, and the route and methods where it will be accessible.
By default, endpoints are serviced only by the backend provided to :mod:`serve.create_endpoint <ray.serve.create_endpoint>`, but in some cases you may want to specify multiple backends for an endpoint, e.g., for A/B testing or incremental rollout.
For information on how to do this, please see :ref:`serve-split-traffic`.
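
As a minimal sketch (the endpoint name and the ``methods`` argument below are assumptions, and a backend named ``"my_backend"`` is presumed to already exist):

.. code-block:: python

    serve.create_endpoint(
        "simple_endpoint",
        backend="my_backend",
        route="/simple",
        methods=["GET"])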
@@ -78,7 +78,7 @@ After creating the endpoint, it is now exposed by the HTTP server and handles re
We can query the model to verify that it's working.

.. code-block:: python

import requests
print(requests.get("http://127.0.0.1:8000/simple").text)
