[Serve][Doc] Split core-apis to key concepts and user guide (ray-proj…
simon-mo authored May 20, 2022
1 parent 5bb46ca commit 3513aa2
Showing 23 changed files with 676 additions and 641 deletions.
26 changes: 15 additions & 11 deletions doc/source/_toc.yml
@@ -121,19 +121,23 @@ parts:
title: Ray Serve
sections:
- file: serve/getting_started
- file: serve/core-apis
- file: serve/http-servehandle
- file: serve/ml-models
- file: serve/deployment-graph
- file: serve/key-concepts
- file: serve/user-guide
sections:
- file: serve/deployment-graph/deployment-graph-e2e-tutorial
- file: serve/deployment-graph/deployment-graph-user-guides
- file: serve/managing-deployments
- file: serve/handling-dependencies
- file: serve/http-guide
- file: serve/http-adapters
- file: serve/handle-guide
- file: serve/ml-models
- file: serve/deploying-serve
- file: serve/monitoring
- file: serve/performance
- file: serve/deployment-graph
sections:
- file: serve/deployment-graph/chain_nodes_same_class_different_args
- file: serve/deployment-graph/combine_two_nodes_with_passing_input_parallel
- file: serve/deployment
- file: serve/monitoring
- file: serve/performance
- file: serve/deployment-graph/deployment-graph-e2e-tutorial
- file: serve/deployment-graph/chain_nodes_same_class_different_args
- file: serve/deployment-graph/combine_two_nodes_with_passing_input_parallel
- file: serve/architecture
- file: serve/tutorials/index
- file: serve/faq
@@ -4,7 +4,7 @@

This section should help you:

- understand how Ray Serve runs on a Ray cluster beyond the basics mentioned in {doc}`core-apis`
- understand how Ray Serve runs on a Ray cluster beyond the basics
- deploy and update your Serve application over time
- monitor your Serve application using the Ray Dashboard and logging

27 changes: 12 additions & 15 deletions doc/source/serve/deployment-graph.md
@@ -1,20 +1,17 @@
---
jupytext:
formats: ipynb,md:myst
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.13.6
kernelspec:
display_name: Python 3
language: python
name: python3
---

(serve-deployment-graph)=

# Deployment Graph

To learn more about the deployment graph, see the end-to-end walkthrough and user guides:

- [E2E Tutorials](./deployment-graph/deployment-graph-e2e-tutorial.md)
- [User Guides](./deployment-graph/deployment-graph-user-guides.md)

## Patterns

Jump straight into common design patterns using the deployment graph:

- [Chain nodes with same class and different args](deployment-graph/chain_nodes_same_class_different_args.md)
- [Combine two nodes with passing same input in parallel](deployment-graph/combine_two_nodes_with_passing_input_parallel.md)



@@ -556,7 +556,7 @@ Total of `0.45` secs.

## More examples using the deployment graph API

We provide more examples of using the deployment graph API [here](./deployment-graph-user-guides.md)
We provide more examples of using the deployment graph API [here](../deployment-graph.md)

## Conclusion

24 changes: 0 additions & 24 deletions doc/source/serve/deployment-graph/deployment-graph-user-guides.md

This file was deleted.

27 changes: 27 additions & 0 deletions doc/source/serve/doc_code/create_deployment.py
@@ -93,4 +93,31 @@ def __call__(self, starlette_request) -> str:
url = f"http://127.0.0.1:8000/{d_name}"
print(f"handle name : {d_name}")
print(f"prediction : {requests.get(url, params= {'data': random()}).text}")

# Output:
# {'rep-1': Deployment(name=rep-1,version=None,route_prefix=/rep-1),
# 'rep-2': Deployment(name=rep-2,version=None,route_prefix=/rep-2)}
#
# ServerHandle API responses: ----------
# handle name : rep-1
# prediction : (pid: 62636); path: /model/rep-1.pkl; data: 0.600; prediction: 1.292
# --
# handle name : rep-2
# prediction : (pid: 62635); path: /model/rep-2.pkl; data: 0.075; prediction: 0.075
# --
# handle name : rep-1
# prediction : (pid: 62634); path: /model/rep-1.pkl; data: 0.186; prediction: 0.186
# --
# handle name : rep-2
# prediction : (pid: 62637); path: /model/rep-2.pkl; data: 0.751; prediction: 1.444
# --
# HTTP responses: ----------
# handle name : rep-1
# prediction : (pid: 62636); path: /model/rep-1.pkl; data: 0.582; prediction: 1.481
# handle name : rep-2
# prediction : (pid: 62637); path: /model/rep-2.pkl; data: 0.778; prediction: 1.678
# handle name : rep-1
# prediction : (pid: 62634); path: /model/rep-1.pkl; data: 0.139; prediction: 0.139
# handle name : rep-2
# prediction : (pid: 62635); path: /model/rep-2.pkl; data: 0.569; prediction: 1.262
# __serve_example_end__
27 changes: 27 additions & 0 deletions doc/source/serve/doc_code/key-concepts-deployment-graph.py
@@ -0,0 +1,27 @@
import ray
from ray import serve
from ray.serve.dag import InputNode
from ray.serve.drivers import DAGDriver


@serve.deployment
def preprocess(inp: int):
    return inp + 1


@serve.deployment
class Model:
    def __init__(self, increment: int):
        self.increment = increment

    def predict(self, inp: int):
        return inp + self.increment


with InputNode() as inp:
    model = Model.bind(increment=2)
    output = model.predict.bind(preprocess.bind(inp))
    serve_dag = DAGDriver.bind(output)

handle = serve.run(serve_dag)
assert ray.get(handle.predict.remote(1)) == 4
2 changes: 1 addition & 1 deletion doc/source/serve/faq.md
@@ -10,7 +10,7 @@ questions, feel free to ask them in the [Discussion Board](https://discuss.ray.i

## How do I deploy Ray Serve?

See {doc}`deployment` for information about how to deploy Serve.
See {doc}`deploying-serve` for information about how to deploy Serve.

## How fast is Ray Serve?

5 changes: 3 additions & 2 deletions doc/source/serve/getting_started.md
@@ -380,12 +380,13 @@ $ python fastapi_client.py
```

Congratulations! You just built and deployed a machine learning model on Ray
Serve!
Serve! You should now have enough context to dive into the {doc}`key-concepts` to
get a deeper understanding of Ray Serve.


## Next Steps

- Dive into the {doc}`core-apis` to get a deeper understanding of Ray Serve.
- Dive into the {doc}`key-concepts` to get a deeper understanding of Ray Serve.
- Learn more about how to deploy your Ray Serve application to a multi-node cluster: {ref}`serve-deploy-tutorial`.
- Check more in-depth tutorials for popular machine learning frameworks: {doc}`tutorials/index`.

81 changes: 81 additions & 0 deletions doc/source/serve/handle-guide.md
@@ -0,0 +1,81 @@
(serve-handle-explainer)=

# ServeHandle: Calling Deployments from Python

Ray Serve enables you to query models from both HTTP and Python. This feature
enables seamless [model composition](serve-model-composition). You can
get a `ServeHandle` corresponding to a deployment, similar to how you can
reach a deployment through HTTP via a specific route. When you issue a request
to a deployment through a `ServeHandle`, the request is load balanced across
available replicas in the same way an HTTP request is.

To call a Ray Serve deployment from Python, use {mod}`Deployment.get_handle <ray.serve.api.Deployment>`
to get a handle to the deployment, then use
{mod}`handle.remote <ray.serve.handle.RayServeHandle.remote>` to send requests
to that deployment. These requests can pass ordinary args and kwargs, which are
passed directly to the method. Each call returns a Ray `ObjectRef` whose result
can be waited for or retrieved using `ray.wait` or `ray.get`.

```python
import ray
from ray import serve


@serve.deployment
class Deployment:
    def method1(self, arg):
        return f"Method1: {arg}"

    def __call__(self, arg):
        return f"__call__: {arg}"


Deployment.deploy()

handle = Deployment.get_handle()
ray.get(handle.remote("hi"))  # Defaults to calling the __call__ method.
ray.get(handle.method1.remote("hi"))  # Call a different method.
```

If you want to use the same deployment to serve both HTTP and ServeHandle traffic, the recommended best practice is to define an internal method that the HTTP handling logic will call:

```python
@serve.deployment(route_prefix="/api")
class Deployment:
    def say_hello(self, name: str):
        return f"Hello {name}!"

    def __call__(self, request):
        return self.say_hello(request.query_params["name"])


Deployment.deploy()
```

Now we can invoke the same logic from both HTTP and Python:

```python
import requests

print(requests.get("http://localhost:8000/api?name=Alice").text)
# Hello Alice!

handle = Deployment.get_handle()
print(ray.get(handle.say_hello.remote("Alice")))
# Hello Alice!
```

(serve-sync-async-handles)=

## Sync and Async Handles

Ray Serve offers two types of `ServeHandle`. You can use the `Deployment.get_handle(..., sync=True|False)`
flag to toggle between them.

- When you set `sync=True` (the default), a synchronous handle is returned.
  Calling `handle.remote()` returns a Ray `ObjectRef`.
- When you set `sync=False`, an asyncio-based handle is returned. You need to
  call it with `await handle.remote()` to get a Ray `ObjectRef`. To use `await`,
  you have to run `Deployment.get_handle` and `handle.remote` in a Python asyncio event loop.

The async handle has a performance advantage because it uses asyncio directly, as compared
to the sync handle, which talks to an asyncio event loop in a thread. To learn more about
the reasoning behind these tradeoffs, check out our [architecture documentation](serve-architecture).
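
To make the asyncio variant concrete, here is a minimal sketch, assuming the `Deployment` class with the `say_hello` method from the examples above has already been deployed:

```python
import asyncio


async def main():
    # sync=False returns an asyncio-based handle; both get_handle and the
    # awaited remote call must run inside the event loop.
    handle = Deployment.get_handle(sync=False)
    ref = await handle.say_hello.remote("Alice")  # resolves to an ObjectRef
    print(await ref)  # ObjectRefs are awaitable, avoiding a blocking ray.get
    # Hello Alice!


asyncio.run(main())
```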

## Integrating with existing web servers

Ray Serve comes with its own HTTP server out of the box, but if you have an existing
web application, you can still plug in Ray Serve to scale up your compute using the `ServeHandle`.
For a tutorial with sample code, see {ref}`serve-web-server-integration-tutorial`.
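
As a rough sketch of that integration (the linked tutorial is the authoritative version), an existing FastAPI app could call a running Serve deployment through a handle. The `/hello` route, the `serve` namespace, and the deployment name here are assumptions for illustration:

```python
import ray
from fastapi import FastAPI
from ray import serve

# Connect to the Ray cluster where a detached Serve instance is running.
ray.init(address="auto", namespace="serve")

app = FastAPI()


@app.get("/hello")
def hello(name: str) -> str:
    # Look up the deployment defined earlier in this guide by name.
    handle = serve.get_deployment("Deployment").get_handle()
    return ray.get(handle.say_hello.remote(name))
```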
32 changes: 32 additions & 0 deletions doc/source/serve/handling-dependencies.md
@@ -0,0 +1,32 @@
# Handling Dependencies

Ray Serve supports serving deployments with different (possibly conflicting)
Python dependencies. For example, you can simultaneously serve one deployment
that uses legacy TensorFlow 1 and another that uses TensorFlow 2.

This is supported on Mac OS and Linux using Ray's {ref}`runtime-environments` feature.
As with all other Ray actor options, pass the runtime environment via `ray_actor_options` in
your deployment. Be sure to first run `pip install "ray[default]"` to ensure the
Runtime Environments feature is installed.

Example:

```{literalinclude} ../../../python/ray/serve/examples/doc/conda_env.py
```
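
Since the included example file is not rendered on this page, here is a minimal sketch of the pattern, assuming a deployment that pins its own TensorFlow version (the exact packages are illustrative):

```python
from ray import serve


@serve.deployment(
    ray_actor_options={
        # Each replica of this deployment runs in its own pip environment.
        "runtime_env": {"pip": ["tensorflow==2.9.0"]}
    }
)
class TFModel:
    def __call__(self, request) -> str:
        import tensorflow as tf  # imported inside the replica's environment

        return f"Serving with TensorFlow {tf.__version__}"


TFModel.deploy()
```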

:::{tip}
Avoid dynamically installing packages that install from source: these can be slow and
use up all resources while installing, leading to problems with the Ray cluster. Consider
precompiling such packages in a private repository or Docker image.
:::

The dependencies required in the deployment may differ from
the dependencies installed in the driver program (the one running the Serve API
calls). In this case, you should use a delayed import within the class to avoid
importing unavailable packages in the driver. This applies even when not
using runtime environments.

Example:

```{literalinclude} ../../../python/ray/serve/examples/doc/delayed_import.py
```
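
Again, since the included file is not shown here, a minimal sketch of the delayed-import pattern (the `Recommender` name and the TensorFlow dependency are illustrative):

```python
from ray import serve


@serve.deployment
class Recommender:
    def __init__(self):
        # Delayed import: tensorflow is only imported inside the replica
        # process, so the driver issuing these Serve API calls does not
        # need it installed.
        import tensorflow as tf

        self.tf_version = tf.__version__

    def __call__(self, request) -> str:
        return f"Serving with TensorFlow {self.tf_version}"


Recommender.deploy()
```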