[Serve][Doc] Split core-apis to key concepts and user guide (ray-proj…
simon-mo authored May 20, 2022
1 parent 5bb46ca commit 3513aa2
Showing 23 changed files with 676 additions and 641 deletions.
26 changes: 15 additions & 11 deletions doc/source/_toc.yml
@@ -121,19 +121,23 @@ parts:
title: Ray Serve
sections:
- file: serve/getting_started
- file: serve/core-apis
- file: serve/http-servehandle
- file: serve/ml-models
- file: serve/deployment-graph
- file: serve/key-concepts
- file: serve/user-guide
sections:
- file: serve/deployment-graph/deployment-graph-e2e-tutorial
- file: serve/deployment-graph/deployment-graph-user-guides
- file: serve/managing-deployments
- file: serve/handling-dependencies
- file: serve/http-guide
- file: serve/http-adapters
- file: serve/handle-guide
- file: serve/ml-models
- file: serve/deploying-serve
- file: serve/monitoring
- file: serve/performance
- file: serve/deployment-graph
sections:
- file: serve/deployment-graph/chain_nodes_same_class_different_args
- file: serve/deployment-graph/combine_two_nodes_with_passing_input_parallel
- file: serve/deployment
- file: serve/monitoring
- file: serve/performance
- file: serve/deployment-graph/deployment-graph-e2e-tutorial
- file: serve/deployment-graph/chain_nodes_same_class_different_args
- file: serve/deployment-graph/combine_two_nodes_with_passing_input_parallel
- file: serve/architecture
- file: serve/tutorials/index
- file: serve/faq
@@ -4,7 +4,7 @@

This section should help you:

- understand how Ray Serve runs on a Ray cluster beyond the basics mentioned in {doc}`core-apis`
- understand how Ray Serve runs on a Ray cluster beyond the basics
- deploy and update your Serve application over time
- monitor your Serve application using the Ray Dashboard and logging

27 changes: 12 additions & 15 deletions doc/source/serve/deployment-graph.md
@@ -1,20 +1,17 @@
---
jupytext:
formats: ipynb,md:myst
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.13.6
kernelspec:
display_name: Python 3
language: python
name: python3
---

(serve-deployment-graph)=

# Deployment Graph

To learn more about the deployment graph, see the end-to-end walkthrough and user guides:

- [E2E Tutorials](./deployment-graph/deployment-graph-e2e-tutorial.md)
- [User Guides](./deployment-graph/deployment-graph-user-guides.md)

## Patterns

Jump straight into common design patterns using the deployment graph:

- [Chain nodes with same class and different args](deployment-graph/chain_nodes_same_class_different_args.md)
- [Combine two nodes with passing same input in parallel](deployment-graph/combine_two_nodes_with_passing_input_parallel.md)



@@ -556,7 +556,7 @@ Total of `0.45` secs.

## More examples using the deployment graph API

We provide more examples of using the deployment graph API [here](./deployment-graph-user-guides.md)
We provide more examples of using the deployment graph API [here](../deployment-graph.md)

## Conclusion

24 changes: 0 additions & 24 deletions doc/source/serve/deployment-graph/deployment-graph-user-guides.md

This file was deleted.

27 changes: 27 additions & 0 deletions doc/source/serve/doc_code/create_deployment.py
@@ -93,4 +93,31 @@ def __call__(self, starlette_request) -> str:
url = f"http://127.0.0.1:8000/{d_name}"
print(f"handle name : {d_name}")
print(f"prediction : {requests.get(url, params= {'data': random()}).text}")

# Output:
# {'rep-1': Deployment(name=rep-1,version=None,route_prefix=/rep-1),
# 'rep-2': Deployment(name=rep-2,version=None,route_prefix=/rep-2)}
#
# ServerHandle API responses: ----------
# handle name : rep-1
# prediction : (pid: 62636); path: /model/rep-1.pkl; data: 0.600; prediction: 1.292
# --
# handle name : rep-2
# prediction : (pid: 62635); path: /model/rep-2.pkl; data: 0.075; prediction: 0.075
# --
# handle name : rep-1
# prediction : (pid: 62634); path: /model/rep-1.pkl; data: 0.186; prediction: 0.186
# --
# handle name : rep-2
# prediction : (pid: 62637); path: /model/rep-2.pkl; data: 0.751; prediction: 1.444
# --
# HTTP responses: ----------
# handle name : rep-1
# prediction : (pid: 62636); path: /model/rep-1.pkl; data: 0.582; prediction: 1.481
# handle name : rep-2
# prediction : (pid: 62637); path: /model/rep-2.pkl; data: 0.778; prediction: 1.678
# handle name : rep-1
# prediction : (pid: 62634); path: /model/rep-1.pkl; data: 0.139; prediction: 0.139
# handle name : rep-2
# prediction : (pid: 62635); path: /model/rep-2.pkl; data: 0.569; prediction: 1.262
# __serve_example_end__
27 changes: 27 additions & 0 deletions doc/source/serve/doc_code/key-concepts-deployment-graph.py
@@ -0,0 +1,27 @@
import ray
from ray import serve
from ray.serve.dag import InputNode
from ray.serve.drivers import DAGDriver


@serve.deployment
def preprocess(inp: int):
    return inp + 1


@serve.deployment
class Model:
    def __init__(self, increment: int):
        self.increment = increment

    def predict(self, inp: int):
        return inp + self.increment


with InputNode() as inp:
    model = Model.bind(increment=2)
    output = model.predict.bind(preprocess.bind(inp))
    serve_dag = DAGDriver.bind(output)

handle = serve.run(serve_dag)
assert ray.get(handle.predict.remote(1)) == 4
2 changes: 1 addition & 1 deletion doc/source/serve/faq.md
@@ -10,7 +10,7 @@ questions, feel free to ask them in the [Discussion Board](https://discuss.ray.i

## How do I deploy Ray Serve?

See {doc}`deployment` for information about how to deploy Serve.
See {doc}`deploying-serve` for information about how to deploy Serve.

## How fast is Ray Serve?

5 changes: 3 additions & 2 deletions doc/source/serve/getting_started.md
@@ -380,12 +380,13 @@ $ python fastapi_client.py
```

Congratulations! You just built and deployed a machine learning model on Ray
Serve!
Serve! You should now have enough context to dive into the {doc}`key-concepts` to
get a deeper understanding of Ray Serve.


## Next Steps

- Dive into the {doc}`core-apis` to get a deeper understanding of Ray Serve.
- Dive into the {doc}`key-concepts` to get a deeper understanding of Ray Serve.
- Learn more about how to deploy your Ray Serve application to a multi-node cluster: {ref}`serve-deploy-tutorial`.
- Check more in-depth tutorials for popular machine learning frameworks: {doc}`tutorials/index`.

81 changes: 81 additions & 0 deletions doc/source/serve/handle-guide.md
@@ -0,0 +1,81 @@
(serve-handle-explainer)=

# ServeHandle: Calling Deployments from Python

Ray Serve enables you to query models from both HTTP and Python. This feature
enables seamless [model composition](serve-model-composition). You can
get a `ServeHandle` corresponding to a deployment, similar to how you can
reach a deployment through HTTP via a specific route. When you issue a request
to a deployment through a `ServeHandle`, the request is load balanced across
available replicas in the same way an HTTP request is.

To call a Ray Serve deployment from Python, use {mod}`Deployment.get_handle <ray.serve.api.Deployment>`
to get a handle to the deployment, then use
{mod}`handle.remote <ray.serve.handle.RayServeHandle.remote>` to send requests
to that deployment. These requests can pass ordinary args and kwargs, which are
passed directly to the method. Each call returns a Ray `ObjectRef` whose result
can be waited for or retrieved using `ray.wait` or `ray.get`.

```python
import ray
from ray import serve


@serve.deployment
class Deployment:
    def method1(self, arg):
        return f"Method1: {arg}"

    def __call__(self, arg):
        return f"__call__: {arg}"


Deployment.deploy()

handle = Deployment.get_handle()
ray.get(handle.remote("hi"))  # Defaults to calling the __call__ method.
ray.get(handle.method1.remote("hi"))  # Call a different method.
```

If you want to use the same deployment to serve both HTTP and ServeHandle traffic, the recommended best practice is to define an internal method that the HTTP handling logic will call:

```python
@serve.deployment(route_prefix="/api")
class Deployment:
    def say_hello(self, name: str):
        return f"Hello {name}!"

    def __call__(self, request):
        return self.say_hello(request.query_params["name"])


Deployment.deploy()
```

Now we can invoke the same logic from both HTTP and Python:

```python
import requests

print(requests.get("http://localhost:8000/api?name=Alice").text)
# Hello Alice!

handle = Deployment.get_handle()
print(ray.get(handle.say_hello.remote("Alice")))
# Hello Alice!
```

(serve-sync-async-handles)=

## Sync and Async Handles

Ray Serve offers two types of `ServeHandle`. You can use the `Deployment.get_handle(..., sync=True|False)`
flag to toggle between them.

- When you set `sync=True` (the default), a synchronous handle is returned.
  Calling `handle.remote()` returns a Ray `ObjectRef`.
- When you set `sync=False`, an asyncio-based handle is returned. You need to
  call it with `await handle.remote()` to get a Ray `ObjectRef`. To use `await`,
  you have to run `Deployment.get_handle` and `handle.remote` in a Python asyncio event loop.

The async handle has a performance advantage because it uses asyncio directly, as compared
to the sync handle, which talks to an asyncio event loop in a thread. To learn more about
the reasoning behind these tradeoffs, check out our [architecture documentation](serve-architecture).
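
To make the asyncio variant concrete, here is a minimal sketch, assuming the `Deployment` class with the `say_hello` method from the examples above has already been deployed:

```python
import asyncio


async def main():
    # sync=False returns an asyncio-based handle; both get_handle and the
    # awaited remote call must run inside the event loop.
    handle = Deployment.get_handle(sync=False)
    ref = await handle.say_hello.remote("Alice")  # resolves to an ObjectRef
    print(await ref)  # ObjectRefs are awaitable, avoiding a blocking ray.get
    # Hello Alice!


asyncio.run(main())
```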

## Integrating with existing web servers

Ray Serve comes with its own HTTP server out of the box, but if you have an existing
web application, you can still plug in Ray Serve to scale up your compute using the `ServeHandle`.
For a tutorial with sample code, see {ref}`serve-web-server-integration-tutorial`.
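
As a rough sketch of that integration (the linked tutorial is the authoritative version), an existing FastAPI app could call a running Serve deployment through a handle. The `/hello` route, the `serve` namespace, and the deployment name here are assumptions for illustration:

```python
import ray
from fastapi import FastAPI
from ray import serve

# Connect to the Ray cluster where a detached Serve instance is running.
ray.init(address="auto", namespace="serve")

app = FastAPI()


@app.get("/hello")
def hello(name: str) -> str:
    # Look up the deployment defined earlier in this guide by name.
    handle = serve.get_deployment("Deployment").get_handle()
    return ray.get(handle.say_hello.remote(name))
```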
32 changes: 32 additions & 0 deletions doc/source/serve/handling-dependencies.md
@@ -0,0 +1,32 @@
# Handling Dependencies

Ray Serve supports serving deployments with different (possibly conflicting)
Python dependencies. For example, you can simultaneously serve one deployment
that uses legacy TensorFlow 1 and another that uses TensorFlow 2.

This is supported on Mac OS and Linux using Ray's {ref}`runtime-environments` feature.
As with all other Ray actor options, pass the runtime environment via `ray_actor_options` in
your deployment. Be sure to first run `pip install "ray[default]"` to ensure the
Runtime Environments feature is installed.

Example:

```{literalinclude} ../../../python/ray/serve/examples/doc/conda_env.py
```
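
Since the included example file is not rendered on this page, here is a minimal sketch of the pattern, assuming a deployment that pins its own TensorFlow version (the exact packages are illustrative):

```python
from ray import serve


@serve.deployment(
    ray_actor_options={
        # Each replica of this deployment runs in its own pip environment.
        "runtime_env": {"pip": ["tensorflow==2.9.0"]}
    }
)
class TFModel:
    def __call__(self, request) -> str:
        import tensorflow as tf  # imported inside the replica's environment

        return f"Serving with TensorFlow {tf.__version__}"


TFModel.deploy()
```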

:::{tip}
Avoid dynamically installing packages that install from source: these can be slow and
use up all resources while installing, leading to problems with the Ray cluster. Consider
precompiling such packages in a private repository or Docker image.
:::

The dependencies required in the deployment may differ from
the dependencies installed in the driver program (the one running the Serve API
calls). In this case, you should use a delayed import within the class to avoid
importing unavailable packages in the driver. This applies even when not
using runtime environments.

Example:

```{literalinclude} ../../../python/ray/serve/examples/doc/delayed_import.py
```
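
Again, since the included file is not shown here, a minimal sketch of the delayed-import pattern (the `Recommender` name and the TensorFlow dependency are illustrative):

```python
from ray import serve


@serve.deployment
class Recommender:
    def __init__(self):
        # Delayed import: tensorflow is only imported inside the replica
        # process, so the driver issuing these Serve API calls does not
        # need it installed.
        import tensorflow as tf

        self.tf_version = tf.__version__

    def __call__(self, request) -> str:
        return f"Serving with TensorFlow {self.tf_version}"


Recommender.deploy()
```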