Document vLLM integration as a first class citizen in Triton #88
Review comment: Can you also open a PR to the vLLM backend README that links Python-based backends to this doc?

<!--
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# Python-based Backends

A Python-based backend is a special type of Triton backend that does
not require any C++ code. However, this type of backend depends on the
[Python backend](https://github.com/triton-inference-server/python_backend)
and requires the following artifacts to be present:
`libtriton_python.so`, `triton_python_backend_stub`,
and `triton_python_backend_utils.py`.

## Usage

To implement and use a Python-based backend, make sure to follow these steps.
* Implement the
[`TritonPythonModel` interface](https://github.com/triton-inference-server/python_backend#usage),
which can be re-used as a backend by multiple models.
This script should be named `model.py`.
* Create a folder for your custom backend under the backends directory
(e.g. `/opt/tritonserver/backends`) with the corresponding backend name,
containing the `model.py`. For example, for a backend named
`my_python_based_backend`, Triton would expect to find the full path
`/opt/tritonserver/backends/my_python_based_backend/model.py`
(see the example layout below).
* Make sure that `libtriton_python.so`, `triton_python_backend_stub`,
and `triton_python_backend_utils.py` are present either under
`/opt/tritonserver/backends/my_python_based_backend/` or
`/opt/tritonserver/backends/python/`. When both locations contain
these artifacts, the custom backend's artifacts take priority over the
Python backend's artifacts. This way, if a custom backend needs to use a
different Python version than what is shipped by default, it can easily
do so. Please refer to the [Customization](#customization) section for more details.
* Specify `my_python_based_backend` as the backend in `config.pbtxt`
for any model that should use this backend.

Review comment: Would be good to mention how the priority would work in case both locations contain the artifacts.
Reply: Thanks for pointing this out! I'll certainly add this.
Reply: Added.

```
...
backend: "my_python_based_backend"
...
```
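
Putting the steps above together, the backends directory could look roughly like
the sketch below. The backend name `my_python_based_backend` and the choice to keep
the shared artifacts under `backends/python/` are illustrative; your layout may differ.

```
/opt/tritonserver/backends/
├── python/
│   ├── libtriton_python.so
│   ├── triton_python_backend_stub
│   └── triton_python_backend_utils.py
└── my_python_based_backend/
    └── model.py
```
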
Since Triton uses the Python backend under the hood, it is expected
to see a `python` backend entry in the server logs, even when the Python backend
is not explicitly used.

```
I1013 21:52:45.756456 18668 server.cc:619]
+-------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
| Backend                 | Path                                                        | Config                                                                                                              |
+-------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
| python                  | /opt/tritonserver/backends/python/libtriton_python.so      | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability" |
|                         |                                                             | :"6.000000","default-max-batch-size":"4"}}                                                                          |
| my_python_based_backend | /opt/tritonserver/backends/my_python_based_backend/model.py | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability" |
|                         |                                                             | :"6.000000","default-max-batch-size":"4"}}                                                                          |
+-------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
```

## Customization

The Python backend shipped in the NVIDIA GPU Cloud containers uses Python 3.10.
The Python backend is able to use the libraries that exist in the
current Python environment. These libraries can be installed in a virtualenv,
conda environment, or the global system Python, and
will only be used if the Python version matches the Python version
of the Python backend's stub executable (`triton_python_backend_stub`).
For example, if you install a set of libraries in a Python 3.9 environment
and your Python backend stub is compiled with Python 3.10, these libraries
will *NOT* be available. You would need to
[compile](https://github.com/triton-inference-server/python_backend#building-custom-python-backend-stub)
the stub executable with Python 3.9.

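For reference, a rough sketch of building the stub against a specific Python
version, based on the instructions linked above; the repository tags are
placeholders and the exact CMake options may differ for your Triton release:

```
# Run these commands in an environment where `python3` resolves to the
# Python version the stub should use (e.g. a Python 3.9 conda environment).
git clone https://github.com/triton-inference-server/python_backend
cd python_backend
mkdir build && cd build
cmake -DTRITON_ENABLE_GPU=ON \
      -DTRITON_BACKEND_REPO_TAG=<GIT_BRANCH_NAME> \
      -DTRITON_COMMON_REPO_TAG=<GIT_BRANCH_NAME> \
      -DTRITON_CORE_REPO_TAG=<GIT_BRANCH_NAME> ..
make triton-python-backend-stub
```
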
If you want to create a tar file that contains all your Python dependencies,
or you want to use a different Python environment for each Python model,
you need to create a
[Custom Execution Environment](https://github.com/triton-inference-server/python_backend#creating-custom-execution-environments)
in the Python backend.

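As a sketch of what the Custom Execution Environment documentation describes,
the packed environment is typically referenced from a model's `config.pbtxt`
through the `EXECUTION_ENV_PATH` parameter; the path below is a placeholder:

```
parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "/path/to/my_custom_env.tar.gz"}
}
```
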
## Background

In some use cases, it is sufficient to implement the
[`TritonPythonModel` interface](https://github.com/triton-inference-server/python_backend#usage)
only once and re-use it across multiple models. As an example, please refer
to the [vLLM backend](https://github.com/triton-inference-server/vllm_backend),
which provides a common Python script to serve models supported by vLLM.

Triton Inference Server can handle this special case and treats a common
`model.py` script as a Python-based backend. In this scenario, when a model
relies on a custom Python-based backend, Triton loads `libtriton_python.so`
first; this ensures that Triton knows how to send requests to the backend
for execution and that the backend knows how to communicate with Triton. Then,
Triton makes sure to use the common `model.py` from the backend's repository,
and does not look for it in the model repository.

While the only required function is `execute`, it is typically helpful
to enhance your implementation by adding `initialize`, `finalize`,
and any other helper functions. Users are also encouraged to make use of the
[`auto_complete_config`](https://github.com/triton-inference-server/python_backend#auto_complete_config)
function to define standardized input and output properties upfront.
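
As a rough sketch (not taken from any existing backend), a shared `model.py`
that simply echoes its input could look like the following; the tensor names,
data type, and dimensions here are illustrative assumptions:

```
import json

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """A common `model.py` that can be shared by multiple models."""

    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        # Declare standardized inputs/outputs so individual models do not
        # need to repeat them in their config.pbtxt.
        auto_complete_model_config.add_input(
            {"name": "INPUT", "data_type": "TYPE_FP32", "dims": [-1]})
        auto_complete_model_config.add_output(
            {"name": "OUTPUT", "data_type": "TYPE_FP32", "dims": [-1]})
        auto_complete_model_config.set_max_batch_size(0)
        return auto_complete_model_config

    def initialize(self, args):
        # `args["model_config"]` holds the model configuration as a JSON string.
        self.model_config = json.loads(args["model_config"])

    def execute(self, requests):
        responses = []
        for request in requests:
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT")
            # Echo the input back as the output.
            output_tensor = pb_utils.Tensor("OUTPUT", input_tensor.as_numpy())
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[output_tensor]))
        return responses

    def finalize(self):
        # Release any per-model resources here.
        pass
```

With this file placed under `/opt/tritonserver/backends/my_python_based_backend/`,
any model whose `config.pbtxt` sets `backend: "my_python_based_backend"` would
share this implementation.
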
Review comment: Would be good to mention which Python interpreter is going to be used and add a link to the Custom Execution Environments doc in the Python backend so that the user knows how to customize that.
Reply: Mentioned earlier on L52.