Document vLLM integration as a first class citizen in Triton #88

Merged
merged 13 commits into main from oandreeva_pbbb_docs on Oct 20, 2023

Conversation

oandreeva-nv
Contributor

Adding entries on the vLLM Python-based backend to our docs.

I also thought about adding an example to the examples/backend folder, but compared to the PyTorch Python-based backend, I think our former PyTorch platform handler is more generic and makes more sense to use as an example.

Open to suggestions.

@@ -54,6 +54,14 @@ Management](https://github.com/triton-inference-server/server/blob/main/docs/use
which allows backends to behave in a stateless manner and leave the
state management to Triton.

Triton also provides an option to create python based backends. These
backends should implement a model architecture agnostic
Contributor

@nnshah1 Oct 13, 2023

Suggested change
backends should implement a model architecture agnostic
backends should implement the TritonPythonModel interface as ...

Contributor Author


corrected

examples/README.md: resolved comments (outdated)
oandreeva-nv and others added 2 commits October 13, 2023 12:54
Co-authored-by: Neelay Shah <neelays@nvidia.com>
examples/README.md: resolved comments (outdated)
you may find it helpful to enhance your implementation by adding `initialize`,
`finalize`, and any other helper functions. For examples, please refer to
the [vLLM backend](https://github.com/triton-inference-server/vllm_backend),
which provides a common python script to serve models supported by vLLM.
Contributor

@rmccorm4 Oct 13, 2023


Is this the source of truth for docs on "Python based backends"?

If so, I think we should describe how they work in more detail wherever the source of truth is. Below are some rough ideas I think we should highlight if we don't already document it somewhere:

  • To implement a python-based backend, Triton expects to find a folder in the backends directory (ex: /opt/tritonserver/backends) with the corresponding backend name, containing a model.py that follows the [PythonBackend model format](link_to_pb_docs); a minimal model.py sketch follows this list. For example, for a python-based backend called my_python_based_backend with the default backend directory, Triton would expect to find the full path /opt/tritonserver/backends/my_python_based_backend/model.py, and any model of this backend would define the following config.pbtxt:
# config.pbtxt
backend: "my_python_based_backend"

# any necessary I/O or other config settings
...
  • Python based backends simply load the python backend under the hood and re-use the model.py definition for any models that specify this backend.
  • You can use the python model's auto_complete_config (link here) function to define the I/O upfront so any config.pbtxt using this python-based backend isn't required to specify these settings.
  • ...
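To make the bullets above concrete, a minimal model.py sketch for such a backend might look like the following (the backend name my_python_based_backend and the tensor names INPUT0/OUTPUT0 are hypothetical placeholders, assuming the standard TritonPythonModel interface from the python_backend docs):

import json

import triton_python_backend_utils as pb_utils


class TritonPythonModel:

    def initialize(self, args):
        # args["model_config"] holds the model's config.pbtxt as a JSON string
        self.model_config = json.loads(args["model_config"])

    def execute(self, requests):
        responses = []
        for request in requests:
            # Read the input tensor declared (or auto-completed) in the config
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            # Placeholder logic: echo the input back as the output tensor
            output_tensor = pb_utils.Tensor("OUTPUT0", input_tensor.as_numpy())
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[output_tensor])
            )
        return responses

    def finalize(self):
        # Release any resources acquired in initialize()
        pass

Every model whose config.pbtxt specifies backend: "my_python_based_backend" would then be served by this single script.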

Example tritonserver output loading a model from a python-based backend:

I1013 21:52:45.756456 18668 server.cc:619]
+-------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
| Backend                 | Path                                                        | Config                                                                                                              |
+-------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
| python                  | /opt/tritonserver/backends/python/libtriton_python.so       | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability" |
|                         |                                                             | :"6.000000","default-max-batch-size":"4"}}                                                                          |
| my_python_based_backend | /opt/tritonserver/backends/my_python_based_backend/model.py | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability" |
|                         |                                                             | :"6.000000","default-max-batch-size":"4"}}                                                                          |
+-------------------------+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+

For an example of this, see the <vllm backend docs>.

Contributor


I would put this level of detail in the main backend README, perhaps as a new section, instead of the examples. In the beginning we can add C/C++/Python (with Python hyperlinked to the new section).

The new section "Python based backends" would cover all the details.

Contributor Author

@oandreeva-nv Oct 18, 2023


Yes, I'm working roughly in this direction for the next revision. I put a new .md file under docs/ and describe it there, to keep the main README concise.

Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
examples/README.md: resolved comments (outdated)
to enhance your implementation by adding `initialize`, `finalize`,
and any other helper functions. Users are also encouraged to make use of the
[`auto_complete_config`](https://github.com/triton-inference-server/python_backend#auto_complete_config)
function to define standardized input and output properties upfront.
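As a rough illustration (not part of the PR text), an auto_complete_config sketch for such a shared model.py could look like this; the tensor names and dims are hypothetical, assuming the auto_complete_config interface described in the python_backend README:

class TritonPythonModel:

    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        # Declare I/O upfront so models using this backend can omit them
        # from their config.pbtxt.
        auto_complete_model_config.add_input(
            {"name": "INPUT0", "data_type": "TYPE_FP32", "dims": [-1]}
        )
        auto_complete_model_config.add_output(
            {"name": "OUTPUT0", "data_type": "TYPE_FP32", "dims": [-1]}
        )
        # 0 disables batching; raise it if the backend supports batched requests
        auto_complete_model_config.set_max_batch_size(0)
        return auto_complete_model_config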
Member


Would be good to mention which Python interpreter is going to be used and add a link to the Custom Execution environments doc in Python backend so that the user knows how to customize that.

Contributor Author


Mentioned earlier on L52

`/opt/tritonserver/backends/my_python_based_backend/` or
`/opt/tritonserver/backends/python/`.
* Specify `my_python_based_backend` as a backend in `config.pbtxt`
for any model that should use this backend.
Member


Would be good to mention how the priority would work in case triton_python_backend_stub is present in the Python-based backend folder (i.e., it would take priority over the stub present in the Python backend folder). Additionally, please mention the use case where a user might want to do this (i.e., to use a different Python version than what is shipped by default in the container).
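For illustration, a hypothetical layout with the default backend directory, where a custom stub placed next to the python-based backend's model.py takes priority over the default stub shipped with the Python backend:

/opt/tritonserver/backends/
├── python/
│   ├── libtriton_python.so
│   └── triton_python_backend_stub        # default stub, built for the container's Python
└── my_python_based_backend/
    ├── model.py
    └── triton_python_backend_stub        # optional custom stub, takes priority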

Contributor Author


Thanks for pointing this out! I'll certainly add this

Contributor Author


Added a Customization section later in the file

tanmayv25 previously approved these changes Oct 20, 2023
docs/backend_platform_support_matrix.md: resolved comment
Contributor


Can you also open a PR to the vLLM backend README that links Python-based backends to this doc?

@oandreeva-nv merged commit 4ec906b into main Oct 20, 2023
1 check passed
mc-nv pushed a commit that referenced this pull request Oct 20, 2023

---------

Co-authored-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
@oandreeva-nv deleted the oandreeva_pbbb_docs branch December 19, 2023 19:37