feat: NVIDIA Triton server Blueprint with vLLM #535

vara-bonthu · 2024-05-17T21:59:50Z

What does this PR do?

Summary

This PR adds a new blueprint for deploying NVIDIA Triton server with vLLM.
The blueprint could not fit into the existing JARK or Trainium blueprints because of the amount of code and complexity involved.
A dedicated pattern for NVIDIA Triton server makes more sense, and it will showcase both vLLM and TensorRT-LLM patterns.

Motivation

NVIDIA Triton with vLLM or TensorRT-LLM is a common pattern that many users and customers run in production because of the high inference performance it offers.

More

Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
Yes, I ran pre-commit run -a with this PR. Link for installing pre-commit locally

For Moderators

E2E test successfully completed before merge?

Additional Notes

ratnopamc · 2024-05-19T12:08:20Z

LGTM!

Signed-off-by: Vara Bonthu <vara.bonthu@gmail.com> Co-authored-by: Ratnopam Charabarti <ratnopamc@yahoo.com>

vara-bonthu and others added 2 commits May 6, 2024 19:22

Nvidia Triton server with vLLM blueprint

bb34920

feat: Triton server vllm blueprint enhancements (#521)

3b41b10

vara-bonthu added 6 commits May 20, 2024 12:09

Added null provider version

d7849d1

Signed-off-by: Vara Bonthu <vara.bonthu@gmail.com>

Merge branch 'main' of https://github.com/awslabs/data-on-eks into triton-server-vllm

22bfbe2

Corrected spelling in the docs

9a7ea7f

Signed-off-by: Vara Bonthu <vara.bonthu@gmail.com>

Fixed doc link

0a53e31

Added architecture diagram

5a1e743

Signed-off-by: Vara Bonthu <vara.bonthu@gmail.com>

Added excalidraw file for tritonserver

2ef7392

Signed-off-by: Vara Bonthu <vara.bonthu@gmail.com>

vara-bonthu requested review from askulkarni2, sanjeevrg89, lusoal and ratnopamc May 21, 2024 13:53

vara-bonthu added 3 commits May 30, 2024 18:29

Added multiple models

22836e2

Updated with multi model deployment approach

db07af2

Signed-off-by: Vara Bonthu <vara.bonthu@gmail.com>

Updates to the Triton server doc

535ed46

ratnopamc approved these changes Jun 2, 2024

renamed the folder for pattern

aa23adf

vara-bonthu merged commit bac9efb into main Jun 2, 2024
36 of 37 checks passed

vara-bonthu deleted the triton-server-vllm branch June 2, 2024 05:00

ovaleanu pushed a commit to ovaleanu/data-on-eks that referenced this pull request Aug 10, 2024

feat: NVIDIA Triton server Blueprint with vLLM (awslabs#535)

43afdd3

Signed-off-by: Vara Bonthu <vara.bonthu@gmail.com> Co-authored-by: Ratnopam Charabarti <ratnopamc@yahoo.com>
