feat: Llama2 inf2 Ray inference upgrade and bug fix #540

vara-bonthu · 2024-05-27T07:26:14Z

What does this PR do?

🛑 Please open an issue first to discuss any significant work and flesh out details/direction - we would hate for your time to be wasted.
Consult the CONTRIBUTING guide for submitting pull-requests.

This PR includes the following updates and fixes for the Llama2 inference setup using Ray on Inf2 instances:

Upgraded Ray Inference Setup:

Added LIB path to the Docker image path for Ray.
Updated the Ray Serve YAML configuration.
Fixed the environment variable setup for neuron cores in the serving script.
Revised the documentation with accurate deployment steps.
Added a Gradio web UI app for the Llama2 deployment. The app is generic and can be used with any model because the model client is created as a ConfigMap. This approach eliminates the need to create a dedicated image for each model's inference and avoids local Docker image builds.
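To illustrate the ConfigMap-based client described above, here is a minimal sketch of what such a generic client script might look like. It is not the code from this PR. The environment variable names (`MODEL_ENDPOINT`, `MODEL_INFER_PATH`), the `/infer` path, and the `sentence` query parameter are all assumptions chosen for illustration. The idea is that the script can be mounted from a ConfigMap into a stock Python image, and the model endpoint is supplied through the environment, so no model-specific image is needed.

```python
import os
import urllib.parse
import urllib.request

# Hypothetical settings: these names and defaults are illustrative only.
# In a pod they could be injected as environment variables.
SERVICE_URL = os.environ.get("MODEL_ENDPOINT", "http://localhost:8000")
INFER_PATH = os.environ.get("MODEL_INFER_PATH", "/infer")


def build_request_url(prompt, base_url=SERVICE_URL, path=INFER_PATH):
    """Build the GET URL for one prompt (the `sentence` parameter is an assumption)."""
    query = urllib.parse.urlencode({"sentence": prompt})
    return f"{base_url.rstrip('/')}{path}?{query}"


def ask_model(prompt, history=None):
    """Send the prompt to the model service and return the raw response text."""
    try:
        with urllib.request.urlopen(build_request_url(prompt), timeout=180) as resp:
            return resp.read().decode("utf-8")
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return f"Request failed: {exc}"


def build_ui():
    """Create the chat UI; gradio is imported lazily so the helpers work without it."""
    import gradio as gr

    return gr.ChatInterface(fn=ask_model, title="Model chat")
```

With `gradio` installed, the UI could be started with `build_ui().launch(server_name="0.0.0.0")`. Pointing the same script at a different model would then only require changing the environment values, not rebuilding an image.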

Motivation

More

Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
Yes, I ran pre-commit run -a with this PR. Link for installing pre-commit locally

For Moderators

E2E Test successfully complete before merge?

Additional Notes

askulkarni2

LGTM! Can you please add .py and .yaml files to the ignore list for spell check in the .pre-commit.yaml?

vara-bonthu · 2024-05-28T20:12:57Z

> LGTM! Can you please add .py and .yaml files to the ignore list for spell check in the .pre-commit.yaml?

I am thinking about a way to exclude only the code and still validate the comments in .py and .yaml files.
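One possible way to approach this, sketched here for .py files only, is to pull out just the comment tokens and hand those to the spell checker. This is an illustration of the idea, not how the repository's pre-commit hook works. The function name `extract_comments` is hypothetical, and YAML files would need separate handling.

```python
import io
import tokenize


def extract_comments(source):
    """Return (line_number, comment_text) pairs for every comment in Python source."""
    comments = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # COMMENT tokens exclude '#' characters that appear inside string literals.
        if tok.type == tokenize.COMMENT:
            comments.append((tok.start[0], tok.string.lstrip("#").strip()))
    return comments
```

The extracted text could then be written to a temporary file and passed to whichever spell checker the hook already uses, so identifiers in the code itself are never checked.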

vara-bonthu added 2 commits May 27, 2024 00:22

Llama2 inf2 inference upgrade and bug fix

316b4b2

Updated Gradio Ui deployment for the model

d23e53c

vara-bonthu requested review from ratnopamc and askulkarni2 May 27, 2024 16:36

askulkarni2 approved these changes May 28, 2024

vara-bonthu merged commit 61b688f into main May 28, 2024
37 checks passed

vara-bonthu deleted the llama-inf2-fix branch May 28, 2024 20:13

ovaleanu pushed a commit to ovaleanu/data-on-eks that referenced this pull request Aug 10, 2024

feat: Llama2 inf2 Ray inference upgrade and bug fix (awslabs#540)

6a8e1ff
