
Hugging Face Transformer Deployment Tutorial #49

Merged: 21 commits into main from fpetrini-hf-transformer-tutorials on Oct 24, 2023

Conversation

fpetrini15
Contributor

Tutorials showing how Hugging Face transformers can be quickly deployed in Triton.
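
For context, tutorials like this typically wrap the model in Triton's Python backend. Below is a minimal sketch of what such a model.py might look like; it is illustrative only, and the tensor names ("text_input", "text_output") and the falcon-7b checkpoint are assumptions rather than code from this PR.

```python
# Minimal sketch of a Triton Python backend model.py wrapping a Hugging Face
# text-generation pipeline. Tensor names and the checkpoint are assumptions.
import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import pipeline


class TritonPythonModel:
    def initialize(self, args):
        # Load the Hugging Face model once, when Triton loads this model.
        # device_map="auto" assumes the accelerate package is installed.
        self.generator = pipeline(
            "text-generation", model="tiiuae/falcon-7b", device_map="auto"
        )

    def execute(self, requests):
        responses = []
        for request in requests:
            # BYTES input tensor holding UTF-8 prompts.
            prompts = pb_utils.get_input_tensor_by_name(
                request, "text_input"
            ).as_numpy()
            texts = []
            for prompt in prompts.reshape(-1):
                result = self.generator(
                    prompt.decode("utf-8"), max_new_tokens=64
                )
                texts.append(result[0]["generated_text"])
            out = pb_utils.Tensor(
                "text_output", np.array(texts, dtype=np.object_)
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```

The matching config.pbtxt would declare "text_input" and "text_output" as TYPE_STRING tensors; the tutorial's actual files may be structured differently.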

@fpetrini15
Contributor Author

All generation scripts were removed and replaced with static files. This new tutorial covers deploying falcon-7b, persimmon-8b, and mistral-7b. Down the road, these models may get their own READMEs in a "Popular Models Guide" folder. cc @jbkyang-nvi

fpetrini15 requested review from nnshah1 and rmccorm4 on October 3, 2023 19:36
@fpetrini15
Contributor Author

@nnshah1: I preemptively removed Mistral from the tutorial. I can always revert if necessary.

@fpetrini15
Contributor Author

Incorporated some feedback from Dora: added instructions on how to gather performance metrics and how to load cached models, and added comments.
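
On the cached-models point, one common pattern (an assumption here, not necessarily what the tutorial ended up doing) is to mount a pre-populated Hugging Face cache into the Triton container and load strictly from local files:

```python
# Hypothetical example of loading a Hugging Face model from a local cache only,
# with no network pull at load time. The cache path and model id are
# assumptions, not taken from this PR.
from transformers import AutoModelForCausalLM, AutoTokenizer

# e.g. a host cache mounted into the Triton container with
#   docker run -v ${HOME}/.cache/huggingface:/hf_cache ...
CACHE_DIR = "/hf_cache"
MODEL_ID = "tiiuae/falcon-7b"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_ID, cache_dir=CACHE_DIR, local_files_only=True
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, cache_dir=CACHE_DIR, local_files_only=True
)
```

For the performance-metrics side, the commits below point to Perf Analyzer; a typical invocation looks like `perf_analyzer -m <model_name> --concurrency-range 1:4`, though the exact command used in the tutorial may differ.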

@fpetrini15
Contributor Author

CC @nv-braf @matthewkotila in case there is any feedback regarding the Perf Analyzer/Model Analyzer (PA/MA) section.

@matthewkotila
Contributor

PA stuff LGTM 👍

tanmayv25 merged commit de7da4a into main on Oct 24, 2023
fpetrini15 deleted the fpetrini-hf-transformer-tutorials branch on October 24, 2023 00:50
fdf3d186-88d5 pushed a commit to fdf3d186-88d5/triton-inference-server that referenced this pull request Mar 21, 2025

* Initial Commit

* Mount model repo so changes reflect, parameter tweaking, README file

* Image name error

* Incorporating review comments. Separate docker and model repo builds, add README, restructure repo

* Tutorial restructuring. Using static model configurations

* Bump triton container and update README

* Remove client script

* Incorporating review comments

* Modify WIP line in vLLM tutorial

* Remove trust_remote_code parameter from falcon model

* Removing Mistral

* Incorporating Feedback

* Change input/output names

* Pre-commit format

* Different perf_analyzer example, config file format fixes

* Deep dive changes to Triton tools section

* Remove unused variable