Hugging Face Transformer Deployment Tutorial #49
… add README, restructure repo
All generation scripts were removed and replaced with static files. This new tutorial covers deploying falcon7b, persimmon-8b, and mistral 7b. Down the road, these models may get their own READMEs in a "Popular Models Guide" folder. cc @jbkyang-nvi
@nnshah1, I preemptively removed Mistral from the tutorial. I can always revert if necessary.
Incorporated some feedback from Dora: added how to gather performance metrics and load cached models, and added comments.
CC @nv-braf @matthewkotila in case there is any feedback regarding the PA/MA section.
PA stuff LGTM 👍
* Initial Commit
* Mount model repo so changes reflect, parameter tweaking, README file
* Image name error
* Incorporating review comments. Separate docker and model repo builds, add README, restructure repo
* Tutorial restructuring. Using static model configurations
* Bump triton container and update README
* Remove client script
* Incorporating review comments
* Modify WIP line in vLLM tutorial
* Remove trust_remote_code parameter from falcon model
* Removing Mistral
* Incorporating Feedback
* Change input/output names
* Pre-commit format
* Different perf_analyzer example, config file format fixes
* Deep dive changes to Triton tools section
* Remove unused variable
Tutorials showing how Hugging Face Transformers models can be quickly deployed in Triton.