add internlm model #528
Conversation
Add support for LLaMA-2 (vllm-project#505)
Thank you for your contribution! Can you add your models to README.md and docs/source/models/supported_models.rst? Specifically, have you made sure that your implementation matches the official implementation? For example, do the greedy sampling results from this PR match the official implementation?
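A minimal sketch of how such a greedy-sampling parity check could look. The model name comes from this PR; the prompt, generation length, and the use of HF transformers as the reference are illustrative assumptions, not something prescribed in the review:

```python
# Hypothetical parity check: compare greedy outputs from vLLM and the official HF implementation.
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "internlm/internlm-chat-7b"  # model added in this PR
prompt = "The capital of France is"     # example prompt

# Greedy decoding with vLLM (temperature=0 picks the argmax token at every step).
llm = LLM(model=model_id, trust_remote_code=True)
vllm_text = llm.generate([prompt], SamplingParams(temperature=0, max_tokens=32))[0].outputs[0].text

# Greedy decoding with the official implementation via transformers.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).cuda()
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
hf_ids = model.generate(**inputs, do_sample=False, max_new_tokens=32)
hf_text = tokenizer.decode(hf_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print("vLLM:", vllm_text)
print("HF:  ", hf_text)
```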
Thank you for your contribution! I tested internlm/internlm-chat-7b and it works pretty well!
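For reference, a minimal example of running this model through vLLM; the prompt and sampling settings below are placeholders, not the exact script used by the reviewer:

```python
from vllm import LLM, SamplingParams

# The internlm checkpoints ship custom code, so trust_remote_code=True is needed.
llm = LLM(model="internlm/internlm-chat-7b", trust_remote_code=True)
outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```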
Hi, why do I still get this error? vllm version: 0.1.3, script:
NOTE: This includes a couple of import-order changes, because I moved the vllm.anyscale packages to the bottom to avoid merge conflicts.
- Allow building via pip install -e .
- Basic integration with an env var ANYSCALE_USE_SCRATCH=1
- Working with Llama 7B
- Basic testing
- Batching working (but Scratch only allows a small number of batched requests for now, and Scratch doesn't have efficient batching yet)
- Works with both the Scratch sampler and the vLLM sampler
- Sessions are cleaned based on an LRU cache. It will be fixed in a couple of weeks.
- Support prompt logprobs and some sampler features (except beam search)
- Async execution like torch kernels
- Llama 3 + Llama 2 work
- Do input config validation
- More thorough testing

Future TODO:
- Preemption not working (future work)
- It currently doesn't use the KV cache allocated from vLLM (not a strict requirement)
- The PR needs cleanup before merging
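A rough sketch of how the flag described in the note above might be exercised. The env var name ANYSCALE_USE_SCRATCH=1 and the Llama 7B model are taken from the note; everything else is ordinary vLLM usage and may not match what the actual branch does:

```python
import os

# Opt into the Scratch backend described above; the env var must be set
# before vLLM is imported so the backend selection can see it (assumption).
os.environ["ANYSCALE_USE_SCRATCH"] = "1"

from vllm import LLM, SamplingParams

# The note reports Llama 7B working; model name here is illustrative.
llm = LLM(model="meta-llama/Llama-2-7b-hf")
outputs = llm.generate(
    ["The quick brown fox"],
    SamplingParams(temperature=0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```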
No description provided.