[Frontend] Dynamic RoPE scaling #4638
Conversation
Could this get merged? Especially with Llama 3 being very tolerant of RoPE scaling, this would be very useful.
Sure, this would be great to get in. Currently we don't have a test for it, though, and it might be too intensive with a real model, so I would like to see at least a unit test implemented. @sasha0552 could you add a test to
@mgoin test added. Can you review? The failures are not related to this PR.
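For readers wondering what such a test might look like, here is a minimal, hypothetical sketch (not necessarily the test merged in this PR) that checks a JSON-valued `--rope-scaling`-style argument parses into a dict, using only the standard library; the flag name and JSON encoding are assumptions taken from the PR description:

```python
import argparse
import json
import unittest


class RopeScalingArgTest(unittest.TestCase):
    """Hypothetical sketch of a unit test for a JSON-valued
    --rope-scaling CLI argument; the actual test in this PR may differ."""

    def _parse(self, argv):
        parser = argparse.ArgumentParser()
        # Assumed flag name and JSON encoding, mirroring the PR description.
        parser.add_argument("--rope-scaling", type=json.loads, default=None)
        return parser.parse_args(argv)

    def test_linear_scaling(self):
        args = self._parse(
            ["--rope-scaling", '{"type": "linear", "factor": 2.0}'])
        self.assertEqual(args.rope_scaling,
                         {"type": "linear", "factor": 2.0})

    def test_default_is_none(self):
        self.assertIsNone(self._parse([]).rope_scaling)


if __name__ == "__main__":
    unittest.main()
```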
Thank you @sasha0552! This is great. We just merged a fix for the failing tests, so please rebase and the tests should pass (#4944).
Head branch was pushed to by a user without write access
@mgoin can you merge? All tests passed.
In #555, @WoosukKwon removed the ability to specify RoPE scaling dynamically. I don't understand why this feature was removed, so this PR brings it back. Specifying the RoPE scaling on the command line is very useful, because otherwise we have to manually modify the `config.json`, which may be managed by Hugging Face, so each model has to be forked just to set a different RoPE scaling.

FIX #4334
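For context, the manual workaround described above looks roughly like this: a sketch (assuming a locally downloaded model snapshot and the Llama-style `rope_scaling` layout used in Hugging Face configs) that patches the setting into `config.json`:

```python
import json
from pathlib import Path

# Hypothetical local snapshot of the model; adjust the path as needed.
config_path = Path("Meta-Llama-3-8B-Instruct/config.json")

config = json.loads(config_path.read_text())
# Llama-style configs accept a rope_scaling dict with "type" and "factor".
config["rope_scaling"] = {"type": "linear", "factor": 2.0}
config_path.write_text(json.dumps(config, indent=2))
```

Having to maintain a patched copy like this for every scaling factor is exactly the forking problem this PR avoids.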
RoPE scaling allows using a longer context without further fine-tuning.
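As a rough intuition (a sketch of linear position interpolation, not vLLM's actual kernel): linear RoPE scaling divides the position index by the factor before computing the rotary angles, so a 16k-token sequence spans the same angle range the model saw during 8k-token training:

```python
def rotary_angle(pos: float, dim_pair: int, head_dim: int = 128,
                 base: float = 10000.0, factor: float = 2.0) -> float:
    """Angle for one rotary dimension pair with linear scaling applied.

    Linear scaling ("type": "linear") simply divides the position by
    `factor`, compressing longer sequences into the trained range.
    """
    inv_freq = base ** (-2.0 * dim_pair / head_dim)
    return (pos / factor) * inv_freq


# With factor 2.0, position 16383 lands on the same angle as the
# unscaled position 8191.5, inside the native 8192-token range.
print(rotary_angle(16383, dim_pair=0))               # 8191.5
print(rotary_angle(8191.5, dim_pair=0, factor=1.0))  # 8191.5
```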
Summarization works with `meta-llama/Meta-Llama-3-8B-Instruct` (which has a native context of 8192 tokens) using `type` = `linear` and `factor` = `2.0`.
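A minimal sketch of how the restored option might be driven from Python, assuming the `rope_scaling` keyword this PR introduces is forwarded through the `LLM` entrypoint (the parameter name and plumbing are assumptions based on the PR description):

```python
from vllm import LLM, SamplingParams

# Assumed parameter from this PR: a linear factor of 2.0 stretches the
# native 8192-token context of Llama 3 toward roughly 16384 tokens.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    rope_scaling={"type": "linear", "factor": 2.0},
    max_model_len=16384,
)

params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate(["Summarize the following document: ..."], params)
print(outputs[0].outputs[0].text)
```

The same dict would otherwise have to live in a forked `config.json`, which is the workflow this change removes.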