v0.4.2 Release Tracker #4505
Goals:
Could we consider #4132? It has proven incredibly useful in my development process.
#4451 -> this has to be included in the next release (otherwise chunked prefill will crash when preemption is used)
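For context, chunked prefill is an opt-in engine feature; here is a minimal sketch of turning it on through the offline API, assuming the `enable_chunked_prefill` engine argument (the model name is a placeholder). Preemption is the scheduler path exercised when memory pressure forces a running sequence to be evicted, which is the interaction #4451 addresses.

```python
# Minimal sketch: enabling chunked prefill in vLLM's offline API.
# The model below is a placeholder; enable_chunked_prefill is the
# engine argument that opts into the feature. Under memory pressure
# the scheduler may preempt running sequences, which is the code
# path that crashed with chunked prefill before #4451.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-125m",    # placeholder model
    enable_chunked_prefill=True,  # opt in to chunked prefill
)
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```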
@robertgshaw2-neuralmagic for block manager V2 we still need to do profiling before we swap over. I made an issue for tracking: #4537
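For anyone who wants to profile it, block manager V2 can already be opted into explicitly rather than waiting for the default to flip. A hedged sketch, assuming the `use_v2_block_manager` engine argument available in vLLM at the time of this release (model name is a placeholder):

```python
# Sketch: opting into block manager V2 for profiling/comparison
# against V1, via the use_v2_block_manager engine argument.
from vllm import LLM

llm = LLM(
    model="facebook/opt-125m",  # placeholder model
    use_v2_block_manager=True,  # swap the V1 block manager for V2
)
```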
@cadedaniel do you need something from the NM side on this?
Sure, if there's interest :) I mention it because …
Is it possible to include #4305?
Released https://github.com/vllm-project/vllm/releases/tag/v0.4.2. Notably:
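As a quick sanity check after upgrading, something like the following confirms the installed release (a minimal sketch; the pinned `pip` install in the comment is the usual way to get a specific version):

```python
# Verify the installed release after `pip install -U vllm==0.4.2`.
import vllm

print(vllm.__version__)  # expected: "0.4.2"
```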
I used SqueezeLLM 4-bit to quantize my model, but there seems to be a bug: after I changed the code in llama.py, I still get a bug.
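For anyone trying to reproduce this, a SqueezeLLM-quantized checkpoint is loaded by passing the quantization method explicitly rather than by editing model code. A minimal sketch, with the checkpoint path as a placeholder:

```python
# Sketch: loading a SqueezeLLM 4-bit checkpoint in vLLM.
# The path is a placeholder for a locally quantized model;
# quantization="squeezellm" selects the SqueezeLLM kernels.
from vllm import LLM

llm = LLM(
    model="/path/to/squeezellm-checkpoint",  # placeholder path
    quantization="squeezellm",               # SqueezeLLM 4-bit weights
)
```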
ETA May 3rd, Friday.