yiliu30 commented Jul 10, 2025

Porting #1561

ekagra-ranjan and others added 30 commits June 3, 2025 15:26
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
…hen Thinking is enabled (vllm-project#19075)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: nicklucche <nlucches@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
…for models with sliding window layers (vllm-project#19029)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
…project#18678)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
In our DeepSeek model definition, kv_b_proj is defined in two places:
`self_attn.kv_b_proj` and `self_attn.impl.kv_b_proj`. The first one is
not used, but it is present at model initialization, so INC tries to
quantize it. Because it was not used during measurement, there are no
measurements for this object, and quantization crashes.
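
The failure mode described above can be reproduced with a toy sketch. This is not vLLM or INC code: `ToyLinear`, `measure`, `quantize`, and the `skip` argument are hypothetical names invented for illustration, and skipping the unused module is only one possible remedy, assumed here because the commit message does not spell out the fix.

```python
# Toy illustration only: none of these helpers come from vLLM or INC.

class ToyLinear:
    """Stand-in for a linear layer that counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return x


def measure(modules, used_names):
    """Simulate a calibration run: only modules that execute get statistics."""
    stats = {}
    for name in used_names:
        modules[name](1.0)
        stats[name] = {"max_abs": 1.0}
    return stats


def quantize(modules, stats, skip=()):
    """Simulate quantization: every registered module needs measurements."""
    quantized = []
    for name in modules:
        if name in skip:
            continue  # hypothetical remedy: leave the unused module alone
        if name not in stats:
            raise KeyError(f"no measurements for {name}")
        quantized.append(name)
    return quantized


modules = {
    "self_attn.kv_b_proj": ToyLinear(),       # registered but never called
    "self_attn.impl.kv_b_proj": ToyLinear(),  # the one actually used
}
stats = measure(modules, used_names=["self_attn.impl.kv_b_proj"])

# Without special handling, the unused module has no statistics and fails.
try:
    quantize(modules, stats)
    crashed = False
except KeyError:
    crashed = True

# Skipping (or not registering) the unused module avoids the lookup.
fixed = quantize(modules, stats, skip={"self_attn.kv_b_proj"})
```

In this sketch the first `quantize` call fails on `self_attn.kv_b_proj` because measurement never touched it, while the second call succeeds once that name is excluded.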

---------

Signed-off-by: kwisniewski98 <kwisniewski@habana.ai>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun <vsundarr@redhat.com>
michalkuligowski and others added 20 commits July 4, 2025 14:28
Twin PR: HabanaAI/vllm-hpu-extension#223

---------

Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
Signed-off-by: root <root@adobrzyniewicz-t28p-g3-mpijob-worker-0.adobrzyniewicz-t28p-g3-mpijob-worker.framework.svc.cluster.local>
Signed-off-by: root <root@adobrzyniewicz-6hqu-g2-mpijob-worker-0.adobrzyniewicz-6hqu-g2-mpijob-worker.framework.svc.cluster.local>
Signed-off-by: root <root@adobrzyniewicz-fbbo-g2-mpijob-worker-0.adobrzyniewicz-fbbo-g2-mpijob-worker.framework.svc.cluster.local>
Co-authored-by: root <root@adobrzyniewicz-t28p-g3-mpijob-worker-0.adobrzyniewicz-t28p-g3-mpijob-worker.framework.svc.cluster.local>
Co-authored-by: root <root@adobrzyniewicz-6hqu-g2-mpijob-worker-0.adobrzyniewicz-6hqu-g2-mpijob-worker.framework.svc.cluster.local>
Co-authored-by: root <root@adobrzyniewicz-fbbo-g2-mpijob-worker-0.adobrzyniewicz-fbbo-g2-mpijob-worker.framework.svc.cluster.local>
Signed-off-by: kwisniewski98 <kwisniewski@habana.ai>
Cherry pick of the docker vllm: update readme from habana_main

Signed-off-by: Tomasz Thaddey <tthaddey@habana.ai>
Signed-off-by: Artur Fierka <artur.fierka@intel.com>
Co-authored-by: Tomasz Thaddey <76682475+tthaddey@users.noreply.github.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>