
Conversation

yhtang
Contributor

@yhtang yhtang commented Jun 18, 2025

Prevents the Gemma embedding normalizer, which is a model invariant, from being scrambled by the dummy weight loader.

This is a quick fix for a problem where a vLLM Gemma model keeps producing gibberish in an RLHF setup (similar to the RLHF example) if it was initially loaded with dummy weights, even after the real weights are sent in. The cause is that the dummy weight loader traverses state_dict rather than named_parameters and thus inadvertently randomizes the normalizer, which is a model invariant rather than a trainable weight.
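
A toy illustration of that failure mode (a made-up module, not the actual vLLM Gemma code): a buffer registered with the default persistent=True shows up in state_dict() but not in named_parameters(), so a dummy initializer that walks state_dict() overwrites it even though it is not a trainable weight.

import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self, hidden_size: int = 16):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)
        # persistent=True is the default, so this buffer lands in state_dict()
        self.register_buffer("normalizer", torch.tensor(hidden_size**0.5))

m = ToyModel()
print("normalizer" in m.state_dict())              # True  -> visible to the dummy loader
print("normalizer" in dict(m.named_parameters()))  # False -> not a trainable weight

# A state_dict-walking dummy initializer scrambles the invariant:
for tensor in m.state_dict().values():
    tensor.uniform_(-1e-3, 1e-3)
print(m.normalizer)                                # no longer hidden_size**0.5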

A quick scan of the model registry indicates that other models register similar invariant buffers with persistent=False, thus not triggering this issue.

A long-term solution would be to provide an API for marking a buffer as invariant, e.g.

def register_invariant(self, name, tensor):
    tensor._no_dummy_init = True
    self.register_buffer(name, tensor)

and have initialize_dummy_weights respect that flag.
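
A minimal sketch of what such a flag-aware dummy initializer could look like (a hypothetical helper, not the current vLLM implementation; keep_vars=True is needed so the custom attribute is still visible when traversing state_dict):

import torch
import torch.nn as nn

def initialize_dummy_weights_respecting_invariants(model: nn.Module,
                                                   low: float = -1e-3,
                                                   high: float = 1e-3) -> None:
    # keep_vars=True returns the original parameter/buffer objects, so
    # attributes such as `_no_dummy_init` set by register_invariant survive.
    for tensor in model.state_dict(keep_vars=True).values():
        if getattr(tensor, "_no_dummy_init", False):
            continue  # leave model invariants untouched
        if torch.is_floating_point(tensor):
            with torch.no_grad():
                tensor.uniform_(low, high)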

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @yhtang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request provides a targeted fix for an issue in vLLM's Gemma model implementations where the model's invariant normalizer was being corrupted by the dummy weight initialization process. By marking this specific buffer as non-persistent, the change ensures that the model can correctly load and utilize real weights after an initial dummy load, resolving a problem that caused erroneous outputs in certain configurations like RLHF setups.

Highlights

  • Gemma Model Initialization Fix: Addresses a critical issue where the Gemma model's normalizer buffer, an invariant component, was being inadvertently scrambled by the dummy weight loader. This led to incorrect model behavior (producing gibberish) even after real weights were loaded, particularly in RLHF setups.
  • Marking Invariant Buffers as Non-Persistent: Explicitly sets persistent=False when registering the normalizer buffer in the __init__ methods of GemmaModel, Gemma2Model, and Gemma3Model. This prevents the dummy weight loader from modifying this invariant buffer, ensuring correct model functionality.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses an issue where the Gemma embedding normalizer, an invariant part of the model, was being incorrectly modified by the dummy weight loader. This led to models producing incorrect outputs after being loaded with dummy weights and then real weights in RLHF setups.

The fix involves setting persistent=False when registering the normalizer buffer in GemmaModel, Gemma2Model, and Gemma3Model. This is the correct approach because buffers with persistent=False are not included in the model's state_dict(). Since the dummy weight loader (as pointed out in the description) iterates over state_dict(), this change effectively shields the normalizer from being altered by it.
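
A minimal sketch of the behavior described above (a made-up module, not the vLLM Gemma code): with persistent=False the buffer stays out of state_dict() but remains available on the module at runtime.

import torch
import torch.nn as nn

class TinyGemmaLike(nn.Module):
    def __init__(self, hidden_size: int = 16):
        super().__init__()
        self.embed = nn.Embedding(32, hidden_size)
        # persistent=False -> excluded from state_dict(), so state_dict-based
        # loaders (including the dummy weight loader) never touch it
        self.register_buffer("normalizer",
                             torch.tensor(hidden_size**0.5),
                             persistent=False)

m = TinyGemmaLike()
assert "normalizer" not in m.state_dict()          # shielded from state_dict-based loaders
assert "normalizer" in dict(m.named_buffers())     # still available at runtime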

The change is consistent across all affected Gemma model files and aligns with the practice in other vLLM models for handling invariant buffers. The PR description clearly explains the problem, the rationale for the fix, and also thoughtfully proposes a more robust long-term solution for handling model invariants, which is commendable.

Overall, this is a well-targeted and correct quick fix for the described problem. The code changes are minimal and directly address the issue.

@yhtang yhtang force-pushed the yhtang/fix-gemma-normalizer branch from ec21f58 to a99ccb1 on June 18, 2025 07:34
@yhtang yhtang marked this pull request as ready for review June 18, 2025 07:34
@yhtang
Contributor Author

yhtang commented Jun 18, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses an issue where Gemma models produce incorrect outputs after being loaded with dummy weights. The root cause was identified as the embedding normalizer, a model invariant, being inadvertently modified by the dummy weight loader because it was registered as a persistent buffer. The fix correctly marks this normalizer as non-persistent (persistent=False) in gemma.py, gemma2.py, and gemma3.py. This change prevents the normalizer from being included in the model's state_dict, thereby protecting it from the dummy weight loading process. This approach aligns with how other models in vLLM handle similar invariant buffers.

A new test suite, test_gemma.py, is introduced to specifically verify this fix. It ensures that the normalizer retains its correct, calculated value even when the model is loaded with load_format="dummy".

My review primarily focuses on the new test, ensuring its robustness and correctness. The core code changes are straightforward and well-justified.

Comment on lines 23 to 26
assert np.allclose(
    normalizers,
    llm.llm_engine.model_config.hf_config.hidden_size**0.5,
    rtol=1e-3
Contributor

medium

The relative tolerance rtol=1e-3 seems a bit loose for comparing the normalizer value, which is described as a model invariant and calculated as hidden_size**0.5.

Given that the normalizer is initialized from a direct calculation (torch.tensor(self.config.hidden_size**0.5)) and should not be modified by the dummy loader (due to persistent=False), the value retrieved from the model should be very close to the re-calculated value.

Using a tighter tolerance, for example rtol=1e-5 or rtol=1e-6 (typical for float32 comparisons), would make the test more robust in verifying that the normalizer is indeed untouched and retains its precise original value. If a loose tolerance like 1e-3 is necessary for the test to pass, it might mask subtle issues or indicate unexpected floating-point discrepancies.

Consider tightening the tolerance to ensure the test accurately reflects the invariant nature of the normalizer.

Contributor Author

The assertion will fail with rtol=1e-4. This is due to the tensor using lower-precision floats such as bf16.
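
For reference, a quick back-of-the-envelope check of the bf16 precision involved (using an illustrative hidden size of 2048; the exact value depends on the checkpoint):

import torch

hidden_size = 2048
exact = hidden_size ** 0.5                                  # 45.254833...
as_bf16 = torch.tensor(exact, dtype=torch.bfloat16).item()  # 45.25
rel_err = abs(as_bf16 - exact) / exact                      # ~1.1e-4
print(as_bf16, rel_err)  # rounding error alone already exceeds rtol=1e-4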

… being scrambled by the dummy weight loader

Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>
@yhtang yhtang force-pushed the yhtang/fix-gemma-normalizer branch from a99ccb1 to c390936 on June 18, 2025 07:38
Member

@DarkLight1337 DarkLight1337 left a comment

Thanks for fixing, please resolve the pre-commit errors

Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>
@yhtang yhtang force-pushed the yhtang/fix-gemma-normalizer branch from 592d470 to 520eda9 on June 19, 2025 00:58
@yhtang
Contributor Author

yhtang commented Jun 19, 2025

Thanks for fixing, please resolve the pre-commit errors

Thanks for reviewing! I've fixed the format errors, but it's now complaining again about a file that is not in this PR.

@yhtang yhtang requested a review from DarkLight1337 June 19, 2025 01:22
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) June 19, 2025 02:53
@DarkLight1337 DarkLight1337 added the ready and force-merge labels Jun 19, 2025
@vllm-bot vllm-bot merged commit 83ca9ae into vllm-project:main Jun 19, 2025
68 of 71 checks passed
yeqcharlotte pushed a commit to yeqcharlotte/vllm that referenced this pull request Jun 22, 2025

Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>
minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025

Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>
Signed-off-by: minpeter <kali2005611@gmail.com>
gmarinho2 pushed a commit to gmarinho2/vllm that referenced this pull request Jun 26, 2025

Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 30, 2025

Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>
wseaton pushed a commit to wseaton/vllm that referenced this pull request Jun 30, 2025

Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
wseaton pushed a commit to wseaton/vllm that referenced this pull request Jun 30, 2025

Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>
wwl2755-google pushed a commit to wwl2755-google/vllm that referenced this pull request Jul 1, 2025

Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>
avigny pushed a commit to avigny/vllm that referenced this pull request Jul 31, 2025

Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
wpyszka pushed a commit to HabanaAI/vllm-fork that referenced this pull request Aug 1, 2025
This PR contains following changes
1. Port Gemma3 SLIDING_WINDOW FusedSDPA feature from habana_main + Add a
few extra fixes including..
- Sliding FusedSDPA kernel, we are adding threshold variable to enable
or disable to use optimized kernel. This kernel will be
performance/memory benefit for longer sequence. We are providing
environment variable to control per customer request.
- Based on the threshold, choose different prompt bucket, if it's
smaller than the threshold, use PROMPT_BUCKET_STEP, otherwise use
SLICE_SIZE.
 - Added mark_step before SLIDING FusedSDPA is run. 
 - Misc fixes for bucket related issue. 
 2. upstream fixes
 vllm-project#18732
vllm-project#21479
vllm-project#19788

3. optimized Gemma3RMSNorm with FusedRMSNorm
Dependent on #1647 


Run command with. 
VLLM_FUSEDSDPA_SLIDE_THLD=2048 VLLM_EXPONENTIAL_BUCKETING=false
VLLM_PROMPT_BS_BUCKET_MAX=64 VLLM_PROMPT_SEQ_BUCKET_STEP=1024
VLLM_PROMPT_SEQ_BUCKET_MAX=20480 PT_HPU_SDPA_QKV_SLICE_MODE_FWD=1

---------

Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: Hongmin Fan <fanhongmin@google.com>
Co-authored-by: Henry Tang <ytang@habana.ai>
Co-authored-by: Mohit Deopujari <mdeopujari@habana.ai>
Co-authored-by: Shiv Kaul <skaul@habana.ai>
Co-authored-by: Shiv Kaul <shiv.kaul@intel.com>
Co-authored-by: Libin Tang <libin.tang@intel.com>
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Hongmin Fan <fanhongmin@google.com>
Co-authored-by: Harish Subramony <hsubramony@habana.ai>
Co-authored-by: Jianhong-Zhang <jianhong.zhang@intel.com>
Co-authored-by: Libin Tang <litang@habana.ai>
Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
tianyuan211 added a commit to tianyuan211/vllm-fork that referenced this pull request Aug 7, 2025
commit 0884eb4
Author: Jimin Ha <jimin.ha@intel.com>
Date:   Fri Aug 1 05:42:09 2025 -0700

    Gemma3 v1.22  changes (Sliding_Window feature  + few others) (HabanaAI#1660)

    This PR contains following changes
    1. Port Gemma3 SLIDING_WINDOW FusedSDPA feature from habana_main + Add a
    few extra fixes including..
    - Sliding FusedSDPA kernel, we are adding threshold variable to enable
    or disable to use optimized kernel. This kernel will be
    performance/memory benefit for longer sequence. We are providing
    environment variable to control per customer request.
    - Based on the threshold, choose different prompt bucket, if it's
    smaller than the threshold, use PROMPT_BUCKET_STEP, otherwise use
    SLICE_SIZE.
     - Added mark_step before SLIDING FusedSDPA is run.
     - Misc fixes for bucket related issue.
     2. upstream fixes
     vllm-project#18732
    vllm-project#21479
    vllm-project#19788

    3. optimized Gemma3RMSNorm with FusedRMSNorm
    Dependent on HabanaAI#1647

    Run command with.
    VLLM_FUSEDSDPA_SLIDE_THLD=2048 VLLM_EXPONENTIAL_BUCKETING=false
    VLLM_PROMPT_BS_BUCKET_MAX=64 VLLM_PROMPT_SEQ_BUCKET_STEP=1024
    VLLM_PROMPT_SEQ_BUCKET_MAX=20480 PT_HPU_SDPA_QKV_SLICE_MODE_FWD=1

    ---------

    Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
    Signed-off-by: Hongmin Fan <fanhongmin@google.com>
    Co-authored-by: Henry Tang <ytang@habana.ai>
    Co-authored-by: Mohit Deopujari <mdeopujari@habana.ai>
    Co-authored-by: Shiv Kaul <skaul@habana.ai>
    Co-authored-by: Shiv Kaul <shiv.kaul@intel.com>
    Co-authored-by: Libin Tang <libin.tang@intel.com>
    Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>
    Co-authored-by: Hongmin Fan <fanhongmin@google.com>
    Co-authored-by: Harish Subramony <hsubramony@habana.ai>
    Co-authored-by: Jianhong-Zhang <jianhong.zhang@intel.com>
    Co-authored-by: Libin Tang <litang@habana.ai>
    Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>

commit 065fde3
Author: Jan Kaniecki <jan.kaniecki@intel.com>
Date:   Thu Jul 31 15:42:13 2025 +0200

    Remove inference_mode() from platforms.hpu (HabanaAI#1690)

    Inference_mode() is causing recompilations with t.compile - we don't
    need it as we already put inference_mode on particular functions in
    model runner. It was introduced by Rebase 0.9.0.1
    (HabanaAI#1507) - previously we didn't
    have such call.

commit 7d6528e
Author: Krzysztof Smusz <ksmusz@habana.ai>
Date:   Wed Jul 30 12:19:34 2025 +0200

    Set hpu-extension to 61dafb3 (HabanaAI#1683)

    Upgrading vllm-hpu-extension with change introducing the fix for
    unsupported block_softmax_adjustment in fp16 precision

commit ff9bff9
Author: Iryna Boiko <iboiko@habana.ai>
Date:   Tue Jul 29 09:19:29 2025 +0200

    Remove dtype.float16 support for hpu config (HabanaAI#1650)

commit 034c756
Author: Chendi.Xue <chendi.xue@intel.com>
Date:   Tue Jul 29 02:17:44 2025 -0500

    [SW-234344] Fix 'RotaryEmbedding' object has no attribute 'sin' (HabanaAI#1659)

    ## Essential Elements of an Effective PR Description Checklist
    - [x] The purpose of the PR, such as "Fix some issue (link existing
    issues this PR will resolve)".
    - [ ] The test plan, such as providing test command.
    - [ ] The test results, such as pasting the results comparison before
    and after, or e2e results

    ## Purpose

    port commit from HabanaAI#1658 for fixing SW-234344 for habana_main

    ## Test Plan

    ## Test Result

    <!--- pyml disable-next-line no-emphasis-as-heading -->

    Signed-off-by: Chendi.Xue <chendi.xue@intel.com>

commit e5a6120
Author: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Date:   Tue Jul 29 08:53:48 2025 +0200

    1.22 Warmup one context more - linear - Update sha extension (HabanaAI#1655)

    Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
    Co-authored-by: Jan Kaniecki <jan.kaniecki@intel.com>

commit 9957ca7
Author: Michał Kuligowski <mkuligowski@habana.ai>
Date:   Tue Jul 29 08:52:48 2025 +0200

    ValueError: 'aimv2' is already used by a Transformers config, pick an… (HabanaAI#1673)

    Fix cherrypicked from upstream
    https://github.com/vllm-project/vllm/pull/20921/files

commit f1b60b4
Author: Mohit Deopujari <mdeopujari@habana.ai>
Date:   Thu Jul 24 08:07:04 2025 -0700

    Gemma3 suppport: propogation : pr1589/1597/1558 to v1.22.0_next (HabanaAI#1616)

    Added support for FusedSDPA kernel with window_size for Gemma3.
    This PR relies on vllm-hpu-extension
    [PR302](HabanaAI/vllm-hpu-extension#302)

    ---------

    Co-authored-by: Shiv Kaul <skaul@habana.ai>
    Co-authored-by: Shiv Kaul <shiv.kaul@intel.com>
    Co-authored-by: Jimin Ha <jimin.ha@intel.com>
    Co-authored-by: Henry Tang <ytang@habana.ai>
    Co-authored-by: Libin Tang <litang@habana.ai>
    Co-authored-by: Libin Tang <libin.tang@intel.com>
    Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>

commit 59b8f75
Author: Artur Fierka <artur.fierka@intel.com>
Date:   Thu Jul 24 13:11:57 2025 +0200

    Update hpu.txt on 1.22.0 branch (HabanaAI#1648)

    Set extension SHA for Port: Fix: Round up to sliding window threshold
    HabanaAI#307 (HabanaAI#309)

commit d6b00f4
Author: Artur Fierka <artur.fierka@intel.com>
Date:   Wed Jul 23 15:50:14 2025 +0200

    [Security] Fix: Bad use of null-like value (HabanaAI#1634)

    Signed-off-by: Artur Fierka <artur.fierka@intel.com>

commit 66858d6
Author: Artur Fierka <artur.fierka@intel.com>
Date:   Wed Jul 23 15:48:53 2025 +0200

    [Security] Fix: Structurally dead code (HabanaAI#1625)

    Remove dead code for security reason

    Signed-off-by: Artur Fierka <artur.fierka@intel.com>

commit 33fbed4
Author: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Date:   Tue Jul 22 12:49:42 2025 +0200

    Update sha - Port: Fix fallback bucket (HabanaAI#1626)

    Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>

commit 1b46f4c
Author: Seunghyuk Park (shepark) <seunghyuk.h.park@intel.com>
Date:   Tue Jul 22 00:52:50 2025 -0700

    Embedding fix: warmup failure in embedding model (HabanaAI#1510) (HabanaAI#1559)

    Merge changes from habana_main for embedding fix
    HabanaAI#1510

    ---- details ----
    Fix the failures at warmup stage in pooling mode

    --
    due to.
    [rank0]: File "/wm/vllm-fork/vllm/worker/hpu_model_runner.py", line
    2904, in warmup_model
    [rank0]: self.warmup_graphs(
    [rank0]: File "/wm/vllm-fork/vllm/worker/hpu_model_runner.py", line
    2714, in warmup_graphs
    [rank0]: self.warmup_scenario(batch_size,
    [rank0]: File "/wm/vllm-fork/vllm/worker/hpu_model_runner.py", line
    2561, in warmup_scenario
    [rank0]: inputs = self.prepare_model_input_align_worker( [rank0]: File
    "/wm/vllm-fork/vllm/worker/model_runner_base.py", line 233, in
    prepare_model_input_align_worker
    [rank0]: raise NotImplementedError
    [rank0]: NotImplementedError

    Co-authored-by: Libin Tang <litang@habana.ai>
    Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>

commit 062f345
Author: Karol Damaszke <kdamaszke@habana.ai>
Date:   Fri Jul 18 17:02:42 2025 +0200

    Fix text-only prompt in Llama Vision (HabanaAI#1621)

    Fixes text-only prompts in Llama Vision. Without setting
    `max_encoder_seq_lens` we are not skipping `cross_attention` for
    text-only prompts, which results in None's `key` and `value`.

    Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>

commit 449fa92
Author: Tomasz Thaddey <76682475+tthaddey@users.noreply.github.com>
Date:   Thu Jul 17 15:44:56 2025 +0200

    docker vllm: update readme (HabanaAI#1596)

    docker vllm: update readme

    Signed-off-by: Tomasz Thaddey <tthaddey@habana.ai>

commit 22ee396
Author: Michal Adamczyk <michal.adamczyk@intel.com>
Date:   Thu Jul 17 09:44:10 2025 +0200

    [1.22] Set vllm-hpu-extension to 22abb7a (HabanaAI#1611)

commit 37888b5
Author: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Date:   Thu Jul 17 07:11:00 2025 +0200

    Port: V1 - dont look for bucket we know don't exists (HabanaAI#1606) (HabanaAI#1608)

    Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>

commit 18d51d1
Author: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Date:   Wed Jul 16 16:29:47 2025 +0200

    Readme update - Dont use apc on v0 (HabanaAI#1607)

    Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>

commit 9b1675c
Author: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Date:   Wed Jul 16 13:43:59 2025 +0200

    Port: Num blocks fix - V1 (HabanaAI#1594) (HabanaAI#1601)

    Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>

commit bdd9171
Author: Yi Liu <yi4.liu@intel.com>
Date:   Tue Jul 15 18:43:49 2025 +0800

    Update Force Channel FP8 Check (HabanaAI#1563)

    Porting HabanaAI#1561

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit 23e63c0
Author: liuzhenwei <zhenwei.liu@intel.com>
Date:   Tue Jul 15 16:06:19 2025 +0800

    [V0] Use device as the set_device's parameter by default, update proxy (HabanaAI#1582)

    https://jira.habana-labs.com/browse/SW-234257
    cherry-pick from HabanaAI#1540

    Signed-off-by: zhenwei <zhenweiliu@habana.ai>
    Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

commit 82fc060
Author: Iryna Boiko <iboiko@habana.ai>
Date:   Mon Jul 14 15:58:24 2025 +0200

    Change vllm-hpu-extension revision to 89515f6 (HabanaAI#1584)

    Change vllm-hpu-extension revision to 89515f6

commit 47768d3
Author: Iryna Boiko <iboiko@habana.ai>
Date:   Mon Jul 14 15:18:30 2025 +0200

    Port: temporarely disable deepseek test HabanaAI#1535 (HabanaAI#1586)

    Port: Update hpu-ext sha and temporarely disable deepseek test HabanaAI#1535

commit f1c70dc
Author: Michał Kuligowski <mkuligowski@habana.ai>
Date:   Mon Jul 14 14:57:57 2025 +0200

    Fix AttributeError: 'NoneType' object has no attribute 'getenv' (HabanaAI#1555)

    Fixes
    AttributeError: 'NoneType' object has no attribute 'getenv'
    during tests teardown

commit 617498a
Author: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Date:   Mon Jul 14 14:35:07 2025 +0200

    Readme warmup update (HabanaAI#1512) (HabanaAI#1585)

    Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>

commit 8bb429d
Author: Tomasz Pawlowski <tpawlowski@habana.ai>
Date:   Fri Jul 11 20:21:57 2025 +0200

    Add accelerate to requirements/hpu.txt (HabanaAI#1564) (v1.22.0) (HabanaAI#1566)

    Cherry picked from HabanaAI#1564

    Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>

commit aca2ddc
Author: Tomasz Thaddey <76682475+tthaddey@users.noreply.github.com>
Date:   Fri Jul 11 12:58:11 2025 +0200

    docker vllm: add server config for model Qwen/Qwen2.5-VL-7B-Instruct (HabanaAI#1569)

    docker vllm: add server config for model Qwen/Qwen2.5-VL-7B-Instruct

    ---------

    Signed-off-by: Tomasz Thaddey <tthaddey@habana.ai>

commit 512caed
Author: Tomasz Thaddey <76682475+tthaddey@users.noreply.github.com>
Date:   Thu Jul 10 08:12:39 2025 +0200

    docker vllm: cleanup configs and add missing models (HabanaAI#1548)

    docker vllm: cleanup configs and add missing models

    ---------

    Signed-off-by: Tomasz Thaddey <tthaddey@habana.ai>

commit 7b69f70
Author: PatW <patryk.wolsza@intel.com>
Date:   Tue Jul 8 13:56:23 2025 +0200

    Cherrypick docker vllm: update readme (HabanaAI#1525) (HabanaAI#1538)

    Cherry pick of the docker vllm: update readme from habana_main

    Signed-off-by: Tomasz Thaddey <tthaddey@habana.ai>
    Signed-off-by: Artur Fierka <artur.fierka@intel.com>
    Co-authored-by: Tomasz Thaddey <76682475+tthaddey@users.noreply.github.com>

commit 79ef0d5
Author: Michal Szutenberg <michal.szutenberg@intel.com>
Date:   Tue Jul 8 12:39:00 2025 +0200

    [SW-234006] Fix requirements (1.22.0) (HabanaAI#1530)

    See
    https://jira.habana-labs.com/browse/SW-234006?focusedId=1073396&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-1073396
tianyuan211 added a commit to tianyuan211/vllm-fork that referenced this pull request Aug 14, 2025
commit 95f5008
Author: Wei Lin <forever871001@163.com>
Date:   Wed Aug 13 20:46:59 2025 +0800

    Porting DeeSeek v2/r1 PRs (HabanaAI#1756)

    ## Essential Elements of an Effective PR Description Checklist
    - [ ] The purpose of the PR, such as "Fix some issue (link existing
    issues this PR will resolve)".
    - [ ] The test plan, such as providing test command.
    - [ ] The test results, such as pasting the results comparison before
    and after, or e2e results

    ## Porting List
    1. HabanaAI#1402
    2. HabanaAI#1504
    3. HabanaAI#1404

    <!--- pyml disable-next-line no-emphasis-as-heading -->

commit fd41376
Author: Bob Zhu <bob.zhu@intel.com>
Date:   Wed Aug 13 16:21:20 2025 +0800

    link to the correct vllm-hpu-extention branch (HabanaAI#1755)

    The vllm-fork aice/v1.22.0 branch will always use vllm-hpu-extention
    aice/v1.22.0 branch

commit 6693645
Author: Wei Lin <forever871001@163.com>
Date:   Wed Aug 13 15:19:09 2025 +0800

    Add profiler for HPU (HabanaAI#1753)

    ## Essential Elements of an Effective PR Description Checklist
    - [ ] The purpose of the PR, such as "Fix some issue (link existing
    issues this PR will resolve)".
    - [ ] The test plan, such as providing test command.
    - [ ] The test results, such as pasting the results comparison before
    and after, or e2e results

    ## Purpose

    ## Test Plan

    ## Test Result

    <!--- pyml disable-next-line no-emphasis-as-heading -->

commit 26d4308
Author: Katarzyna Fojcik <katarzyna.fojcik@intel.com>
Date:   Tue Aug 12 15:06:01 2025 +0200

    [SW-234805] Fix target_device for weights load (HabanaAI#1734)

    ## Essential Elements of an Effective PR Description Checklist
    - [x] The purpose of the PR, such as "Fix some issue (link existing
    issues this PR will resolve)".
    - [ ] The test plan, such as providing test command.
    - [ ] The test results, such as pasting the results comparison before
    and after, or e2e results

    ## Purpose
    With
    HabanaAI@4d5ee6c
    Rebase 0.9.0.1 (HabanaAI#1507) the
    loading way has changed. Target_device during loading model and weights
    should depend on load_device config as it was before, so the flag
    --weights-load-device=cpu is functional.

    ## Test Plan
    One of the affected tests: llama-31-70b-fp8-1x-gaudi3

    ## Test Result
    PASSED:
    https://qa-jenkins-ctrl03.habana-labs.com/job/qa_jobs/job/qa_testers/job/gdn-qa/job/pytorch/job/gaudi3/job/continous_batching/job/VLLM/job/Native/job/llama-31-70b-fp8-1x-gaudi3-native_benchmark_throughput_VLLM_pytorch_gaudi3-ank8s_v1_22_0/108/

    <!--- pyml disable-next-line no-emphasis-as-heading -->

    Co-authored-by: Wojciech Pyszka <wpyszka@habana.ai>
    Co-authored-by: Michal Gawarkiewicz <michal.gawarkiewicz@intel.com>

commit 4118669
Author: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Date:   Tue Aug 12 13:25:15 2025 +0200

    Fix merged prefill with new bucketing manager (HabanaAI#1746)

    Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
    Signed-off-by: root <root@adobrzyniewicz-sn76-g3-mpijob-worker-0.adobrzyniewicz-sn76-g3-mpijob-worker.framework.svc.cluster.local>
    Co-authored-by: root <root@adobrzyniewicz-sn76-g3-mpijob-worker-0.adobrzyniewicz-sn76-g3-mpijob-worker.framework.svc.cluster.local>

commit d837abf
Author: PatW <patryk.wolsza@intel.com>
Date:   Tue Aug 12 11:35:37 2025 +0200

    1.22 release readme change (HabanaAI#1718)

    Collection of changes in documentation for 1.22 release.

commit cb44d6f
Author: Jimin Ha <jimin.ha@intel.com>
Date:   Mon Aug 11 21:03:50 2025 -0700

    Fixes HPU graph run for Gemma3 vision inputs (HabanaAI#1719)

    Fixes HPU graph issues for gemma3 vision inputs

    - Text warmup to include attn_mask info, so vision+text data can reuse
    the graph for language model that's warmed up already.
    - Changing slicing to index_select for multimodal bucketing for HPU.
    Slicing doesn't produce the same hash for the HPU graph with same input
    shape.
    -    Use buckets for the vision tower as well to reduce GC recompile
    - Accuracy bug fix by clone output data of the multimodal-projector.

    Validated with Muirbench datasets.

commit 2d550e4
Author: Chendi.Xue <chendi.xue@intel.com>
Date:   Thu Aug 7 07:16:45 2025 -0500

    skip softmax/log_softmax when greedy_sampling with no logprobs (HabanaAI#1706)

    ## Essential Elements of an Effective PR Description Checklist
    - [ ] The purpose of the PR, such as "Fix some issue (link existing
    issues this PR will resolve)".
    - [ ] The test plan, such as providing test command.
    - [ ] The test results, such as pasting the results comparison before
    and after, or e2e results

    ## Purpose

    If seq_groups with all sampling_type == Greedy, avoid
    log_softmax/softmax during sampling calculation

    ## Test Plan

    ## Test Result

    <!--- pyml disable-next-line no-emphasis-as-heading -->

    ---------

    Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
    Co-authored-by: attafosu <tattafosu@habana.ai>

commit 4008f53
Author: Tomasz Thaddey <76682475+tthaddey@users.noreply.github.com>
Date:   Tue Aug 5 14:14:35 2025 +0200

    docker vllm: add VLLM_EXPONENTIAL_BUCKETING param (HabanaAI#1682)

    docker vllm: add VLLM_EXPONENTIAL_BUCKETING param

    ---------

    Signed-off-by: Tomasz Thaddey <tthaddey@habana.ai>

commit a12463c
Author: Chendi.Xue <chendi.xue@intel.com>
Date:   Mon Aug 4 03:09:56 2025 -0500

    [SW-235047] port PR 1629 to 1.22.0 - use w8a8 path for per_channel for performance regression fixing (HabanaAI#1644)

    HabanaAI#1629

    ---------

    Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
    Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
    Co-authored-by: Jan Kaniecki <jan.kaniecki@intel.com>

googlercolin pushed a commit to googlercolin/vllm that referenced this pull request Aug 29, 2025

Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>