This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Upstream sync 2024 06 08 #288

Merged 101 commits on Jun 10, 2024

Commits on Jun 8, 2024

  1. e69d23b
  2. 81ec16b
  3. 5500975
  4. b913d04
  5. 683a30b
  6. [Build/CI] Enabling AMD Entrypoints Test (vllm-project#4834)

    Co-authored-by: Alexey Kondratiev <alexey.kondratiev@amd.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    c8794c3
  7. [Bugfix] Fix dummy weight for fp8 (vllm-project#4916)

    Allow dummy load format for fp8,
    torch.uniform_ doesn't support FP8 at the moment
    
    Co-authored-by: Mor Zusman <morz@ai21.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    5b6a7b5
  8. a5e66c7
  9. 8a78ed8
  10. 6b46dcf
  11. 907d48a
  12. 11d6f7e
  13. 5d98989
  14. 58a235b
  15. [Bugfix] Fix flag name for max_seq_len_to_capture (vllm-project#4935)

    Signed-off-by: kerthcet <kerthcet@gmail.com>
    kerthcet authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    253d8fb
  16. f744125
  17. c1672a9
  18. 4b6c961
  19. 4b74974
  20. [Kernel] Fixup for CUTLASS kernels in CUDA graphs (vllm-project#4954)

    Pass the CUDA stream into the CUTLASS GEMMs, to avoid future issues with CUDA graphs
    tlrmchlsmth authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    39c15ee
  21. [Misc] Load FP8 kv-cache scaling factors from checkpoints (vllm-project#4893)

    The 2nd PR for vllm-project#4532.
    
    This PR supports loading FP8 kv-cache scaling factors from an FP8 checkpoint (with a .kv_scale parameter).
    comaniac authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    2835fc6
  22. 3db99a6
  23. 39a0a40
  24. 847ca88
  25. c60384c
  26. dae5aaf
  27. 05a4f64
  28. [Core][1/N] Support send/recv in PyNCCL Groups (vllm-project#4988)

    Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
    andoorve authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    bf4c411
  29. [Kernel] Initial Activation Quantization Support (vllm-project#4525)

    Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
    Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
    3 people authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    c623663
  30. [Core]: Option To Use Prompt Token Ids Inside Logits Processor (vllm-project#4985)

    Co-authored-by: Elisei Smirnov <el.smirnov@innopolis.university>
    2 people authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    a9ca32d
  31. [Doc] add ccache guide in doc (vllm-project#5012)

    Co-authored-by: Michael Goin <michael@neuralmagic.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    0eb33b1
  32. [Kernel] Initial Activation Quantization Support (vllm-project#4525)

    Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
    Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
    3 people committed Jun 8, 2024
    acf362c
  33. [Core][Bugfix]: fix prefix caching for blockv2 (vllm-project#4764)

    Co-authored-by: Lei Wen <wenlei03@qiyi.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    1226d5d
  34. [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (vllm-project#4799)

    Co-authored-by: beagleski <yunanzhang@microsoft.com>
    Co-authored-by: bapatra <bapatra@microsoft.com>
    Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
    Co-authored-by: Michael Goin <michael@neuralmagic.com>
    5 people authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    29a2098
  35. 3fe7e52
  36. 8768b3f
  37. e7e376f
  38. [Bugfix / Core] Prefix Caching Guards (merged with main) (vllm-project#4846)

    Co-authored-by: rsnm2 <rshaw@neuralmagic.com>
    Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
    3 people committed Jun 8, 2024
    67ce9ea
  39. 2c59c91
  40. 9fb7b82
  41. [Core] Sliding window for block manager v2 (vllm-project#4545)

    Co-authored-by: Ruth Evans <ruthevans@Ruths-MacBook-Pro.local>
    2 people authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    954c332
  42. 9929fb2
  43. [Kernel][ROCm][AMD] Add fused_moe Triton configs for MI300X (vllm-project#4951)

    This PR adds Triton kernel configs for the MoE kernel for MI300X
    divakar-amd authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    b22d985
  44. 54c17a9
  45. [Core] Consolidate prompt arguments to LLM engines (vllm-project#4328)

    Co-authored-by: Roger Wang <ywang@roblox.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    8c9aab4
  46. 705789d
  47. [Misc] add gpu_memory_utilization arg (vllm-project#5079)

    Signed-off-by: pandyamarut <pandyamarut@gmail.com>
    pandyamarut authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    95c2a3d
  48. 9175890
  49. 420c4ff
  50. 5bde5ba
  51. b86aa89
  52. f63e8dd
  53. 62a4fcb
  54. f900bcc
  55. 6824b2f
  56. 623275f
  57. [Bugfix / Core] Prefix Caching Guards (merged with main) (vllm-project#4846)

    Co-authored-by: rsnm2 <rshaw@neuralmagic.com>
    Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
    3 people committed Jun 8, 2024
    15dcd3e
  58. 5763c73
  59. [CI/Build] Docker cleanup functionality for amd servers (vllm-project#5112)

    Co-authored-by: Alexey Kondratiev <alexey.kondratiev@amd.com>
    Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
    Co-authored-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
    Co-authored-by: omkarkakarparthi <okakarpa>
    4 people authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    3a8332c
  60. [BUGFIX] [FRONTEND] Correct chat logprobs (vllm-project#5029)

    Co-authored-by: Breno Faria <breno.faria@intrafind.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    11a5a26
  61. 2827c68
  62. 4ae80dd
  63. 886ead6
  64. 758b903
  65. add doc about serving option on dstack (vllm-project#3074)

    Co-authored-by: Roger Wang <ywang@roblox.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    a190463
  66. 51cf757
  67. c72d890
  68. cf0711b
  69. [Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5) (vllm-project#5136)

    alexm-neuralmagic authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    dcaf819
  70. 7da3c3f
  71. [Model] Support MAP-NEO model (vllm-project#5081)

    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    2c66f17
  72. Revert "[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5)" (vllm-project#5149)

    simon-mo authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    5388c64
  73. [Misc]: optimize eager mode host time (vllm-project#4196)

    Co-authored-by: xuhao <xuhao@cambricon.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 8, 2024
    5e9f300
  74. f329e2e
  75. 951e3d2
  76. d349dbd
  77. format

    robertgshaw2-neuralmagic committed Jun 8, 2024
    031fd4e

Commits on Jun 9, 2024

  1. 9ed5f76
  2. fix falcon

    robertgshaw2-neuralmagic committed Jun 9, 2024
    ec71544
  3. 7381340
  4. c23ca05
  5. 85512eb
  6. 0cea2c2
  7. format

    robertgshaw2-neuralmagic committed Jun 9, 2024
    31147df
  8. 2256610
  9. formatting

    robertgshaw2-neuralmagic committed Jun 9, 2024
    01973f5
  10. a1a659d
  11. c50784c
  12. 99fa9f8
  13. 2ec6643
  14. 0bb099c
  15. 198f364
  16. ec0e89a
  17. e6f1cbd
  18. ca8d74a
  19. 5335ad9
  20. format

    robertgshaw2-neuralmagic committed Jun 9, 2024
    611cfed
  21. 73132a5
  22. 7f5c715

Commits on Jun 10, 2024

  1. 437912e
  2. 950981c