This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Upstream sync 2024 06 23 #329

Merged: 119 commits, Jun 26, 2024

Commits on Jun 23, 2024

  1. [CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with `perf-benchmarks` label (vllm-project#5073)
    
    Co-authored-by: simon-mo <simon.mo@hey.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    5d52fa5
  2. cab4a5d
  3. 923d05a
  4. 34467ee
  5. [ Misc ] Rs/compressed tensors cleanup (vllm-project#5432)

    Co-authored-by: mgoin <michael@neuralmagic.com>
    Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
    3 people committed Jun 23, 2024
    deee747
  6. 0ccb117
  7. 4464401
  8. 28d0d6d
  9. f0e02ac
  10. d0a3026
  11. 33edc9b
  12. [Bugfix] Enable loading FP8 checkpoints for gpt_bigcode models (vllm-project#5460)
    
    Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
    tdoublep authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    5fffeb8
  13. 65419f4
  14. dfd2b2e
  15. d464106
  16. [Core][Bugfix]: fix prefix caching for blockv2 (vllm-project#5364)

    Signed-off-by: Lei Wen <wenlei03@qiyi.com>
    Co-authored-by: Lei Wen <wenlei03@qiyi.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    80b908f
  17. 0393d45
  18. 32d5ecc
  19. [misc] Do not allow to use lora with chunked prefill. (vllm-project#5538)
    
    Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
    2 people authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    6f3169a
  20. beb3b21
  21. 31f38f3
  22. ec68cd1
  23. dc8789d
  24. 681de21
  25. 77a5f36
  26. 9c77244
  27. f968328
  28. b0abad9
  29. Correct alignment in the seq_len diagram. (vllm-project#5592)

    Co-authored-by: Liqian Chen <liqian.chen@deeplang.ai>
    2 people authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    4b84959
  30. 9cfb1d7
  31. 61f421b
  32. [Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (vllm-project#3814)
    
    Co-authored-by: Jiang Li <jiang1.li@intel.com>
    Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
    Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
    4 people authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    dceff94
  33. e830048
  34. [CI] the readability of benchmarking and prepare for dashboard (vllm-project#5571)
    
    [CI] Improve the readability of performance benchmarking results and prepare for upcoming performance dashboard (vllm-project#5571)
    KuntaiDu authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    a212392
  35. bc2be04
  36. 5eb3526
  37. 17fd0ba
  38. 7a58e54
  39. [Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier (vllm-project#5131)

    sroy745 authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    dbf0e91
  40. 18c566f
  41. [Kernel] Add punica dimensions for Granite 13b (vllm-project#5559)

    Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
    joerunde authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    69fa6ed
  42. 1b39fc2
  43. f691b45
  44. 5abb0c8
  45. [bugfix][distributed] improve p2p capability test (vllm-project#5612)

    [bugfix][distributed] do not error if two processes do not agree on p2p capability (vllm-project#5612)
    youkaichao authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    f355997
  46. 1343cd0
  47. 021cfdb
  48. [ci] Deprecate original CI template (vllm-project#5624)

    Signed-off-by: kevin <kevin@anyscale.com>
    khluu authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    70baf49
  49. [Misc] Add OpenTelemetry support (vllm-project#4687)

    This PR adds basic support for OpenTelemetry distributed tracing.
    It includes changes to enable tracing functionality and improve monitoring capabilities.
    
    I've also added a Markdown guide with screenshots to show users how to use this feature.
    ronensc authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    be2f123
  50. 0008715
  51. [ci] Setup Release pipeline and build release wheels with cache (vllm-project#5610)
    
    Signed-off-by: kevin <kevin@anyscale.com>
    khluu authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    14a7620
  52. 50c2ca9
  53. [Bugfix] Fix for inconsistent behaviour related to sampling and repetition penalties (vllm-project#5639)
    
    Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
    tdoublep authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    3d24777
  54. c5ef2f9
  55. 010f2e8
  56. a8b75a4
  57. a0d8ed2
  58. [Bugfix] Added test for sampling repetition penalty bug. (vllm-project#5659)
    
    Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
    tdoublep authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    e4d2b6e
  59. cb46cfe
  60. 8f72d50
  61. b081ff9
  62. 784aa72
  63. a799171
  64. 436aaf9
  65. [ci] Add A100 queue into AWS CI template (vllm-project#5648)

    Signed-off-by: kevin <kevin@anyscale.com>
    khluu authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    d33025c
  66. cf5889f
  67. 88396ae
  68. 8ff473a
  69. [Doc] Update docker references (vllm-project#5614)

    Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
    rafvasq authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    4f4cea6
  70. [Misc] Add per channel support for static activation quantization; update w8a8 schemes to share base classes (vllm-project#5650)

    dsikka authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    0e8e31e
  71. [ci] Limit num gpus if specified for A100 (vllm-project#5694)

    Signed-off-by: kevin <kevin@anyscale.com>
    khluu authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    1ccd388
  72. 330aa1b
  73. df3ae01
  74. [Kernel] Update Cutlass int8 kernel configs for SM90 (vllm-project#5514)

    Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    7d85753
  75. b6ec1d5
  76. [Kernel] Update Cutlass int8 kernel configs for SM80 (vllm-project#5275)

    Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    db7892d
  77. 51dfab0
  78. c477239
  79. 5ccb86c
  80. [Model] MLPSpeculator speculative decoding support (vllm-project#4947)

    Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
    
    Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
    Co-authored-by: Nick Hill <nickhill@us.ibm.com>
    Co-authored-by: Davis Wertheimer <Davis.Wertheimer@ibm.com>
    4 people authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    b05443a
  81. 1996acf
  82. 1699d33
  83. [Bugfix] Add fully sharded layer for QKVParallelLinearWithLora (vllm-project#5665)
    
    Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    e4f1a4e
  84. [Core][Distributed] add shm broadcast (vllm-project#5399)

    Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    3e3c8d9
  85. 01369a0
  86. 733cf30
  87. 07cd29d
  88. 2e2140f
  89. 0bec3f6
  90. 3595200
  91. [Model] Support Qwen-VL and Qwen-VL-Chat models with text-only inputs (vllm-project#5710)
    
    Co-authored-by: Roger Wang <ywang@roblox.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    1a6c6dd
  92. a7dccd6
  93. 960a022
  94. dc211cd
  95. 860a1d6
  96. [BugFix] [Kernel] Add Cutlass2x fallback kernels (vllm-project#5744)

    Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
    2 people authored and robertgshaw2-neuralmagic committed Jun 23, 2024
    d7f0ece
  97. e484da4
  98. formatted

    robertgshaw2-neuralmagic committed Jun 23, 2024
    683f309

Commits on Jun 24, 2024

  1. fix is_xpu

    robertgshaw2-neuralmagic committed Jun 24, 2024
    3b3a92c
  2. fix lm eval

    robertgshaw2-neuralmagic committed Jun 24, 2024
    e0c0530
  3. fix format

    robertgshaw2-neuralmagic committed Jun 24, 2024
    01d4f34
  4. 616fce8
  5. 3c5a7f5
  6. 71f60a8
  7. format

    robertgshaw2-neuralmagic committed Jun 24, 2024
    e960ebb
  8. fix lm-eval

    robertgshaw2-neuralmagic committed Jun 24, 2024
    3297247
  9. dcdf4da
  10. format

    robertgshaw2-neuralmagic committed Jun 24, 2024
    0dd1848
  11. de06faa

Commits on Jun 25, 2024

  1. cdf52bf
  2. f2d2794
  3. 431054d
  4. 9d7b7b5
  5. 3a75e15
  6. c9d1b9e
  7. format

    robertgshaw2-neuralmagic committed Jun 25, 2024
    973d9d0
  8. clean up

    robertgshaw2-neuralmagic committed Jun 25, 2024
    c44802e
  9. cleanup

    robertgshaw2-neuralmagic committed Jun 25, 2024
    9c15fe1
  10. format

    robertgshaw2-neuralmagic committed Jun 25, 2024
    727077f