This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Upstream sync 2024 06 08 #288

Merged
merged 101 commits into from
Jun 10, 2024

robertgshaw2-neuralmagic
Collaborator

@robertgshaw2-neuralmagic robertgshaw2-neuralmagic commented Jun 8, 2024

Upstream sync 2024 06 08 (#288) - ties to v0.4.3 of vllm-upstream

SUMMARY:

  • Merge commits from vllm-project@f68470e to vllm-project@1197e02
  • Our GCP test instances do not have gcc or clang installed. All of the Triton kernels rely on gcc or clang for JIT compilation, so I disabled the affected tests for now, but we need to get a compiler installed (cc @andy-neuma). All are marked with:
@pytest.mark.skip("C compiler not installed in NM automation. "
                  "This codepath follows a triton pathway, which "
                  "JITs using clang or gcc. Since neither are installed "
                  "in our test instances, we need to skip this for now.")
  • Cherry-picked in the changes associated with Fp8 weight format from @mgoin

Note that vllm-project@f68470e is NOT included in this merge.
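
As a possible follow-up to the unconditional skips described above, the check could be made conditional on whether a compiler is actually present. The following is only a minimal sketch and not part of this PR: the helper names `find_c_compiler` and `has_c_compiler` are hypothetical, and honoring a `CC` environment variable override is an assumption about how the compiler would be chosen.

```python
import os
import shutil


def find_c_compiler(candidates=("gcc", "clang"), env=None, which=shutil.which):
    """Return the path of the first C compiler found, or None.

    `env` and `which` are injectable so the lookup can be exercised
    without depending on what is installed on the current machine.
    """
    env = os.environ if env is None else env

    # Assumption: an explicit CC override takes precedence over PATH lookup.
    override = env.get("CC")
    if override:
        return which(override)

    # Otherwise fall back to the compilers named in the skip message.
    for name in candidates:
        path = which(name)
        if path is not None:
            return path
    return None


def has_c_compiler(**kwargs):
    """True if any candidate C compiler can be located."""
    return find_c_compiler(**kwargs) is not None
```

A test could then use `@pytest.mark.skipif(not has_c_compiler(), reason="C compiler not installed")` in place of the unconditional skip, so the Triton tests would start running again once gcc or clang is installed on the instances.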

COMPARE vs UPSTREAM:

alexm-neuralmagic and others added 30 commits June 8, 2024 16:39
Co-authored-by: Alexey Kondratiev <alexey.kondratiev@amd.com>
Allow dummy load format for fp8,
torch.uniform_ doesn't support FP8 at the moment

Co-authored-by: Mor Zusman <morz@ai21.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Pass the CUDA stream into the CUTLASS GEMMs, to avoid future issues with CUDA graphs
…ct#4893)

The 2nd PR for vllm-project#4532.

This PR supports loading FP8 kv-cache scaling factors from an FP8 checkpoint (with .kv_scale parameter).
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
…project#4985)

Co-authored-by: Elisei Smirnov <el.smirnov@innopolis.university>
@andy-neuma andy-neuma self-requested a review June 10, 2024 17:25
Member

@andy-neuma andy-neuma left a comment

thanks

@andy-neuma andy-neuma merged commit db9ed90 into main Jun 10, 2024
49 of 57 checks passed
robertgshaw2-neuralmagic added a commit that referenced this pull request Jun 11, 2024
Upstream sync 2024 06 11 (#288)

SUMMARY:

* Merge commits from vllm-project@1197e02 to vllm-project@114332b
* Our GCP test instances do not have gcc or clang installed. All of the
Triton kernels rely on gcc or clang for JIT compilation, so the affected
tests are still disabled (cc @andy-neuma). All are marked with:
```python
@pytest.mark.skip("C compiler not installed in NM automation. "
                  "This codepath follows a triton pathway, which "
                  "JITs using clang or gcc. Since neither are installed "
                  "in our test instances, we need to skip this for now.")
```

Note that vllm-project@1197e02 is NOT included in this merge.

COMPARE vs UPSTREAM:


https://github.com/neuralmagic/nm-vllm/compare/upstream-sync-2024-06-11..vllm-project:vllm:v0.5.0

---------

Signed-off-by: Ye Cao <caoye.cao@alibaba-inc.com>
Signed-off-by: kevin <kevin@anyscale.com>
Co-authored-by: Daniele <d.trifiro@me.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Ye Cao <952129620@qq.com>
Co-authored-by: Nadav Shmayovits <45605409+NadavShmayo@users.noreply.github.com>
Co-authored-by: chenqianfzh <51831990+chenqianfzh@users.noreply.github.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Daniil Arapov <59310708+Delviet@users.noreply.github.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Avinash Raj <avistylein3105@gmail.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Yuan <yuan.zhou@intel.com>
Co-authored-by: Kaiyang Chen <48289729+Kaiyang-Chen@users.noreply.github.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Breno Faria <breno@veltefaria.de>
Co-authored-by: Toshiki Kataoka <tos.lunar@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: zifeitong <zifei.tong@parasail.io>
Co-authored-by: Jie Fu (傅杰) <fujie_email@sina.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: DriverSong <31926998+DriverSong@users.noreply.github.com>
Co-authored-by: qiujiawei9 <qiujiawei9@jd.com>
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Alex Wu <alexanderwu@berkeley.edu>
Co-authored-by: Breno Faria <breno.faria@intrafind.com>
Co-authored-by: liuyhwangyh <liuyhwangyh@163.com>
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
Co-authored-by: Matthew Goldey <matthew.goldey@gmail.com>
Co-authored-by: Jie Fu (傅杰) <jiefu@tencent.com>
Co-authored-by: Itay Etelis <92247226+Etelis@users.noreply.github.com>
Co-authored-by: limingshu <61349199+JamesLim-sy@users.noreply.github.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Calvinn Ng <39899397+Calvinnncy97@users.noreply.github.com>
Co-authored-by: team <calvinn.ng@ahrefs.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Benjamin Kitor <bkitor@gmail.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Bla_ckB <50193121+BlackBird-Coding@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>