This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Upstream sync 2024 04 26 #211

Merged: 107 commits into main from upstream-sync-2024-04-26 on May 2, 2024


robertgshaw2-neuralmagic (Collaborator) commented Apr 26, 2024

Upstream sync 2024 04 26 (#211)

SUMMARY:
Merge commits from vllm-project@a37d815 to vllm-project@b6dcb4d

Note that vllm-project@a37d815 is NOT included in this merge.

rkooo567 and others added 30 commits April 26, 2024 21:04
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
youkaichao and others added 24 commits April 26, 2024 21:09
[Core][Distributed] use existing torch.cuda.device context manager (vllm-project#4318)
This PR fixes the Marlin kernel crash on H100 reported in neuralmagic#187.
The crash was caused by inline PTX assembly that issued async_copy with streaming behavior. The fix is to use the more standard PTX for async_copy (without the fractional L2 "evict_first" policy). There is no performance difference between the standard async_copy PTX and the previous version.
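For illustration only, a minimal CUDA sketch of what a "standard" async copy can look like is shown here. It is not the Marlin kernel's code: the helper names `cp_async16` and `cp_async_wait_all`, the `copy_kernel` test kernel, and the 32-element tile are all hypothetical, and the sketch assumes an sm_80 or newer GPU, where the `cp.async` PTX instruction is available. The copy uses plain `cp.async.cg.shared.global` with no L2 cache-hint operand.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not taken from the PR): copy 16 bytes from global to
// shared memory with the plain cp.async form, i.e. no L2 cache-hint operand.
// Both pointers must be 16-byte aligned.
__device__ inline void cp_async16(void* smem_ptr, const void* glob_ptr) {
  uint32_t smem = static_cast<uint32_t>(__cvta_generic_to_shared(smem_ptr));
  asm volatile("cp.async.cg.shared.global [%0], [%1], 16;\n"
               :
               : "r"(smem), "l"(glob_ptr)
               : "memory");
}

// Hypothetical helper: commit the outstanding async copies issued by this
// thread and wait until all of them have completed.
__device__ inline void cp_async_wait_all() {
  asm volatile("cp.async.commit_group;\n" : : : "memory");
  asm volatile("cp.async.wait_group 0;\n" : : : "memory");
}

// Each thread stages one int4 (16 bytes) through shared memory.
__global__ void copy_kernel(const int4* in, int4* out) {
  __shared__ int4 tile[32];
  int i = threadIdx.x;
  cp_async16(&tile[i], &in[i]);
  cp_async_wait_all();
  __syncthreads();  // make the staged data visible to the whole block
  out[i] = tile[i];
}

int main() {
  const int N = 32;
  int4 h_in[N], h_out[N];
  for (int i = 0; i < N; ++i) h_in[i] = make_int4(i, 2 * i, 3 * i, 4 * i);

  int4 *d_in = nullptr, *d_out = nullptr;
  cudaMalloc(&d_in, sizeof(h_in));
  cudaMalloc(&d_out, sizeof(h_out));
  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

  copy_kernel<<<1, N>>>(d_in, d_out);
  cudaError_t err =
      cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  if (err != cudaSuccess) {
    std::printf("CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }

  // Verify that every element made the round trip unchanged.
  int mismatches = 0;
  for (int i = 0; i < N; ++i) {
    if (h_out[i].x != h_in[i].x || h_out[i].y != h_in[i].y ||
        h_out[i].z != h_in[i].z || h_out[i].w != h_in[i].w) {
      ++mismatches;
    }
  }
  std::printf("mismatches: %d\n", mismatches);

  cudaFree(d_in);
  cudaFree(d_out);
  return mismatches == 0 ? 0 : 1;
}
```

Under these assumptions the file can be built with something like `nvcc -arch=sm_80 example.cu` (the file name is arbitrary). The `.cg` variant of `cp.async` only accepts 16-byte copies, which is why the sketch moves one `int4` per thread.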
Co-authored-by: Simon Mo <simon.mo@hey.com>
…formers 4.40.0 (vllm-project#4324)

Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Caio Mendes <caiocesart@microsoft.com>
andy-neuma (Member) left a comment

cool.

andy-neuma merged commit 8f55a0c into main on May 2, 2024
12 checks passed
andy-neuma deleted the upstream-sync-2024-04-26 branch on May 2, 2024 16:40