@liyucheng09 (Contributor) commented Jul 13, 2024

What does this PR do?

Feature

  • Add triton-based decoding for HF mode, in case flash_attn is not available.
  • The vLLM mode stays the same, as it does not require flash_attn during decoding.
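
The fallback described above can be illustrated with a minimal sketch. This is not the code from this PR: the function name `pick_decoding_backend` and the `"eager"` last-resort label are hypothetical, and the sketch only checks whether a package is installed, not whether its kernels run on the current GPU.

```python
import importlib.util


def pick_decoding_backend(preferred=("flash_attn", "triton")):
    """Return the first installed backend name, or "eager" if none is found.

    Hypothetical helper for illustration; the names in `preferred` are
    top-level package names checked in priority order.
    """
    for name in preferred:
        # find_spec returns None when a top-level package is not installed
        if importlib.util.find_spec(name) is not None:
            return name
    return "eager"


backend = pick_decoding_backend()
print(f"decoding backend: {backend}")
```

Checking with `find_spec` avoids importing a heavy package just to learn that it exists; a real implementation would likely still wrap the actual import in `try`/`except ImportError`, since an installed package can fail to load.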

Bug Fixed

UnitTest

  • Passed locally

Who can review?

@iofu728

@iofu728 changed the title from "add triton-based decoding in case flash_attn is not available" to "Feature(MInference): add triton-based decoding in case flash_attn is not available" on Jul 15, 2024
@iofu728 merged commit 50d17d9 into main on Jul 15, 2024
@iofu728 deleted the decoding-dev branch on July 15, 2024 05:46
Labels

feature

Development

Successfully merging this pull request may close these issues.

[Question]: Is A6000 supported?

2 participants