V0.1.5 Iteration Plan
New Model Support
Feature Support
- Remove the `pycuda` dependency; [Question]: Question about KV-cache storage #20
- Change the `flash_attn` dependency to optional (see the import-guard sketch after this list); [Question]: Is A6000 supported? #23 @liyucheng09
- Add unittest. @liyucheng09
  - Supported in Feature(MInference): add unittest #31
- Support multi-GPU; [Question]: RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:1) #25
- Add an end-to-end benchmark script using vLLM (a timing sketch follows below); [Question]: MInference Pre filling is slower than the vllm original version #18
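For the optional `flash_attn` item, a minimal sketch of one common pattern is an import guard with a plain PyTorch fallback, as below; the function name `attn`, the tensor layouts, and the fallback choice are assumptions for illustration, not the actual MInference code.

```python
# Sketch: make flash_attn optional via an import guard (illustrative only).
import torch.nn.functional as F

try:
    from flash_attn import flash_attn_func
    HAS_FLASH_ATTN = True
except ImportError:
    HAS_FLASH_ATTN = False

def attn(q, k, v, causal=True):
    """q, k, v: (batch, seqlen, nheads, headdim), the layout flash_attn expects."""
    if HAS_FLASH_ATTN:
        return flash_attn_func(q, k, v, causal=causal)
    # Fallback: PyTorch SDPA takes (batch, nheads, seqlen, headdim),
    # so transpose into and out of the flash_attn layout.
    out = F.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=causal
    )
    return out.transpose(1, 2)
```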
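And for the end-to-end benchmark item, a rough timing sketch against the public vLLM API could look like the following; the model name, prompt, and sampling settings are placeholders, not the plan's actual script.

```python
# Sketch: time end-to-end generation through vLLM (placeholders throughout).
import time
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder model
params = SamplingParams(temperature=0.0, max_tokens=1)  # max_tokens=1 keeps the run prefill-dominated

prompt = "..."  # a long-context prompt, where pre-filling dominates latency
start = time.perf_counter()
llm.generate([prompt], params)
print(f"end-to-end latency: {time.perf_counter() - start:.2f}s")
```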
Bugfix
- Fix the `apply_rotary_pos_emb_single` function (a hedged sketch of the device fix follows this list); [Question]: RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:1) #25
- Fix the missing `warnings` import in `setup.py`; [Bug]: missing warnings import in setup.py #28
- Fix compatibility with vLLM >= 0.4.1 (see the version-guard sketch below); [Bug]: NameError: name 'cache_ops' is not defined #42
- Fix the `is_flash_attn_2_available` issue
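As a rough illustration of the device-mismatch fix behind #25: the sketch below moves the index tensor onto the device of the cos/sin caches before indexing. The body of `apply_rotary_pos_emb_single` here is reconstructed from the standard rotary-embedding pattern and may differ from the repo's actual code.

```python
# Sketch: keep index tensors on the same device as the tensors they index
# (reconstructed from the standard rotary-embedding pattern; illustrative only).
import torch

def apply_rotary_pos_emb_single(x, cos, sin, position_ids):
    # On multi-GPU runs (e.g. cuda:1), position_ids can end up on a
    # different device than the cos/sin caches, which raises
    # "indices should be either on cpu or on the same device ...".
    position_ids = position_ids.to(cos.device)
    cos = cos[position_ids].unsqueeze(1)  # (bsz, 1, seq_len, head_dim)
    sin = sin[position_ids].unsqueeze(1)
    # rotate_half, inlined: split the last dim and swap halves with a sign flip.
    x1, x2 = x.chunk(2, dim=-1)
    rotated = torch.cat((-x2, x1), dim=-1)
    return (x * cos) + (rotated * sin)
```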
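Similarly, the `cache_ops` NameError in #42 stems from vLLM moving its cache kernels between releases; a version guard like the one below is one way to stay compatible, though the exact module paths are an assumption about vLLM's internals and should be verified per version.

```python
# Sketch: import vLLM's cache kernels across the 0.4.1 module move
# (module paths are assumptions about vLLM internals; verify per version).
try:
    # vLLM >= 0.4.1 exposes the kernels through a wrapper module.
    from vllm import _custom_ops as cache_ops
except ImportError:
    # Older vLLM exported them directly from the compiled extension.
    from vllm._C import cache_ops
```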