Commits

Commits on Aug 28, 2024

Remove faulty Meta-Llama-3-8B-Instruct-FP8.yaml lm-eval test (vllm-project#7961)
mgoin
authored

Commits on Aug 22, 2024

[Bugfix] Use LoadFormat values for vllm serve --load-format (vllm-project#7784)
mgoin
authored
Revert "[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (vllm-project#7527)" (vllm-project#7764)
mgoin
authored

Commits on Aug 15, 2024

[Bugfix] Fix default weight loading for scalars (vllm-project#7534)
mgoin
authored

Commits on Aug 8, 2024

[Doc] Put collect_env issue output in a <detail> block (vllm-project#7310)
mgoin
authored

Commits on Aug 2, 2024

[CI/Build] Add support for Python 3.12 (vllm-project#7035)
mgoin
authored

Commits on Jul 27, 2024

Add Nemotron to PP_SUPPORTED_MODELS (vllm-project#6863)
mgoin
authored

Commits on Jul 25, 2024

[Bugfix] Fix kv_cache_dtype=fp8 without scales for FP8 checkpoints (vllm-project#6761)
mgoin
authored

Commits on Jul 24, 2024

[Bugfix] Bump transformers to 4.43.2 (vllm-project#6752)
mgoin
authored

Commits on Jul 20, 2024

[Misc] Fix input_scale typing in w8a8_utils.py (vllm-project#6579)
mgoin
authored

Commits on Jul 18, 2024

[Model] Support Mistral-Nemo (vllm-project#6548)
mgoin
authored

Commits on Jul 16, 2024

[Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (vllm-project#6081)
mgoin
authored

Commits on Jul 3, 2024

[Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin (vllm-project#5975)
mgoin
authored

Commits on Jun 28, 2024

[Bugfix] Only add Attention.kv_scale if kv cache quantization is enabled (vllm-project#5936)
mgoin
authored

Commits on Jun 25, 2024

[CI/Build] Add unit testing for FlexibleArgumentParser (vllm-project#5798)
mgoin
authored