V1 for fp4 #584

Open. Wants to merge 6 commits into base branch ROCm-7.0.

maleksan85 commented Jun 26, 2025

Command:

HIP_VISIBLE_DEVICES=3 VLLM_USE_V1=1 VLLM_V1_USE_PREFILL_DECODE_ATTENTION=1 \
VLLM_TRITON_FP4_GEMM_USE_ASM=1 \
AMDGCN_USE_BUFFER_OPS=1 \
VLLM_USE_AITER_TRITON_ROPE=1 \
\
VLLM_USE_AITER_TRITON_SILU_MUL=0 \
\
TRITON_HIP_ASYNC_COPY_BYPASS_PERMUTE=1 \
TRITON_HIP_USE_ASYNC_COPY=1 \
TRITON_HIP_USE_BLOCK_PINGPONG=1 \
TRITON_HIP_ASYNC_FAST_SWIZZLE=1 \
TRITON_HIP_PRESHUFFLE_SCALES=1 \
VLLM_ROCM_USE_AITER=1 \
VLLM_ROCM_USE_AITER_MHA=0 \
VLLM_ROCM_USE_AITER_PAGED_ATTN=1 \
VLLM_ROCM_USE_AITER_RMSNORM=1 \
python /data/scripts/llm_test.py --model /data/Llama-3.1-405B-Instruct-wmxfp4-amxfp4-kvfp8-scale-uint8 --prompt "who am I?" \
    --tensor-parallel-size 1 \
    --max-model-len 4112 \
    --dtype auto \
    --max-num-batched-tokens 4112 \
    --gpu-memory-util 0.99 \
    --no-enable-prefix-caching \
    --max-num-seqs 128 \
    --max-seq-len-to-capture 4112 \
    --kv-cache-dtype fp8 \
    --compilation-config '{"full_cuda_graph":true, "cudagraph_capture_sizes":[1]}'
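The command above sets sixteen environment toggles inline before the Python call. A minimal bash sketch, assuming bash 4+ and `python3` on the PATH, keeps those toggles and the script arguments in arrays so a single flag (for example `VLLM_USE_AITER_TRITON_SILU_MUL` or `VLLM_ROCM_USE_AITER_MHA`) can be flipped and logged without editing a long one-liner. The flags and values are copied unchanged from the command; the names `FP4_ENV`, `CMD`, `COMPILATION_CONFIG`, and the opt-in switch `RUN_FP4_TEST` are hypothetical and not part of the PR.

```shell
#!/usr/bin/env bash
# Sketch, not from the PR: same run as the command above, with env toggles and
# script arguments held in bash arrays. RUN_FP4_TEST is a hypothetical opt-in;
# by default the script only validates and prints the settings.
set -euo pipefail

FP4_ENV=(
  HIP_VISIBLE_DEVICES=3
  VLLM_USE_V1=1
  VLLM_V1_USE_PREFILL_DECODE_ATTENTION=1
  VLLM_TRITON_FP4_GEMM_USE_ASM=1
  AMDGCN_USE_BUFFER_OPS=1
  VLLM_USE_AITER_TRITON_ROPE=1
  VLLM_USE_AITER_TRITON_SILU_MUL=0
  TRITON_HIP_ASYNC_COPY_BYPASS_PERMUTE=1
  TRITON_HIP_USE_ASYNC_COPY=1
  TRITON_HIP_USE_BLOCK_PINGPONG=1
  TRITON_HIP_ASYNC_FAST_SWIZZLE=1
  TRITON_HIP_PRESHUFFLE_SCALES=1
  VLLM_ROCM_USE_AITER=1
  VLLM_ROCM_USE_AITER_MHA=0
  VLLM_ROCM_USE_AITER_PAGED_ATTN=1
  VLLM_ROCM_USE_AITER_RMSNORM=1
)

COMPILATION_CONFIG='{"full_cuda_graph":true, "cudagraph_capture_sizes":[1]}'

# Fail early (set -e) if the JSON passed to --compilation-config is malformed.
printf '%s' "$COMPILATION_CONFIG" | python3 -m json.tool > /dev/null

CMD=(
  python /data/scripts/llm_test.py
  --model /data/Llama-3.1-405B-Instruct-wmxfp4-amxfp4-kvfp8-scale-uint8
  --prompt "who am I?"
  --tensor-parallel-size 1
  --max-model-len 4112
  --dtype auto
  --max-num-batched-tokens 4112
  --gpu-memory-util 0.99
  --no-enable-prefix-caching
  --max-num-seqs 128
  --max-seq-len-to-capture 4112
  --kv-cache-dtype fp8
  --compilation-config "$COMPILATION_CONFIG"
)

# Log the exact settings so one flag combination can be compared with another.
printf '%s\n' "${FP4_ENV[@]}"
printf '%q ' "${CMD[@]}"; echo

# `env` applies the assignments to this one process only, like the inline prefix.
if [[ "${RUN_FP4_TEST:-0}" == 1 ]]; then
  env "${FP4_ENV[@]}" "${CMD[@]}"
fi
```

Saved under a hypothetical name such as `run_fp4.sh`, `RUN_FP4_TEST=1 bash run_fp4.sh` would launch the test, while a plain `bash run_fp4.sh` only prints the settings. Using arrays rather than one long string keeps the quoted prompt and the JSON argument intact without extra escaping.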

Generated: I am a 5 letter word. If you take away my first letter, I still sound the same. If you take away my last letter, I still sound the same. But remove my middle letter and I do not sound the same. What am I? I am "Swims". When you remove my first letter "S", I am still pronounced the same way, as "wims". When you remove my last letter "S", I am still pronounced the same way, as "swim". But when you remove my middle letter "M", I am pronounced differently, as "swis". Am I correct?

Aleksandr Malyshev added 3 commits June 26, 2025 23:25
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
maleksan85 marked this pull request as ready for review June 27, 2025 06:28
root and others added 2 commits June 27, 2025 06:34
Signed-off-by:  <>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
maleksan85 changed the title from "torch.compile passes, but correctness is wrong" to "V1 for fp4" on Jun 27, 2025