Support Qwen3 #17

jlonge4 · 2025-04-30T23:25:42Z

Issue #, if available:
N/A
Description of changes:
Support the new Qwen3 model. This code was tested with logit validation (logit-val) using Qwen/Qwen3-8B.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

jlonge4 · 2025-05-14T14:21:26Z

Logit Validation Benchmark Code:

!inference_demo \
    --model-type qwen3 \
    --task-type causal-lm \
    run \
    --model-path /home/ubuntu/model_hf_qwen/qwen/ \
    --compiled-model-path /home/ubuntu/traced_model_qwen/qwen/logit \
    --torch-dtype bfloat16 \
    --tp-degree 8 \
    --batch-size 1 \
    --max-context-length 16 \
    --seq-len 32 \
    --enable-bucketing \
    --pad-token-id 151645 \
    --prompt "To be, or not to be" \
    --check-accuracy-mode logit-matching \
    --benchmark

Results:

Expected Output:  [", that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune"] tensor([[   11,   429,   374,   279,  3405,    13, 13139,   364,    83,   285,
         13049,  1536,   304,   279,  3971,   311,  7676,   279,  1739,   819,
           323, 36957,   315, 54488, 32315]])
Expected Logits Shape:  torch.Size([25, 1, 151936])
Actual Output:  [", that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune"] tensor([[   11,   429,   374,   279,  3405,    13, 13139,   364,    83,   285,
         13049,  1536,   304,   279,  3971,   311,  7676,   279,  1739,   819,
           323, 36957,   315, 54488, 32315]])
Actual Logits Shape:  torch.Size([25, 1, 151936])
Passed logits validation!
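
For readers unfamiliar with this kind of check, here is a minimal sketch of what a logit-matching comparison might look like. It is not the code behind `--check-accuracy-mode logit-matching`; the function name `compare_logits`, the tolerances, and the toy values are all hypothetical and chosen only for illustration. The idea is to confirm that the greedy token at every step agrees and that each logit is within a tolerance of the reference.

```python
from math import isclose


def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=row.__getitem__)


def compare_logits(expected, actual, rel_tol=1e-2, abs_tol=1e-3):
    """Compare two [steps][vocab] logit tables.

    Returns (tokens_match, logits_match). The tolerances are illustrative
    assumptions, not values taken from any real validation tool.
    """
    if len(expected) != len(actual):
        raise ValueError("expected and actual cover a different number of steps")
    # Greedy decoding picks the argmax at each step, so compare those first.
    tokens_match = all(argmax(e) == argmax(a) for e, a in zip(expected, actual))
    # Then compare every logit element-wise within the tolerance.
    logits_match = all(
        len(e) == len(a)
        and all(isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(e, a))
        for e, a in zip(expected, actual)
    )
    return tokens_match, logits_match


# Toy data: 2 generation steps over a 3-token vocabulary.
expected = [[0.10, 2.50, -1.00], [3.20, 0.00, 0.70]]
actual = [[0.1004, 2.4990, -1.0002], [3.2010, 0.0005, 0.6998]]
print(compare_logits(expected, actual))
```

In the run above, the real tensors have shape `[25, 1, 151936]`, that is, 25 generated steps, batch size 1, and the full vocabulary at each step.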

Generating outputs...
Prompts: ['To be, or not to be']
Generated outputs:
Output 0: To be, or not to be, that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune

Benchmark completed and its result is as following
{
    "e2e_model": {
        "latency_ms_p50": 156.56781196594238,
        "latency_ms_p90": 158.08086395263672,
        "latency_ms_p95": 158.1140637397766,
        "latency_ms_p99": 158.28602075576782,
        "latency_ms_p100": 158.32901000976562,
        "latency_ms_avg": 156.99772834777832,
        "throughput": 203.82460521412273
    },
    "context_encoding_model": {
        "latency_ms_p50": 10.202646255493164,
        "latency_ms_p90": 10.224390029907227,
        "latency_ms_p95": 10.22493839263916,
        "latency_ms_p99": 10.226750373840332,
        "latency_ms_p100": 10.227203369140625,
        "latency_ms_avg": 10.201811790466309,
        "throughput": 1568.348870634151
    },
    "token_generation_model": {
        "latency_ms_p50": 8.858323097229004,
        "latency_ms_p90": 8.903312683105469,
        "latency_ms_p95": 9.238588809967041,
        "latency_ms_p99": 9.264287948608398,
        "latency_ms_p100": 9.28950309753418,
        "latency_ms_avg": 8.88296922047933,
        "throughput": 120.07996877975322
    }
}
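
As a sanity check on the numbers above, two of the reported throughput values appear consistent with tokens divided by average latency, using `--seq-len 32` for the end-to-end model and `--max-context-length 16` for context encoding. This relationship is inferred from the figures in this comment, not from the benchmark tool's documentation, and the helper `tokens_per_second` is hypothetical.

```python
def tokens_per_second(tokens, latency_ms):
    """Convert a token count and a latency in milliseconds to tokens/second."""
    return tokens / (latency_ms / 1000.0)


# Values taken from the command line and benchmark report in this comment.
BATCH_SIZE = 1
SEQ_LEN = 32
MAX_CONTEXT_LENGTH = 16

# Assumption: e2e throughput counts the full sequence length per run.
e2e = tokens_per_second(BATCH_SIZE * SEQ_LEN, 156.99772834777832)
# Assumption: context encoding throughput counts the padded context length.
ctx = tokens_per_second(BATCH_SIZE * MAX_CONTEXT_LENGTH, 10.201811790466309)

print(f"e2e: {e2e:.2f} tokens/s (reported 203.82)")
print(f"context encoding: {ctx:.2f} tokens/s (reported 1568.35)")
```

The `token_generation_model` throughput is left out because it does not reduce to the same simple formula, and the thread does not say how it is derived.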

aws-yyjau · 2025-05-21T23:24:36Z

Thanks for the PR. We will look into it.

aws-yyjau · 2025-05-29T18:03:51Z

Hi @jlonge4
Thanks for your contribution. We are working on support for this model, using your change as a reference.
The support will be included in a future release.
We'll close the PR once the support is in the release.

jlonge4 · 2025-05-29T18:55:06Z

Great news, thanks very much @aws-yyjau

qwen3

94ee9e1

jlonge4 force-pushed the jl-qwen3 branch from 31273d1 to 94ee9e1 Compare May 1, 2025 12:05
