Failed to build TensorRT-LLM whisper Decoder #707

Open

Description

@muhammad-faizan-122

System Info

I was following this whisper-doc to run Whisper on Triton Inference Server with the TensorRT-LLM backend. Building the TensorRT-LLM engine for the encoder works fine, but building the engine for the decoder with the command below fails with the error shown under "Actual behavior".
System specs:

OS: Ubuntu 24
CPU: x86_64

GPU specs:

```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.6     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000               Off | 00000000:03:00.0 Off |                  Off |
| 30%   30C    P8               8W / 300W |  23516MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
```

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Building the TensorRT-LLM engine for the decoder:

```
trtllm-build --checkpoint_dir ${checkpoint_dir}/decoder \
             --output_dir ${output_dir}/decoder \
             --moe_plugin disable \
             --max_beam_width ${MAX_BEAM_WIDTH} \
             --max_batch_size ${MAX_BATCH_SIZE} \
             --max_seq_len 114 \
             --max_input_len 14 \
             --max_encoder_input_len 3000 \
             --gemm_plugin ${INFERENCE_PRECISION} \
             --bert_attention_plugin ${INFERENCE_PRECISION} \
             --gpt_attention_plugin ${INFERENCE_PRECISION}
```

Expected behavior

The trtllm-build command should produce the TensorRT-LLM decoder engine, which is required during inference.

Actual behavior

The build fails with the following error:

```
Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/commands/build.py", line 627, in main
    parallel_build(model_config, ckpt_dir, build_config, args.output_dir,
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/commands/build.py", line 425, in parallel_build
    passed = build_and_save(rank, rank % workers, ckpt_dir,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/commands/build.py", line 390, in build_and_save
    engine = build_model(build_config,
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/commands/build.py", line 360, in build_model
    model = model_cls.from_checkpoint(ckpt_dir, config=rank_config)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/modeling_utils.py", line 653, in from_checkpoint
    model.load(weights, from_pruned=is_checkpoint_pruned)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/modeling_utils.py", line 675, in load
    raise RuntimeError(
RuntimeError: Required but not provided tensors:
```
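For anyone debugging the same error: the names of the missing tensors are normally printed after that colon, and comparing them against what the converted checkpoint actually contains can narrow down whether the conversion step dropped the decoder weights or wrote them under different names. A minimal sketch, assuming the checkpoint was written as safetensors under the --checkpoint_dir passed to trtllm-build (the rank0.safetensors file name and the path below are assumptions; adjust to your layout):

```python
from safetensors import safe_open

# Hypothetical path: the decoder checkpoint produced by the conversion
# script. Adjust to the ${checkpoint_dir}/decoder passed to trtllm-build.
ckpt_path = "tllm_checkpoint/decoder/rank0.safetensors"

# Print every tensor name the checkpoint provides, so it can be compared
# against the names reported in "Required but not provided tensors".
with safe_open(ckpt_path, framework="pt") as f:
    for name in sorted(f.keys()):
        print(name)
```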

Additional notes

I used the nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3 image and this script (convert_checkpoints.py) to convert the checkpoints.
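One thing that may be worth ruling out (an assumption on my part, not something confirmed in this issue): the 24.12 Triton container bundles a specific TensorRT-LLM release, and a checkpoint converted with a different TensorRT-LLM version can use different tensor names, which would surface exactly as "Required but not provided tensors". A quick sanity check inside the container:

```python
# Print the TensorRT-LLM version bundled in the container; the checkpoint
# should be converted with the same version that runs trtllm-build.
import tensorrt_llm
print(tensorrt_llm.__version__)
```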
