
[Bug] mlc_llm.build error - ValueError: Invoking function prefill requires 330 inputs but only 4 inputs are provided. #1864

@dusty-nv

Description

🐛 Bug

When running llama-2-7b built with the mlc_llm.build method at commit 1f70d71, the following error is generated at runtime (e.g. with benchmark_generate):

[2024-02-29 21:05:19] INFO auto_device.py:76: Found device: cuda:0
[2024-02-29 21:05:21] INFO auto_device.py:85: Not found device: rocm:0
[2024-02-29 21:05:22] INFO auto_device.py:85: Not found device: metal:0
[2024-02-29 21:05:24] INFO auto_device.py:85: Not found device: vulkan:0
[2024-02-29 21:05:26] INFO auto_device.py:85: Not found device: opencl:0
[2024-02-29 21:05:26] INFO auto_device.py:33: Using device: cuda:0
[2024-02-29 21:05:26] INFO chat_module.py:373: Using model folder: /data/models/mlc/1f70d71/legacy/Llama-2-7b-chat-hf-q4f16_ft/params
[2024-02-29 21:05:26] INFO chat_module.py:374: Using mlc chat config: /data/models/mlc/1f70d71/legacy/Llama-2-7b-chat-hf-q4f16_ft/params/mlc-chat-config.json
[2024-02-29 21:05:26] INFO chat_module.py:560: Using library model: /data/models/mlc/1f70d71/legacy/Llama-2-7b-chat-hf-q4f16_ft/Llama-2-7b-chat-hf-q4f16_ft-cuda.so
[2024-02-29 21:05:27] ERROR model_metadata.py:162: FAILED to read metadata section in legacy model lib.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/mlc_chat/cli/model_metadata.py", line 160, in main
    metadata = _extract_metadata(parsed.model_lib)
  File "/usr/local/lib/python3.10/dist-packages/mlc_chat/cli/model_metadata.py", line 26, in _extract_metadata
    return json.loads(VirtualMachine(load_module(model_lib), device("cpu"))["_metadata"]())
  File "/usr/local/lib/python3.10/dist-packages/tvm/runtime/relax_vm.py", line 136, in __getitem__
    return self.module[key]
  File "/usr/local/lib/python3.10/dist-packages/tvm/runtime/module.py", line 192, in __getitem__
    return self.get_function(name)
  File "/usr/local/lib/python3.10/dist-packages/tvm/runtime/module.py", line 176, in get_function
    raise AttributeError(f"Module has no function '{name}'")
AttributeError: Module has no function '_metadata'
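The metadata probe that fails above can be reduced to a minimal sketch. `FakeModule` and `extract_metadata` below are hypothetical stand-ins (the real code lives in `mlc_chat/cli/model_metadata.py`), assuming only what the traceback shows: a legacy lib built with mlc_llm.build does not export a `_metadata` function.

```python
import json


class FakeModule:
    """Hypothetical stub standing in for a loaded TVM runtime module."""

    def __init__(self, functions):
        self._functions = functions

    def __getitem__(self, name):
        # Mirrors tvm.runtime.Module.get_function's failure mode
        if name not in self._functions:
            raise AttributeError(f"Module has no function '{name}'")
        return self._functions[name]


def extract_metadata(module):
    # Mirrors _extract_metadata: call the lib's '_metadata' function and
    # parse its JSON result. Legacy libs lack '_metadata', so this raises
    # AttributeError for them, producing the log output above.
    return json.loads(module["_metadata"]())


legacy_lib = FakeModule({})  # no '_metadata' entry, like the legacy .so above
try:
    extract_metadata(legacy_lib)
except AttributeError as err:
    print(err)  # Module has no function '_metadata'
```

In the real CLI this failure is caught and logged as the "FAILED to read metadata section in legacy model lib" line, rather than aborting the run.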

PROMPT:  Once upon a time, there was a little girl who loved to read.

Traceback (most recent call last):
  File "/opt/mlc-llm/benchmark.py", line 128, in <module>
    print(cm.benchmark_generate(prompt=prompt, generate_length=args.max_new_tokens).strip())
  File "/usr/local/lib/python3.10/dist-packages/mlc_chat/chat_module.py", line 980, in benchmark_generate
    self._prefill(prompt)
  File "/usr/local/lib/python3.10/dist-packages/mlc_chat/chat_module.py", line 1072, in _prefill
    self._prefill_func(
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 277, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm.error.InternalError: Traceback (most recent call last):
  [bt] (8) /usr/local/lib/python3.10/dist-packages/mlc_chat/libmlc_llm_module.so(mlc::llm::LLMChat::PrefillStep(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, mlc::llm::PlaceInPrompt, tvm::runtime::String)+0x52c) [0xffff23dd7a0c]
  [bt] (7) /usr/local/lib/python3.10/dist-packages/mlc_chat/libmlc_llm_module.so(mlc::llm::LLMChat::ForwardTokens(std::vector<int, std::allocator<int> >, long)+0xdc8) [0xffff23dd0a68]
  [bt] (6) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(+0x313938c) [0xffff5ff8938c]
  [bt] (5) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0x2fc) [0xffff5ff891dc]
  [bt] (4) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(+0x313b504) [0xffff5ff8b504]
  [bt] (3) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)+0x3cc) [0xffff5ff8b1ec]
  [bt] (2) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(+0x3138cc8) [0xffff5ff88cc8]
  [bt] (1) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x68) [0xffff5e07d858]
  [bt] (0) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::Backtrace[abi:cxx11]()+0x30) [0xffff5fefc6e0]
  File "/opt/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/vm.cc", line 652
InternalError: Check failed: static_cast<size_t>(gfunc.num_args) == args.size() (330 vs. 4) : ValueError: Invoking function prefill requires 330 inputs but only 4 inputs are provided.

The command used to build the model was:

python3 -m mlc_llm.build --model Llama-2-7b-chat-hf --target cuda --use-cuda-graph --use-flash-attn-mqa --quantization q4f16_ft --artifact-path /data/models/mlc/1f70d71/legacy --max-seq-len 4096

This was still working in the last build I tested (607dc5a), so the issue seems to have been introduced within the past couple of days.

Environment

  • Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): ARM64+CUDA
  • Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu 22.04
  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): Jetson AGX Orin
  • How you installed MLC-LLM (conda, source): source
  • How you installed TVM-Unity (pip, source): source
  • Python version (e.g. 3.10): Python 3.10
  • GPU driver version (if applicable): JetPack 6.0
  • CUDA/cuDNN version (if applicable): CUDA 12.2
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models): 2c1ce3ab467f9367c14afd9579ed1388aaae0b90
  • Any other relevant information:

Labels: bug (Confirmed bugs)
