🐛 Bug
When running a llama-2-7b model that was built with the mlc_llm.build method at commit 1f70d71, the following error is generated (e.g. when calling benchmark_generate):
[2024-02-29 21:05:19] INFO auto_device.py:76: Found device: cuda:0
[2024-02-29 21:05:21] INFO auto_device.py:85: Not found device: rocm:0
[2024-02-29 21:05:22] INFO auto_device.py:85: Not found device: metal:0
[2024-02-29 21:05:24] INFO auto_device.py:85: Not found device: vulkan:0
[2024-02-29 21:05:26] INFO auto_device.py:85: Not found device: opencl:0
[2024-02-29 21:05:26] INFO auto_device.py:33: Using device: cuda:0
[2024-02-29 21:05:26] INFO chat_module.py:373: Using model folder: /data/models/mlc/1f70d71/legacy/Llama-2-7b-chat-hf-q4f16_ft/params
[2024-02-29 21:05:26] INFO chat_module.py:374: Using mlc chat config: /data/models/mlc/1f70d71/legacy/Llama-2-7b-chat-hf-q4f16_ft/params/mlc-chat-config.json
[2024-02-29 21:05:26] INFO chat_module.py:560: Using library model: /data/models/mlc/1f70d71/legacy/Llama-2-7b-chat-hf-q4f16_ft/Llama-2-7b-chat-hf-q4f16_ft-cuda.so
[2024-02-29 21:05:27] ERROR model_metadata.py:162: FAILED to read metadata section in legacy model lib.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/mlc_chat/cli/model_metadata.py", line 160, in main
metadata = _extract_metadata(parsed.model_lib)
File "/usr/local/lib/python3.10/dist-packages/mlc_chat/cli/model_metadata.py", line 26, in _extract_metadata
return json.loads(VirtualMachine(load_module(model_lib), device("cpu"))["_metadata"]())
File "/usr/local/lib/python3.10/dist-packages/tvm/runtime/relax_vm.py", line 136, in __getitem__
return self.module[key]
File "/usr/local/lib/python3.10/dist-packages/tvm/runtime/module.py", line 192, in __getitem__
return self.get_function(name)
File "/usr/local/lib/python3.10/dist-packages/tvm/runtime/module.py", line 176, in get_function
raise AttributeError(f"Module has no function '{name}'")
AttributeError: Module has no function '_metadata'
PROMPT: Once upon a time, there was a little girl who loved to read.
Traceback (most recent call last):
File "/opt/mlc-llm/benchmark.py", line 128, in <module>
print(cm.benchmark_generate(prompt=prompt, generate_length=args.max_new_tokens).strip())
File "/usr/local/lib/python3.10/dist-packages/mlc_chat/chat_module.py", line 980, in benchmark_generate
self._prefill(prompt)
File "/usr/local/lib/python3.10/dist-packages/mlc_chat/chat_module.py", line 1072, in _prefill
self._prefill_func(
File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
File "tvm/_ffi/_cython/./packed_func.pxi", line 277, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
tvm.error.InternalError: Traceback (most recent call last):
[bt] (8) /usr/local/lib/python3.10/dist-packages/mlc_chat/libmlc_llm_module.so(mlc::llm::LLMChat::PrefillStep(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, mlc::llm::PlaceInPrompt, tvm::runtime::String)+0x52c) [0xffff23dd7a0c]
[bt] (7) /usr/local/lib/python3.10/dist-packages/mlc_chat/libmlc_llm_module.so(mlc::llm::LLMChat::ForwardTokens(std::vector<int, std::allocator<int> >, long)+0xdc8) [0xffff23dd0a68]
[bt] (6) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(+0x313938c) [0xffff5ff8938c]
[bt] (5) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0x2fc) [0xffff5ff891dc]
[bt] (4) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(+0x313b504) [0xffff5ff8b504]
[bt] (3) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)+0x3cc) [0xffff5ff8b1ec]
[bt] (2) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(+0x3138cc8) [0xffff5ff88cc8]
[bt] (1) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x68) [0xffff5e07d858]
[bt] (0) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::Backtrace[abi:cxx11]()+0x30) [0xffff5fefc6e0]
File "/opt/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/vm.cc", line 652
InternalError: Check failed: static_cast<size_t>(gfunc.num_args) == args.size() (330 vs. 4) : ValueError: Invoking function prefill requires 330 inputs but only 4 inputs are provided.
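For reference, the failing call corresponds roughly to the following (a minimal sketch reconstructed from the traceback and the log paths above; the exact ChatModule arguments and the generate_length used by benchmark.py are assumptions):

```python
# Minimal reproduction sketch, assuming the paths from the log above;
# benchmark.py passes its own prompt and --max-new-tokens value.
from mlc_chat import ChatModule

cm = ChatModule(
    model="/data/models/mlc/1f70d71/legacy/Llama-2-7b-chat-hf-q4f16_ft/params",
    model_lib_path=(
        "/data/models/mlc/1f70d71/legacy/Llama-2-7b-chat-hf-q4f16_ft/"
        "Llama-2-7b-chat-hf-q4f16_ft-cuda.so"
    ),
    device="cuda",
)

prompt = "Once upon a time, there was a little girl who loved to read."
# Fails in _prefill() with the InternalError shown above.
print(cm.benchmark_generate(prompt=prompt, generate_length=128).strip())
```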
The command used to build the model was:
python3 -m mlc_llm.build --model Llama-2-7b-chat-hf --target cuda --use-cuda-graph --use-flash-attn-mqa --quantization q4f16_ft --artifact-path /data/models/mlc/1f70d71/legacy --max-seq-len 4096
This was still working in the last build I tested (607dc5a), so the issue seems to have been introduced within the past couple of days.
Environment
- Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): ARM64+CUDA
- Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu 22.04
- Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): Jetson AGX Orin
- How you installed MLC-LLM (conda, source): source
- How you installed TVM-Unity (pip, source): source
- Python version (e.g. 3.10): Python 3.10
- GPU driver version (if applicable): JetPack 6.0
- CUDA/cuDNN version (if applicable): CUDA 12.2
- TVM Unity Hash Tag (`python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"`, applicable if you compile models): 2c1ce3ab467f9367c14afd9579ed1388aaae0b90
- Any other relevant information: