🐛 Bug
When running a llama-2-7b model that was built with the mlc_llm.build method at commit 1f70d71, the following error is generated (e.g. when calling benchmark_generate):
[2024-02-29 21:05:19] INFO auto_device.py:76: Found device: cuda:0
[2024-02-29 21:05:21] INFO auto_device.py:85: Not found device: rocm:0
[2024-02-29 21:05:22] INFO auto_device.py:85: Not found device: metal:0
[2024-02-29 21:05:24] INFO auto_device.py:85: Not found device: vulkan:0
[2024-02-29 21:05:26] INFO auto_device.py:85: Not found device: opencl:0
[2024-02-29 21:05:26] INFO auto_device.py:33: Using device: cuda:0
[2024-02-29 21:05:26] INFO chat_module.py:373: Using model folder: /data/models/mlc/1f70d71/legacy/Llama-2-7b-chat-hf-q4f16_ft/params
[2024-02-29 21:05:26] INFO chat_module.py:374: Using mlc chat config: /data/models/mlc/1f70d71/legacy/Llama-2-7b-chat-hf-q4f16_ft/params/mlc-chat-config.json
[2024-02-29 21:05:26] INFO chat_module.py:560: Using library model: /data/models/mlc/1f70d71/legacy/Llama-2-7b-chat-hf-q4f16_ft/Llama-2-7b-chat-hf-q4f16_ft-cuda.so
[2024-02-29 21:05:27] ERROR model_metadata.py:162: FAILED to read metadata section in legacy model lib.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/mlc_chat/cli/model_metadata.py", line 160, in main
metadata = _extract_metadata(parsed.model_lib)
File "/usr/local/lib/python3.10/dist-packages/mlc_chat/cli/model_metadata.py", line 26, in _extract_metadata
return json.loads(VirtualMachine(load_module(model_lib), device("cpu"))["_metadata"]())
File "/usr/local/lib/python3.10/dist-packages/tvm/runtime/relax_vm.py", line 136, in __getitem__
return self.module[key]
File "/usr/local/lib/python3.10/dist-packages/tvm/runtime/module.py", line 192, in __getitem__
return self.get_function(name)
File "/usr/local/lib/python3.10/dist-packages/tvm/runtime/module.py", line 176, in get_function
raise AttributeError(f"Module has no function '{name}'")
AttributeError: Module has no function '_metadata'
PROMPT: Once upon a time, there was a little girl who loved to read.
Traceback (most recent call last):
File "/opt/mlc-llm/benchmark.py", line 128, in <module>
print(cm.benchmark_generate(prompt=prompt, generate_length=args.max_new_tokens).strip())
File "/usr/local/lib/python3.10/dist-packages/mlc_chat/chat_module.py", line 980, in benchmark_generate
self._prefill(prompt)
File "/usr/local/lib/python3.10/dist-packages/mlc_chat/chat_module.py", line 1072, in _prefill
self._prefill_func(
File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
File "tvm/_ffi/_cython/./packed_func.pxi", line 277, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
tvm.error.InternalError: Traceback (most recent call last):
[bt] (8) /usr/local/lib/python3.10/dist-packages/mlc_chat/libmlc_llm_module.so(mlc::llm::LLMChat::PrefillStep(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, mlc::llm::PlaceInPrompt, tvm::runtime::String)+0x52c) [0xffff23dd7a0c]
[bt] (7) /usr/local/lib/python3.10/dist-packages/mlc_chat/libmlc_llm_module.so(mlc::llm::LLMChat::ForwardTokens(std::vector<int, std::allocator<int> >, long)+0xdc8) [0xffff23dd0a68]
[bt] (6) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(+0x313938c) [0xffff5ff8938c]
[bt] (5) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0x2fc) [0xffff5ff891dc]
[bt] (4) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(+0x313b504) [0xffff5ff8b504]
[bt] (3) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)+0x3cc) [0xffff5ff8b1ec]
[bt] (2) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(+0x3138cc8) [0xffff5ff88cc8]
[bt] (1) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x68) [0xffff5e07d858]
[bt] (0) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::Backtrace[abi:cxx11]()+0x30) [0xffff5fefc6e0]
File "/opt/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/vm.cc", line 652
InternalError: Check failed: static_cast<size_t>(gfunc.num_args) == args.size() (330 vs. 4) : ValueError: Invoking function prefill requires 330 inputs but only 4 inputs are provided.
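For reference, the failing call corresponds roughly to the following (a minimal sketch reconstructed from the traceback and the log paths above; the exact ChatModule arguments and the generate_length used by benchmark.py are assumptions):

```python
# Minimal reproduction sketch, assuming the paths from the log above;
# benchmark.py passes its own prompt and --max-new-tokens value.
from mlc_chat import ChatModule

cm = ChatModule(
    model="/data/models/mlc/1f70d71/legacy/Llama-2-7b-chat-hf-q4f16_ft/params",
    model_lib_path=(
        "/data/models/mlc/1f70d71/legacy/Llama-2-7b-chat-hf-q4f16_ft/"
        "Llama-2-7b-chat-hf-q4f16_ft-cuda.so"
    ),
    device="cuda",
)

prompt = "Once upon a time, there was a little girl who loved to read."
# Fails in _prefill() with the InternalError shown above.
print(cm.benchmark_generate(prompt=prompt, generate_length=128).strip())
```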
The command used to build the model was:
python3 -m mlc_llm.build --model Llama-2-7b-chat-hf --target cuda --use-cuda-graph --use-flash-attn-mqa --quantization q4f16_ft --artifact-path /data/models/mlc/1f70d71/legacy --max-seq-len 4096
This was still working in the last build I tested (607dc5a), so the issue seems to have been introduced within the past couple of days.
Environment
- Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): ARM64+CUDA
- Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu 22.04
- Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): Jetson AGX Orin
- How you installed MLC-LLM (conda, source): source
- How you installed TVM-Unity (pip, source): source
- Python version (e.g. 3.10): Python 3.10
- GPU driver version (if applicable): JetPack 6.0
- CUDA/cuDNN version (if applicable): CUDA 12.2
- TVM Unity Hash Tag (`python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"`, applicable if you compile models): 2c1ce3ab467f9367c14afd9579ed1388aaae0b90
- Any other relevant information: