Pre-built TVM libraries for Windows with LLVM codegen enabled - the missing piece for MLC-LLM model conversion on Windows.
This repository provides pre-compiled TVM binaries for Windows x64 with LLVM support enabled.
The Problem:
- MLC-LLM's official pre-built packages for Windows do not include LLVM support
- Without LLVM, you cannot convert models (the `mlc_llm convert_weight` command fails)
- Building TVM with LLVM on Windows is extremely difficult and time-consuming (2-4 hours)
- Most developers give up and switch to Linux/WSL
The Solution:
- Use these pre-built binaries to save hours of painful Windows C++ build hell
- Drop-in replacement for the official TVM installation
- Enables full MLC-LLM model conversion workflow on native Windows
This package contains 4 DLL files built on Windows with LLVM 18.1.8:
| File | Size | Description |
|---|---|---|
| `tvm.dll` | 109 MB | Full TVM with LLVM codegen - required for model conversion |
| `tvm_runtime.dll` | 2.5 MB | Lightweight runtime (no LLVM) - for inference only |
| `tvm_ffi.dll` | 1.4 MB | Foreign Function Interface - Python/C++ interop layer |
| `tvm_ffi_testing.dll` | 582 KB | Testing utilities (optional) |
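A quick way to tell the full LLVM-enabled `tvm.dll` apart from a runtime-only build is its file size. The helper below is an illustrative sketch (the function names and the 50 MB threshold are assumptions based on the sizes in the table above, not part of TVM):

```python
from pathlib import Path

# Illustrative threshold: the full LLVM-enabled tvm.dll is ~109 MB,
# while a runtime-only tvm.dll is only a few MB.
FULL_DLL_MIN_BYTES = 50 * 1024 * 1024  # 50 MB

def looks_like_full_tvm(size_bytes: int) -> bool:
    """Heuristic: a tvm.dll this large almost certainly includes LLVM codegen."""
    return size_bytes >= FULL_DLL_MIN_BYTES

def check_dll(path: str) -> str:
    """Classify a tvm.dll on disk as 'full', 'runtime-only', or 'missing'."""
    p = Path(path)
    if not p.exists():
        return "missing"
    return "full" if looks_like_full_tvm(p.stat().st_size) else "runtime-only"

print(check_dll(r"mlc-llm\3rdparty\tvm\python\tvm\tvm.dll"))
```

Run this after copying the DLLs (see installation below) to confirm you picked up the right file.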
Build Configuration:
- ✅ LLVM Support: ENABLED (v18.1.8)
- ✅ MSVC Compiler: Visual Studio 2022
- ✅ Build Type: Release (optimized)
- ✅ Target: Windows x64 only
- ❌ CUDA: Disabled (CPU-only for model conversion)
- ❌ OpenCL/Vulkan: Disabled (not needed for conversion)
- Windows 10/11 (64-bit)
- Visual C++ Redistributable 2015-2022 (x64)
  - Download: https://aka.ms/vs/17/release/vc_redist.x64.exe
  - Required for MSVC-built DLLs
- Python 3.11+ with virtual environment
- 16GB+ RAM (for model conversion)
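The requirements above can be pre-flight checked with a short script. This is a sketch with the thresholds taken from the list (Python 3.11+, 16 GB RAM); how you obtain the installed-RAM figure is platform-specific, so it is left as an input.

```python
import sys

def meets_requirements(python_version: tuple, ram_gb: float) -> list:
    """Return a list of human-readable problems; an empty list means good to go."""
    problems = []
    if python_version < (3, 11):
        problems.append(f"Python {python_version[0]}.{python_version[1]} is older than 3.11")
    if ram_gb < 16:
        problems.append(f"{ram_gb:g} GB RAM is below the recommended 16 GB")
    return problems

# Usage: pass your actual installed RAM in GB.
issues = meets_requirements(sys.version_info[:2], ram_gb=32)
print("OK" if not issues else "; ".join(issues))
```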
```powershell
# Create virtual environment
python -m venv mlc-venv
.\mlc-venv\Scripts\Activate.ps1

# Install PyTorch (CPU version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install transformers and dependencies
pip install transformers accelerate
```

```powershell
# Clone with submodules
git clone --recursive https://github.com/mlc-ai/mlc-llm.git
cd mlc-llm

# Update submodules
git submodule update --init --recursive
```

```powershell
# Download this repository
git clone https://github.com/NVitlam/tvm-windows-llvm-binaries.git

# Copy DLLs to TVM Python package directory
# Location: mlc-llm\3rdparty\tvm\python\tvm\
Copy-Item tvm-windows-llvm-binaries\bin\*.dll mlc-llm\3rdparty\tvm\python\tvm\ -Force

# Verify files are copied
ls mlc-llm\3rdparty\tvm\python\tvm\tvm.dll
# Should show tvm.dll (~109 MB)
```

```powershell
cd mlc-llm\3rdparty\tvm\python

# Install TVM in development mode
pip install -e .

# Verify installation
python -c "import tvm; print('TVM version:', tvm.__version__)"
```

```powershell
cd ..\..\..  # Back to mlc-llm root

# Install MLC-LLM
pip install -e . --no-build-isolation

# Verify installation
python -c "import mlc_llm; print('MLC-LLM imported successfully')"
```

```powershell
# Download a test model (example: Qwen 2.5 3B)
mkdir models
cd models
git lfs clone https://huggingface.co/Qwen/Qwen2.5-3B-Instruct
cd ..

# Convert model weights to MLC format
python -m mlc_llm convert_weight `
    .\models\Qwen2.5-3B-Instruct `
    --quantization q4f16_1 `
    -o .\dist\Qwen2.5-3B-Instruct-q4f16_1-MLC

# Generate MLC chat config
python -m mlc_llm gen_config `
    .\models\Qwen2.5-3B-Instruct `
    --quantization q4f16_1 `
    --conv-template qwen2_5 `
    -o .\dist\Qwen2.5-3B-Instruct-q4f16_1-MLC
```

Expected Results:
- ✅ Conversion completes without errors
- ✅ Model size reduced from ~6 GB to ~2 GB (q4f16_1 quantization)
- ✅ No "Cannot find global function tvm.codegen.llvm.GetDefaultTargetTriple" error
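The size reduction follows from simple arithmetic: fp16 stores 2 bytes per parameter, while q4f16_1 stores roughly 4-bit weights plus one fp16 scale per group of weights. The sketch below assumes a group size of 32 (an assumption; the exact overhead depends on the quantization scheme):

```python
def fp16_size_gb(n_params: float) -> float:
    # fp16 stores 2 bytes per parameter
    return n_params * 2 / 1e9

def q4_size_gb(n_params: float, group_size: int = 32) -> float:
    # ~4 bits per weight, plus one fp16 scale shared by each group of weights
    bits_per_weight = 4 + 16 / group_size  # 4.5 bits with group_size=32
    return n_params * bits_per_weight / 8 / 1e9

n = 3e9  # a 3B-parameter model
print(f"fp16: ~{fp16_size_gb(n):.1f} GB, q4f16_1: ~{q4_size_gb(n):.1f} GB")
```

For a 3B model this lands near 6 GB before and a bit under 2 GB after, matching the observed reduction.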
- ✅ Convert models to MLC format (`mlc_llm convert_weight`)
- ✅ Generate MLC chat configs (`mlc_llm gen_config`)
- ✅ Quantize models (q4f16_1, q4f16_0, q3f16_1, etc.)
- ✅ Run TVM Python package on Windows
- ✅ Compile models for CPU inference
- ✅ LLVM IR code generation

- ❌ GPU acceleration (CUDA/OpenCL/Vulkan disabled)
- ❌ Android/ARM compilation (this is Windows x64 only)
- ❌ Metal acceleration (macOS only)
Note: For Android deployment, you need to build TVM separately for ARM64 with GPU support.
If you're building an Android app with MLC-LLM:
1. On Windows (using these binaries):
   - Convert model weights: `mlc_llm convert_weight`
   - Generate config: `mlc_llm gen_config`
   - Output: quantized model files (~2 GB for a 3B model)
2. On the Android device:
   - Use a separate TVM build for Android ARM64 with GPU support
   - Load the converted model files
   - Run inference with Vulkan/OpenCL acceleration
This workflow allows you to do heavy model conversion on Windows, then deploy to mobile.
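Before pushing converted files to a device, it is worth verifying the conversion output is complete. The sketch below checks only for the output names mentioned in this README (`params_shard_*.bin`, `mlc-chat-config.json`); it is a hypothetical helper, not an exhaustive MLC format check:

```python
from pathlib import Path

def missing_artifacts(model_dir: str) -> list:
    """List obviously-missing pieces of a converted MLC model directory."""
    d = Path(model_dir)
    problems = []
    if not any(d.glob("params_shard_*.bin")):
        problems.append("no params_shard_*.bin weight shards found")
    if not (d / "mlc-chat-config.json").exists():
        problems.append("mlc-chat-config.json not found")
    return problems

# Usage:
# problems = missing_artifacts(r".\dist\Qwen2.5-3B-Instruct-q4f16_1-MLC")
# print("ready to deploy" if not problems else problems)
```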
```powershell
# Test conversion with a small model first
python -m mlc_llm convert_weight `
    .\models\your-model `
    --quantization q4f16_1 `
    -o .\dist\your-model-q4f16_1-MLC

# Check output
ls .\dist\your-model-q4f16_1-MLC
# Should see: params_shard_*.bin, mlc-chat-config.json, tokenizer files
```

Try different quantization levels:

```powershell
# 4-bit with fp16 activations (balanced)
python -m mlc_llm convert_weight model --quantization q4f16_1 -o dist/model-q4f16_1

# 3-bit (smaller, faster, lower quality)
python -m mlc_llm convert_weight model --quantization q3f16_1 -o dist/model-q3f16_1

# 4-bit variant (faster)
python -m mlc_llm convert_weight model --quantization q4f16_0 -o dist/model-q4f16_0
```

Solution: Install Visual C++ Redistributable
```powershell
# Download and install from Microsoft
https://aka.ms/vs/17/release/vc_redist.x64.exe
```

Cause: You're still using the official TVM build without LLVM support.

Solution: Make sure you copied the DLLs correctly:

```powershell
# Check tvm.dll size - should be ~109 MB (with LLVM)
ls mlc-llm\3rdparty\tvm\python\tvm\tvm.dll

# If it's smaller (~2 MB), you have the runtime-only version
# Re-copy from this repository
```

Solution:
- Close other applications
- Model conversion needs 16GB+ RAM
- Try a smaller model first
- Reduce system memory usage
Solution:
```powershell
# Reinstall TVM package
cd mlc-llm\3rdparty\tvm\python
pip install -e . --force-reinstall --no-deps
```

Expected Times:
- Small models (1-3B): 10-20 minutes
- Medium models (7-13B): 30-60 minutes
- Large models (30B+): 1-3 hours
CPU-only builds are slower than GPU builds - this is normal.
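The times above scale roughly with parameter count. The estimator below is a back-of-envelope sketch: the ~4.5 minutes-per-billion-parameters rate is an assumption fitted loosely to the ranges listed, and real times depend heavily on CPU, disk, and RAM.

```python
def rough_minutes(n_params_billion: float, minutes_per_billion: float = 4.5) -> float:
    """Very rough CPU conversion-time guess, assuming time scales roughly
    linearly with parameter count (~4.5 min per billion params, assumed)."""
    return n_params_billion * minutes_per_billion

for size in (3, 7, 13, 30):
    print(f"{size}B model: ~{rough_minutes(size):.0f} min")
```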
These binaries were compiled from source in the following environment:
- OS: Windows 11 x64
- Compiler: Microsoft Visual Studio 2022 (MSVC v143)
- CMake: 4.1.2
- LLVM: 18.1.8 (pre-built SDK from GitHub releases)
- vcpkg: Used for libxml2 dependency
<details>
<summary>Click to expand full build instructions</summary>

```powershell
# Install Visual Studio 2022 Build Tools
# Download from: https://visualstudio.microsoft.com/downloads/

# Install LLVM 18.1.8
# Download from: https://github.com/llvm/llvm-project/releases/tag/llvmorg-18.1.8
# Extract to C:\LLVM-Dev\

# Install vcpkg
git clone https://github.com/Microsoft/vcpkg.git C:\vcpkg
cd C:\vcpkg
.\bootstrap-vcpkg.bat

# Install libxml2 (MSVC-compiled)
.\vcpkg install libxml2:x64-windows-static
```

```powershell
git clone --recursive https://github.com/mlc-ai/mlc-llm.git
cd mlc-llm\3rdparty\tvm

# Fix CMake version compatibility issues
# Edit: 3rdparty\tokenizers-cpp\msgpack\CMakeLists.txt
#   Change line 1: CMAKE_MINIMUM_REQUIRED(VERSION 3.5 FATAL_ERROR)
# Edit: 3rdparty\tokenizers-cpp\sentencepiece\CMakeLists.txt
#   Change line 15: cmake_minimum_required(VERSION 3.5 FATAL_ERROR)
```

```powershell
mkdir build-windows
cd build-windows

cmake .. `
    -G "Visual Studio 17 2022" `
    -A x64 `
    -DCMAKE_BUILD_TYPE=Release `
    -DUSE_LLVM=ON `
    -DLLVM_DIR="C:/LLVM-Dev/lib/cmake/llvm" `
    -DUSE_CUDA=OFF `
    -DUSE_OPENCL=OFF `
    -DUSE_VULKAN=OFF `
    -DUSE_METAL=OFF

cmake --build . --config Release --parallel 8
```

Resulting output layout:

```
build-windows\
├── Release\
│   ├── tvm.dll             (109 MB)
│   └── tvm_runtime.dll     (2.5 MB)
└── lib\
    ├── tvm_ffi.dll         (1.4 MB)
    └── tvm_ffi_testing.dll (582 KB)
```

</details>
- Build Time: ~2-3 hours (first time)
- Difficulty: High (Windows C++ builds are complex)
- Pain Level: 🔥🔥🔥🔥🔥 (saved you this nightmare!)
- MLC-LLM Docs: https://llm.mlc.ai/docs/
- TVM Docs: https://tvm.apache.org/docs/
- MLC-LLM GitHub: https://github.com/mlc-ai/mlc-llm
- TVM GitHub: https://github.com/apache/tvm
- MLC-LLM Discord: https://discord.gg/9Xpy2HGBuD
- TVM Discuss: https://discuss.tvm.apache.org/
- llama.cpp: Alternative for GGUF model inference
- GGUF-to-MLC Converter: (if available)
- HuggingFace Model Hub: https://huggingface.co/models
Found a bug or have improvements? Please open an issue or PR!
Please include:
- Windows version (run `winver`)
- Python version (`python --version`)
- Error message (full traceback)
- Steps to reproduce
Want to build a newer TVM version yourself?
- Check the build instructions in `docs/BUILD_PROCESS.md`
- Update the LLVM version if needed
- Follow the CMake configuration steps
- Submit a PR with updated binaries
Apache License 2.0 (same as TVM and MLC-LLM)
These binaries are compiled from official TVM source code with no modifications except build configuration.
- TVM: https://github.com/apache/tvm (Apache 2.0)
- MLC-LLM: https://github.com/mlc-ai/mlc-llm (Apache 2.0)
- LLVM: https://llvm.org/ (Apache 2.0 with LLVM Exceptions)
These binaries are provided as-is without warranty.
- Built on Windows 11 x64 with Visual Studio 2022
- Tested with Python 3.11 and MLC-LLM commit 7b15b196
- May not work with all configurations
- Always test with your specific setup
For production use, consider building from source to match your exact environment.
- Apache TVM Team: For the amazing tensor compiler
- MLC-LLM Team: For making LLM deployment accessible
- LLVM Project: For the compiler infrastructure
- Open Source Community: For helping debug Windows build issues
- Total Build Time Invested: ~8 hours (including troubleshooting)
- Time Saved Per User: ~2-4 hours
- Community Benefit: Hopefully hundreds of developers!
- 📥 Download Latest Release
- 📖 Full Build Documentation
- 🐛 Report Issues
- 💬 Discussions

Made with ❤️ and lots of ☕ to save you from Windows build hell.

Star ⭐ this repo if it saved you time!