# Understanding Targets

TileLang is built on top of TVM, which relies on **targets** to describe the device you want to compile for.
The target determines which code generator is used (CUDA, HIP, Metal, LLVM, …) and allows you to pass
device-specific options such as GPU architecture flags. This page summarises how to pick and customise a target
when compiling TileLang programs.

## Common target strings

TileLang ships with a small set of common targets; each accepts the full range of TVM options so you can fine-tune
the generated code. The most frequent choices are listed below:

| Base name | Description |
| --------- | ----------- |
| `auto` | Detects CUDA → HIP → Metal in that order. Useful when running the same script across machines. |
| `cuda` | NVIDIA GPUs. Supports options such as `-arch=sm_80`, `-max_num_threads=1024`, etc. |
| `hip` | AMD GPUs via ROCm. Options like `-mcpu=gfx90a` can be appended. |
| `metal` | Apple Silicon GPUs (arm64 Macs). |
| `llvm` | CPU execution; accepts the standard TVM LLVM switches. |
| `webgpu` | Browser / WebGPU runtimes. |
| `c` | Emit plain C source for inspection or custom toolchains. |

To add options, append them after the base name, separated by spaces. For example:

```python
import tilelang

target = "cuda -arch=sm_90"
kernel = tilelang.compile(func, target=target, execution_backend="cython")
# or
@tilelang.jit(target=target)
def compiled_kernel(*args):
    return func(*args)
```

The same convention works for HIP or LLVM (e.g. `hip -mcpu=gfx940`, `llvm -mtriple=x86_64-linux-gnu`).

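To make the convention concrete, a target string can be decomposed into its base name and option flags with plain string handling. The `parse_target_string` helper below is a hypothetical illustration, not part of TileLang or TVM:

```python
def parse_target_string(target: str) -> tuple[str, dict[str, str]]:
    # First whitespace-separated token is the base name; the rest are flags
    # of the form "-key=value", as in the examples above.
    base, *flags = target.split()
    options = {}
    for flag in flags:
        key, _, value = flag.lstrip("-").partition("=")
        options[key] = value
    return base, options

base, opts = parse_target_string("cuda -arch=sm_90 -max_num_threads=1024")
# base == "cuda", opts == {"arch": "sm_90", "max_num_threads": "1024"}
```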
### Advanced: Specify Exact Hardware

When you already know the precise GPU model, you can encode it in the target string, either via `-arch=sm_XX` or by
using one of TVM’s pre-defined target tags such as `nvidia/nvidia-h100`. Supplying this detail is optional for
TileLang in general use, but it becomes valuable when the TVM cost model is enabled (e.g. during autotuning). The
cost model uses the extra attributes to make better scheduling predictions. If you skip this step (or do not use the
cost model), generic targets like `cuda` or `auto` are perfectly fine.

All CUDA compute capabilities recognised by TVM’s target registry are listed below. Pick the one that matches your
GPU and append it to the target string, or use the corresponding target tag, for example `nvidia/nvidia-a100`.

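If you query the device’s compute capability at runtime (for example via `torch.cuda.get_device_capability()`, which returns a `(major, minor)` pair), you can build the matching `-arch` flag yourself. The helper below is an illustrative sketch, not a TileLang API, and it does not produce suffixed names such as `sm_90a`:

```python
def capability_to_arch(major: int, minor: int) -> str:
    # Compute capability (8, 0) maps to the arch string "sm_80".
    return f"sm_{major}{minor}"

# e.g. on an A100, get_device_capability() would report (8, 0):
target = f"cuda -arch={capability_to_arch(8, 0)}"  # "cuda -arch=sm_80"
```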
| Architecture | GPUs (examples) |
| ------------ | ---------------- |
| `sm_20` | `nvidia/tesla-c2050`, `nvidia/tesla-c2070` |
| `sm_21` | `nvidia/nvs-5400m`, `nvidia/geforce-gt-520` |
| `sm_30` | `nvidia/quadro-k5000`, `nvidia/geforce-gtx-780m` |
| `sm_35` | `nvidia/tesla-k40`, `nvidia/quadro-k6000` |
| `sm_37` | `nvidia/tesla-k80` |
| `sm_50` | `nvidia/quadro-k2200`, `nvidia/geforce-gtx-950m` |
| `sm_52` | `nvidia/tesla-m40`, `nvidia/geforce-gtx-980` |
| `sm_53` | `nvidia/jetson-tx1`, `nvidia/jetson-nano` |
| `sm_60` | `nvidia/tesla-p100`, `nvidia/quadro-gp100` |
| `sm_61` | `nvidia/tesla-p4`, `nvidia/quadro-p6000`, `nvidia/geforce-gtx-1080` |
| `sm_62` | `nvidia/jetson-tx2` |
| `sm_70` | `nvidia/nvidia-v100`, `nvidia/quadro-gv100` |
| `sm_72` | `nvidia/jetson-agx-xavier` |
| `sm_75` | `nvidia/nvidia-t4`, `nvidia/quadro-rtx-8000`, `nvidia/geforce-rtx-2080` |
| `sm_80` | `nvidia/nvidia-a100`, `nvidia/nvidia-a30` |
| `sm_86` | `nvidia/nvidia-a40`, `nvidia/nvidia-a10`, `nvidia/geforce-rtx-3090` |
| `sm_87` | `nvidia/jetson-agx-orin-32gb`, `nvidia/jetson-agx-orin-64gb` |
| `sm_89` | `nvidia/geforce-rtx-4090` |
| `sm_90a` | `nvidia/nvidia-h100` (DPX profile) |
| `sm_100a` | `nvidia/nvidia-b100` |

Refer to NVIDIA’s [CUDA GPUs](https://developer.nvidia.com/cuda-gpus) page or the TVM source
(`3rdparty/tvm/src/target/tag.cc`) for the latest mapping between devices and compute capabilities.

## Creating targets programmatically

If you prefer working with TVM’s `Target` objects, TileLang exposes the helper
`tilelang.utils.target.determine_target` (returns a canonical target string by default, or the `Target`
object when `return_object=True`):

```python
import tilelang
from tilelang.utils.target import determine_target

tvm_target = determine_target("cuda -arch=sm_80", return_object=True)
kernel = tilelang.compile(func, target=tvm_target)
```

You can also build targets directly through TVM:

```python
from tvm.target import Target

target = Target("cuda", host="llvm")
target = target.with_host(Target("llvm -mcpu=skylake"))
```

TileLang accepts either `str` or `Target` inputs; internally they are normalised and cached using the canonical
string representation. **In user code we strongly recommend passing target strings rather than
`tvm.target.Target` instances: strings keep cache keys compact and deterministic across runs, whereas constructing
fresh `Target` objects may lead to slightly higher hashing overhead or inconsistent identity semantics.**

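To see why canonical strings make convenient cache keys, consider this sketch: once the flags are sorted, equivalent target strings collapse to a single key, so the same compilation is never cached twice. This illustrates the idea only; it is not TileLang’s actual normalisation logic:

```python
def canonical_target_key(target: str) -> str:
    # Hypothetical normalisation: base name first, flags in sorted order.
    base, *flags = target.split()
    return " ".join([base, *sorted(flags)])

a = canonical_target_key("cuda -max_num_threads=1024 -arch=sm_80")
b = canonical_target_key("cuda -arch=sm_80 -max_num_threads=1024")
# a == b, so both spellings would map to the same cache entry
```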
## Discovering supported targets in code

Looking for a quick reminder of the built-in base names and their descriptions? Use:

```python
from tilelang.utils.target import describe_supported_targets

for name, doc in describe_supported_targets().items():
    print(f"{name:>6}: {doc}")
```

This helper mirrors the table above and is safe to call at runtime (for example when validating CLI arguments).

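For instance, a CLI could reject unknown base names up front. The sketch below inlines a stand-in set so it runs without TileLang installed; in real code you would take the keys of `describe_supported_targets()` instead:

```python
import argparse

# Stand-in for the keys of describe_supported_targets(), mirroring the table above.
SUPPORTED = {"auto", "cuda", "hip", "metal", "llvm", "webgpu", "c"}

def validate_target(value: str) -> str:
    # Only the base name (first token) is checked; options are passed through.
    base = value.split()[0]
    if base not in SUPPORTED:
        raise argparse.ArgumentTypeError(f"unknown target base {base!r}")
    return value

parser = argparse.ArgumentParser()
parser.add_argument("--target", type=validate_target, default="auto")
args = parser.parse_args(["--target", "cuda -arch=sm_80"])
```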
## Troubleshooting tips

- If you see `Target cuda -arch=sm_80 is not supported`, double-check the spelling and that each option is valid for
  TVM. Any invalid switch will surface as a target-construction error.
- Runtime errors such as “no kernel image is available” usually mean the `-arch` flag does not match the GPU you are
  running on. Try dropping the flag or switching to the correct compute capability.
- When targeting multiple environments, use `auto` for convenience and override with an explicit string only when
  you need architecture-specific tuning.
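
For the second tip, a small sanity check can compare the requested `-arch` flag against the capability the driver reports (e.g. the `(major, minor)` pair from `torch.cuda.get_device_capability()`). This helper is illustrative only, not part of TileLang:

```python
def arch_matches(requested: str, detected_major: int, detected_minor: int) -> bool:
    # "requested" looks like "sm_80"; drop any letter suffix such as "sm_90a".
    digits = "".join(ch for ch in requested.removeprefix("sm_") if ch.isdigit())
    return digits == f"{detected_major}{detected_minor}"

# On an A100 (compute capability 8.0), "sm_80" matches but "sm_90" does not,
# which is exactly the mismatch behind "no kernel image is available".
```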