Summary
The chipStar module cache (~/.cache/chipStar/) is write-only for hipRTC-compiled kernels. Each run writes new cache entries that are never loaded on subsequent runs, causing unbounded disk growth with no cross-run speedup.
Statically compiled HIP programs (typical hipcc approach) are not affected -- their cache works reasonably.
Root Cause
hipRTC's SPIR-V output is non-deterministic across runs for two independent reasons:
1. Random temp directory paths embedded in SPIR-V
hiprtcCompileProgram() creates a random temporary directory and its absolute path leaks into the compiled SPIR-V through four places in spirv_hiprtc.cc:
// createCompileCommand():
Append("-I" + WorkingDirectory.string()); // -I flag with random path
Append(SourceFile.string()); // absolute source path
Append(OutputFile.string()); // absolute output path
// createSourceFile() — lowered names magic variable:
File << ... << LoweredNamesFile << ";"; // absolute path as string literal
Clang embeds these paths in SPIR-V metadata (DICompileUnit, module ID). The _chip_name_expr_output_file string literal (used by the HipEmitLoweredNames pass) is compiled directly into the SPIR-V. When users pass -g, clang also embeds the working directory as DW_AT_comp_dir.
2. LLVM non-deterministic code generation
Even after fixing all path-related issues, LLVM produces non-deterministic SPIR-V for some kernels. Observed differences include basic block names with different numeric suffixes (loadstoreloop264 vs loadstoreloop252) and different SPIR-V instruction IDs for the same logical blocks. This is a known class of LLVM issue caused by hash table iteration order depending on pointer addresses:
- [SPIRV] Non-deterministic compiler output for debug-type-pointer.ll llvm/llvm-project#123791 (non-deterministic SPIR-V output, open)
- https://github.com/mgrang/non-determinism (42 iteration-order bugs found in LLVM)
In testing with 5 real-world kernels, 2 produced deterministic SPIR-V after the path fixes; 3 remained non-deterministic (36–1327 differing bytes in object files of ~125 KB each).
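The iteration-order problem described above can be illustrated with a minimal sketch. This is not LLVM code: `BlockSketch`, `emitInHashOrder`, and `emitInStableOrder` are hypothetical names that only show why walking a pointer-keyed hash container can reorder output between runs, and how sorting by a stable ID avoids it.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a compiler IR object such as a basic block.
struct BlockSketch {
  unsigned StableId; // assigned in program order, independent of addresses
  std::string Name;
};

// Iteration order follows the hash of each pointer value, so it can change
// whenever the allocator or ASLR hands out different addresses.
std::vector<std::string>
emitInHashOrder(const std::unordered_set<const BlockSketch *> &Blocks) {
  std::vector<std::string> Names;
  for (const BlockSketch *B : Blocks)
    Names.push_back(B->Name);
  return Names;
}

// Sorting by the stable ID before emitting makes the output reproducible.
std::vector<std::string>
emitInStableOrder(const std::unordered_set<const BlockSketch *> &Blocks) {
  std::vector<const BlockSketch *> Ordered(Blocks.begin(), Blocks.end());
  std::sort(Ordered.begin(), Ordered.end(),
            [](const BlockSketch *L, const BlockSketch *R) {
              return L->StableId < R->StableId;
            });
  std::vector<std::string> Names;
  for (const BlockSketch *B : Ordered)
    Names.push_back(B->Name);
  return Names;
}
```

Any numbering derived from the first function (block name suffixes, instruction IDs) inherits its run-to-run variation, which is consistent with the differences observed here. A fix of the second kind has to happen inside the compiler, not in chipStar.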
Observed Behavior
$ rm -rf ~/.cache/chipStar/*
$ CHIP_LOGLEVEL=info ./my_hiprtc_program # run 1
# Log shows "Kernel compilation took X seconds" (no "Loaded from cache")
$ ls ~/.cache/chipStar/ | wc -l
3
$ CHIP_LOGLEVEL=info ./my_hiprtc_program # run 2
# Still no "Loaded from cache"
$ ls ~/.cache/chipStar/ | wc -l
6 # 3 new files, different names
Cache grows by N files per run (one per kernel module). Entries are never read. Timings are identical whether the cache directory is populated or empty.
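The different file names on each run are what one would expect if cache entries are keyed on the module contents. The sketch below illustrates that effect under that assumption; `cacheKey` and `fakeModule` are hypothetical, and FNV-1a merely stands in for whatever hash chipStar actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical content-addressed cache key (64-bit FNV-1a).
// Assumption: this is NOT chipStar's real hashing scheme.
std::uint64_t cacheKey(const std::string &ModuleBytes) {
  std::uint64_t Hash = 14695981039346656037ULL; // FNV offset basis
  for (unsigned char Byte : ModuleBytes) {
    Hash ^= Byte;
    Hash *= 1099511628211ULL; // FNV prime
  }
  return Hash;
}

// Hypothetical stand-in for a compiled module that embeds its temp directory.
std::string fakeModule(const std::string &TempDir) {
  return "module-id=" + TempDir + "/kernel.hip;code=...";
}
```

With such a key, two compilations of the same kernel from `/tmp/run-a` and `/tmp/run-b` never collide on a cache entry, so every run misses and then writes a fresh file.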
Proposed Fix
Path fixes (necessary for any future cache solution)
- cd into the working directory in executeCommand()
- Remove -I<WorkingDirectory>: clang searches the source file's directory by default for quoted includes
- Use SourceFile.filename() and OutputFile.filename() instead of absolute paths
- Use LoweredNamesFile.filename() in createSourceFile() for the _chip_name_expr_output_file magic variable
- Add -fdebug-compilation-dir=. to prevent cwd embedding when -g is used
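As a rough sketch of the path fixes listed above, the argument list could be built from file names only. `buildRelativeArgs` is a hypothetical helper, not the actual createCompileCommand(), and the `-o` argument is an assumption about how the output file is passed; the sketch also assumes the command is later run from inside the working directory.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: builds compiler arguments that contain no absolute
// paths. Assumes the caller runs the command from the working directory.
std::vector<std::string> buildRelativeArgs(const fs::path &SourceFile,
                                           const fs::path &OutputFile) {
  std::vector<std::string> Args;
  // No "-I<WorkingDirectory>": quoted includes are already resolved
  // relative to the source file's directory.
  Args.push_back(SourceFile.filename().string()); // e.g. "kernel.hip"
  Args.push_back("-o");                           // assumption, see lead-in
  Args.push_back(OutputFile.filename().string()); // e.g. "kernel.spv"
  // Keeps the random cwd out of DW_AT_comp_dir when -g is passed.
  Args.push_back("-fdebug-compilation-dir=.");
  return Args;
}
```

Because every argument is independent of the random directory name, two runs produce the same command line, which removes one source of differing bytes in the output.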
Disable cache for dynamically loaded modules
Since LLVM non-determinism cannot be fixed at the chipStar level, skip cache load() and save() for modules loaded via hipModuleLoadData():
- Add IsDynamicLoad flag to SPVModule (set in hipModuleLoadDataInternal)
- Check the flag in CHIPModuleOpenCL::compile() to skip cache operations
This prevents unbounded growth while leaving the cache fully functional for statically compiled HIP programs.
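The flag check could look roughly like the following. Everything here is a simplified, hypothetical model: `ModuleSketch`, `CacheSketch`, `backendCompile`, and `compileWithCache` are illustrative stand-ins, not the real SPVModule or CHIPModuleOpenCL::compile(); only the IsDynamicLoad name comes from the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for SPVModule with the proposed flag.
struct ModuleSketch {
  std::string Binary;
  bool IsDynamicLoad = false; // would be set on the hipModuleLoadData() path
};

// Hypothetical in-memory stand-in for the on-disk module cache.
class CacheSketch {
public:
  bool load(const std::string &Key, std::string &Out) const {
    auto It = Entries.find(Key);
    if (It == Entries.end())
      return false;
    Out = It->second;
    return true;
  }
  void save(const std::string &Key, const std::string &Value) {
    Entries[Key] = Value;
  }
  std::size_t size() const { return Entries.size(); }

private:
  std::unordered_map<std::string, std::string> Entries;
};

// Hypothetical placeholder for the backend's device compilation step.
std::string backendCompile(const std::string &Binary) {
  return "compiled:" + Binary;
}

std::string compileWithCache(const ModuleSketch &M, CacheSketch &Cache) {
  if (M.IsDynamicLoad)
    return backendCompile(M.Binary); // skip both load() and save()
  std::string Cached;
  if (Cache.load(M.Binary, Cached))
    return Cached;
  std::string Result = backendCompile(M.Binary);
  Cache.save(M.Binary, Result);
  return Result;
}
```

Skipping save() as well as load() is what stops the growth: a dynamically loaded module then never leaves an entry behind, while statically compiled modules keep the existing behaviour.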
Workaround
export CHIP_MODULE_CACHE_DIR=""
Environment
- macOS on Apple Silicon (ARM64), POCL as OpenCL backend
- chipStar main branch (post-LLVM 21 merge)
- POCL main branch with LLVM 21
- CHIP_BE=opencl, CHIP_DEVICE_TYPE=pocl
- The LLVM non-determinism is not platform-specific and likely affects other backends