hipRTC module cache grows without bound and never hits (#1142)

@Noerr

Summary

The chipStar module cache (~/.cache/chipStar/) is write-only for hipRTC-compiled kernels. Each run writes new cache entries that are never loaded on subsequent runs, causing unbounded disk growth with no cross-run speedup.

Statically compiled HIP programs (typical hipcc approach) are not affected -- their cache works reasonably.

Root Cause

hipRTC's SPIR-V output is non-deterministic across runs for two independent reasons:

1. Random temp directory paths embedded in SPIR-V

hiprtcCompileProgram() creates a random temporary directory and its absolute path leaks into the compiled SPIR-V through four places in spirv_hiprtc.cc:

// createCompileCommand():
Append("-I" + WorkingDirectory.string());   // -I flag with random path
Append(SourceFile.string());                // absolute source path
Append(OutputFile.string());                // absolute output path

// createSourceFile() — lowered names magic variable:
File << ... << LoweredNamesFile << ";";     // absolute path as string literal

Clang embeds these paths in SPIR-V metadata (DICompileUnit, module ID). The _chip_name_expr_output_file string literal (used by the HipEmitLoweredNames pass) is compiled directly into the SPIR-V. When users pass -g, clang also embeds the working directory as DW_AT_comp_dir.

2. LLVM non-deterministic code generation

Even after fixing all path-related issues, LLVM produces non-deterministic SPIR-V for some kernels. Observed differences include basic block names with different numeric suffixes (loadstoreloop264 vs loadstoreloop252) and different SPIR-V instruction IDs for the same logical blocks. This is a known class of LLVM issue caused by hash table iteration order depending on pointer addresses.
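As a generic illustration of this class of problem (not code from LLVM or chipStar), the sketch below orders the same set of objects in two ways. The `Block` type and both helper names are hypothetical. Ordering by address depends on where the allocator happened to place each object, so it can change between runs; ordering by a stable key does not.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a basic block; only the name matters here.
struct Block {
  std::string Name;
};

// Address-based order: the result depends on allocation addresses, which
// may differ from run to run (for example under ASLR).
std::vector<const Block *> orderByAddress(std::vector<const Block *> Blocks) {
  std::sort(Blocks.begin(), Blocks.end(), std::less<const Block *>());
  return Blocks;
}

// Stable-key order: the result depends only on the block names, so it is
// the same on every run regardless of where the objects live in memory.
std::vector<const Block *> orderByName(std::vector<const Block *> Blocks) {
  std::sort(Blocks.begin(), Blocks.end(),
            [](const Block *L, const Block *R) { return L->Name < R->Name; });
  return Blocks;
}
```

Any numbering assigned while walking the address-based order (such as a numeric name suffix) inherits that run-to-run variation.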

In testing with 5 real-world kernels, 2 produced deterministic SPIR-V after path fixes; 3 remained non-deterministic (36–1327 differing bytes in ~125K object files).
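One way such a byte count could be obtained is a position-by-position comparison of two output files from separate runs. The helper below is a hypothetical sketch, not necessarily the method used for the numbers above. It assumes both files have already been read into memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts byte positions at which two binaries differ.
// Bytes past the end of the shorter buffer all count as differences.
std::size_t countDifferingBytes(const std::vector<std::uint8_t> &A,
                                const std::vector<std::uint8_t> &B) {
  const std::size_t Common = std::min(A.size(), B.size());
  std::size_t Diff = std::max(A.size(), B.size()) - Common;
  for (std::size_t I = 0; I < Common; ++I)
    if (A[I] != B[I])
      ++Diff;
  return Diff;
}
```

A result of zero across repeated runs is the condition a content-based cache needs in order to hit.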

Observed Behavior

$ rm -rf ~/.cache/chipStar/*
$ CHIP_LOGLEVEL=info ./my_hiprtc_program    # run 1
# Log shows "Kernel compilation took X seconds" (no "Loaded from cache")
$ ls ~/.cache/chipStar/ | wc -l
3
$ CHIP_LOGLEVEL=info ./my_hiprtc_program    # run 2
# Still no "Loaded from cache"
$ ls ~/.cache/chipStar/ | wc -l
6    # 3 new files, different names

Cache grows by N files per run (one per kernel module). Entries are never read. Timings are identical whether the cache directory is populated or empty.
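The different file names on each run are what one would expect if the cache entry name is derived from the module contents. This issue does not describe how chipStar computes the name, so the sketch below is only an assumption-laden illustration: it uses FNV-1a as a stand-in hash and shows that a single differing byte (such as one character of a random temp directory name) yields a different key.

```cpp
#include <cstdint>
#include <string>

// Stand-in content hash (64-bit FNV-1a). chipStar's actual cache key
// derivation is not shown in this issue and may differ.
std::uint64_t fnv1a64(const std::string &Bytes) {
  std::uint64_t Hash = 14695981039346656037ULL; // FNV offset basis
  for (unsigned char C : Bytes) {
    Hash ^= C;
    Hash *= 1099511628211ULL; // FNV prime
  }
  return Hash;
}

// Hypothetical check: two modules can share a cache entry only if their
// bytes hash to the same key.
bool wouldShareCacheEntry(const std::string &ModuleA,
                          const std::string &ModuleB) {
  return fnv1a64(ModuleA) == fnv1a64(ModuleB);
}
```

Under this assumption, every run with a fresh temp path or a different block numbering produces a new key, so the lookup misses and a new entry is written.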

Proposed Fix

Path fixes (necessary for any future cache solution)

  1. cd into the working directory in executeCommand()
  2. Remove -I<WorkingDirectory> — clang searches the source file's directory by default for quoted includes
  3. Use SourceFile.filename() and OutputFile.filename() instead of absolute paths
  4. Use LoweredNamesFile.filename() in createSourceFile() for the _chip_name_expr_output_file magic variable
  5. Add -fdebug-compilation-dir=. to prevent cwd embedding when -g is used
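A minimal sketch of steps 2, 3, and 5, assuming the command is assembled as a list of strings. `buildCompileArgs` is a hypothetical stand-in for createCompileCommand(), whose real signature and full argument list are not shown in this issue. The `-o` argument is illustrative only. The sketch relies on step 1 (running the compiler from inside the working directory) so that bare file names resolve correctly.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for createCompileCommand(): emits only arguments
// that are independent of the random temp directory.
std::vector<std::string> buildCompileArgs(const fs::path &SourceFile,
                                          const fs::path &OutputFile) {
  std::vector<std::string> Args;
  // Step 5: keep the working directory out of debug info when -g is used.
  Args.push_back("-fdebug-compilation-dir=.");
  // Step 3: bare file names instead of absolute paths.
  Args.push_back(SourceFile.filename().string());
  Args.push_back("-o"); // illustrative; the real command may differ
  Args.push_back(OutputFile.filename().string());
  // Step 2: no "-I<WorkingDirectory>" argument is added.
  return Args;
}
```

Called with two paths under different temp directories but the same file names, this returns identical argument lists, which is the property the path fixes aim for.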

Disable cache for dynamically loaded modules

Since LLVM non-determinism cannot be fixed at the chipStar level, skip cache load() and save() for modules loaded via hipModuleLoadData():

  • Add IsDynamicLoad flag to SPVModule (set in hipModuleLoadDataInternal)
  • Check the flag in CHIPModuleOpenCL::compile() to skip cache operations

This prevents unbounded growth while leaving the cache fully functional for statically compiled HIP programs.
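The control flow of this proposal can be sketched as follows. All types and functions here are hypothetical stand-ins (`ModuleSketch` for SPVModule, `compileModule` for CHIPModuleOpenCL::compile()); the real classes have different members and the cache is keyed differently. Only the IsDynamicLoad name comes from the proposal above.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for SPVModule with the proposed flag.
struct ModuleSketch {
  std::string SpirvBytes;
  bool IsDynamicLoad = false; // would be set in hipModuleLoadDataInternal
};

// Hypothetical in-memory stand-in for the on-disk module cache.
struct CacheSketch {
  std::map<std::string, std::string> Entries;

  std::optional<std::string> load(const std::string &Key) const {
    auto It = Entries.find(Key);
    if (It == Entries.end())
      return std::nullopt;
    return It->second;
  }

  void save(const std::string &Key, const std::string &Binary) {
    Entries[Key] = Binary;
  }
};

// Placeholder for the actual device compilation step.
std::string compileFromSpirv(const std::string &Spirv) {
  return "bin:" + Spirv;
}

// Sketch of the compile path: dynamically loaded modules bypass the cache.
std::string compileModule(const ModuleSketch &M, CacheSketch &Cache) {
  if (M.IsDynamicLoad)
    return compileFromSpirv(M.SpirvBytes); // no load(), no save()
  if (auto Hit = Cache.load(M.SpirvBytes))
    return *Hit;
  std::string Binary = compileFromSpirv(M.SpirvBytes);
  Cache.save(M.SpirvBytes, Binary);
  return Binary;
}
```

With this shape, a module flagged as dynamically loaded never adds an entry, while a statically compiled module is stored on first use and served from the cache afterwards.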

Workaround

export CHIP_MODULE_CACHE_DIR=""

Environment

  • macOS on Apple Silicon (ARM64), POCL as OpenCL backend
  • chipStar main branch (post-LLVM 21 merge)
  • POCL main branch with LLVM 21
  • CHIP_BE=opencl, CHIP_DEVICE_TYPE=pocl
  • The LLVM non-determinism is not platform-specific and likely affects other backends
