Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
09edd30
[Doc] Add user-facing fastcache documentation
hughperkins Apr 30, 2026
c9aab5a
[Doc] Rewrite fastcache intro to focus on process-start cost
hughperkins Apr 30, 2026
ff39d7f
[Doc] Clarify "next run" to "next process run"
hughperkins Apr 30, 2026
484046b
[Doc] Remove redundant hash mention from fastcache intro
hughperkins Apr 30, 2026
fca294f
[Doc] Trim fastcache intro sentence
hughperkins Apr 30, 2026
10461bc
[Doc] Trim redundant phrase from fastcache intro
hughperkins Apr 30, 2026
0c48cbd
[Doc] Remove implementation detail from fastcache intro
hughperkins Apr 30, 2026
31eac7b
[Doc] Simplify fastcache benchmark table to match introduced concepts
hughperkins Apr 30, 2026
334d444
[Doc] Simplify benchmark table row labels
hughperkins Apr 30, 2026
2f1fcca
[Doc] Fix fastcache benchmark number to 0.3s
hughperkins Apr 30, 2026
ef1bc41
[Doc] Remove redundant "Why you want it" section
hughperkins Apr 30, 2026
7457584
[Doc] Fix baseline to 4.6s (includes PTX cache)
hughperkins Apr 30, 2026
2e69b45
[Doc] Trim fastcache intro
hughperkins Apr 30, 2026
3a2e52d
[Doc] Minor grammar fix in fastcache intro
hughperkins Apr 30, 2026
f128429
[Doc] Remove redundant sentence before code example
hughperkins Apr 30, 2026
aa40b4b
Revert "[Doc] Remove redundant sentence before code example"
hughperkins Apr 30, 2026
1732c7b
[Doc] Explain why fastcache is opt-in before the code example
hughperkins Apr 30, 2026
ef7f6fa
[Doc] Clarify fastcache applies to new process starts
hughperkins Apr 30, 2026
3096be2
[Doc] Remove deprecated alternatives section
hughperkins Apr 30, 2026
2d71816
[Doc] Move Diagnostics into Advanced section
hughperkins Apr 30, 2026
5dc2c48
[Doc] Remove redundant "Fields not supported" section
hughperkins Apr 30, 2026
05866ac
[Doc] Move cache invalidation to its own top-level section
hughperkins Apr 30, 2026
833b8d7
[Doc] s/readable/available/ in constraint heading
hughperkins Apr 30, 2026
7aecb4e
[Doc] Clarify data_oriented cache key includes primitive values
hughperkins Apr 30, 2026
007d407
[Doc] Add section explaining add_value_to_cache_key for dataclass fields
hughperkins Apr 30, 2026
212b142
[Doc] Clarify why type-only cache keys suffice for ndarrays
hughperkins Apr 30, 2026
b2e4e89
[Doc] Add qd.static example for add_value_to_cache_key motivation
hughperkins Apr 30, 2026
8380891
[Doc] Move qd.static example into its own code block
hughperkins Apr 30, 2026
dc1a679
[Doc] Add loop unrolling clarification
hughperkins Apr 30, 2026
a6d4819
[Doc] Use concrete name instead of "that value"
hughperkins Apr 30, 2026
35adbf4
[Doc] s/these/such/
hughperkins Apr 30, 2026
cef9f1b
[Doc] Fix incorrect claim about recompilation
hughperkins Apr 30, 2026
a37917e
[Doc] Clearer wording for missing cache key consequence
hughperkins Apr 30, 2026
0acfa1f
[Doc] Add paragraph break before "Mark such fields"
hughperkins Apr 30, 2026
d0f06b2
[Doc] Add cross-reference to dataclass cached values section
hughperkins Apr 30, 2026
309e9c9
[Doc] Add bool to non-template primitives row in fastcache table
Apr 30, 2026
c3621f0
[Doc] Rewrite cache invalidation as cache keying; clarify versions co…
Apr 30, 2026
37c0400
[Doc] Add purity exemptions table (enums, math/numpy, qd attrs)
Apr 30, 2026
c64ea07
[Doc] Fix enum exemption rationale
Apr 30, 2026
2aed791
[Doc] Soften enum rationale wording
Apr 30, 2026
875795e
[Doc] Soften math/numpy rationale wording
Apr 30, 2026
f8ec106
[Doc] Trim math/numpy rationale
Apr 30, 2026
d411723
[Doc] Remove 'Immutable' from enum rationale
Apr 30, 2026
9770e6a
[Doc] Soften quadrants attrs rationale wording
Apr 30, 2026
6f6b2dd
[Doc] Use Tile16x16.SIZE as quadrants attr example
Apr 30, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
174 changes: 174 additions & 0 deletions docs/source/user_guide/fastcache.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,174 @@
# Fastcache

## What it is

Fastcache reduces the time it takes to load cached kernels when a new Python process starts.

The standard [offline cache](init_options.md#offline_cache) already persists compiled kernels to disk. However, loading a cached kernel still requires parsing the kernel's Python AST and transforming it into IR. For applications with many kernels this front-end overhead alone can take several seconds.

Fastcache bypasses that front-end work. It computes a cheap cache key from the kernel source text, argument types, and compiler config, and uses it to load the compiled artifact directly.

On a Genesis simulator benchmark (`single_franka_envs.py`, Ubuntu 24.04, NVIDIA 5090):

| Configuration | Process-start kernel load time |
|---|---|
| Other caches (no fastcache) | 4.6 s |
| + fastcache | 0.3 s |

## How to use it

### Enabling fastcache on a kernel

Fastcache requires the kernel to be *pure*: all data it operates on must be passed as explicit parameters, with nothing captured from the enclosing Python scope (see [Constraints](#constraints) below). Because not all kernels satisfy this, fastcache is opt-in — you assert purity by adding `fastcache=True` to the `@qd.kernel` decorator:

```python
import quadrants as qd

@qd.kernel(fastcache=True)
def my_kernel(a: qd.types.NDArray[qd.f32, 1], b: qd.types.NDArray[qd.f32, 1]) -> None:
for i in range(a.shape[0]):
b[i] = a[i] * 2.0
```

That's it. On the first call, the kernel compiles normally and the fastcache entry is written to disk. When the next Python process starts, the cached artifact is loaded directly.

### Runtime configuration

Fastcache requires the offline cache to be enabled (which it is by default). Two `qd.init` options are relevant:

| Option | Default | Effect |
|---|---|---|
| `src_ll_cache` | `True` | Master switch for fastcache. Set to `False` to disable it globally. |
| `print_non_pure` | `False` | When `True`, prints the name of every kernel at call time that is *not* marked `fastcache=True`. Useful for finding kernels you forgot to annotate. |

```python
qd.init(arch=qd.gpu)
# Fastcache is on by default. To disable:
# qd.init(arch=qd.gpu, src_ll_cache=False)

# To find un-annotated kernels:
# qd.init(arch=qd.gpu, print_non_pure=True)
```

## Dataclass fields with cached values

By default, for `dataclasses.dataclass` parameters, fastcache only includes the *types* of each field in the cache key, not their values. This is fine for fields like ndarrays, where the compiled kernel doesn't depend on the actual data, only the dtype and dimensionality.

However, some dataclass fields hold configuration values that get baked into the compiled kernel — typically values used with `qd.static()`, such as loop bounds or feature flags:

```python
for i in qd.static(range(config.num_layers)):
...
```

Here the value of `num_layers` is compiled into the kernel. Concretely the loop will be unrolled, at compile time. If `num_layers` changes, a different kernel must be compiled.
Comment thread
duburcqa marked this conversation as resolved.

Mark such fields with `add_value_to_cache_key` so their values are included in the cache key:

```python
import dataclasses
from quadrants.lang._fast_caching import FIELD_METADATA_CACHE_VALUE

@dataclasses.dataclass
class SimConfig:
num_envs: int = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True})
dt: float = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True})
use_gravity: bool = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True})
```

With this annotation, changing `num_envs` from 100 to 200 produces a different cache key so the correct compiled kernel is looked up (or compiled if not yet cached). Without it, the wrong kernel could be loaded.

Note: `@qd.data_oriented` objects and `qd.Template` parameters already include primitive values in the cache key automatically — this annotation is only needed for `dataclasses.dataclass` fields.

## Constraints

A kernel is eligible for fastcache only if all of the following hold:

### 1. All data flows through parameters

The kernel must receive every piece of data it operates on as an explicit parameter. It must **not** capture variables from the enclosing Python scope (closures over fields, ndarrays, or mutable globals). This is the core "purity" constraint — the compiled kernel's behavior must be fully determined by its arguments.

```python
a = qd.ndarray(qd.f32, (10,))

# Not eligible: captures `a` from enclosing scope
@qd.kernel(fastcache=True)
def bad_kernel() -> None:
for i in range(10):
a[i] = 0.0 # raises QuadrantsCompilationError

# Eligible: `a` is passed as a parameter
@qd.kernel(fastcache=True)
def good_kernel(a: qd.types.NDArray[qd.f32, 1]) -> None:
for i in range(a.shape[0]):
a[i] = 0.0
```

Sub-functions called by the kernel are also checked — they must not capture external state either.
Comment thread
duburcqa marked this conversation as resolved.

**Exemptions:** The following may be accessed from the enclosing scope without violating purity:

| Allowed capture | Why |
|---|---|
| `enum.Enum` values (e.g. `MyEnum.VALUE`) | Named constants that are assumed not to vary between process runs. |
| `math` / `numpy` constants (e.g. `math.pi`) | Assumed stable across process runs. |
| Quadrants module attributes (e.g. `qd.simt.Tile16x16.SIZE`) | Part of the compiler's own API; assumed consistent with the Quadrants version hash. |

Other named constants (non-enum, non-module) captured from scope will raise a `QuadrantsCompilationError`, except for `UPPERCASE` names which emit a warning instead.

### 2. Supported parameter types

Fastcache supports the following parameter types:

| Type | Supported | Cache key includes |
|---|---|---|
| `qd.types.NDArray` (scalar, vector, matrix) | Yes | dtype, ndim, layout |
| `torch.Tensor` | Yes | dtype, ndim |
| `numpy.ndarray` | Yes | dtype, ndim |
| `dataclasses.dataclass` | Yes | field types recursively; field values if annotated with `add_value_to_cache_key` (see [above](#dataclass-fields-with-cached-values)) |
| `@qd.data_oriented` objects | Yes | member types and primitive member values recursively |
| `qd.Template` primitives (int, float, bool) | Yes | type and value (baked into kernel) |
| Non-template primitives (int, float, bool) | Yes | type only |
| `enum.Enum` | Yes | name and value |
| `qd.field` / `ScalarField` / `MatrixField` | **No** | — |

If any parameter is of an unsupported type, fastcache is silently disabled for that call and the kernel falls back to normal compilation. A warning is logged at the `warn` level identifying the offending parameter.

### 3. Source code must be available

Fastcache hashes the source code of the kernel and all sub-functions it calls. If the source file cannot be read at runtime (e.g. the kernel is defined in a frozen/compiled module, or the file has been deleted), fastcache cannot validate the cache and will fall back to normal compilation.

## Cache keying

Each compiled artifact is stored under a key derived from all of the following:

- The **Quadrants version** (`quadrants.__version__`).
- The **source code** of the kernel function or any `@qd.func` it calls.
- The **argument types** (e.g. switching from `f32` to `f64`, or changing ndarray dimensionality).
- The **compiler configuration** (e.g. `arch`, `debug`, `opt_level`, `fast_math`).
- **Template parameter values** (since they are baked into the compiled kernel).

When any of these change, the resulting key is different, so a new compilation occurs and a new entry is stored. Previous entries remain on disk — multiple cached versions coexist. You do not need to manually clear the cache when making code changes — the hash mismatch causes a transparent recompilation.

## Advanced

### Diagnostics

You can inspect whether fastcache was used for a specific kernel via the `src_ll_cache_observations` attribute on the kernel's primal:

```python
@qd.kernel(fastcache=True)
def my_kernel(x: qd.types.NDArray[qd.f32, 1]) -> None:
for i in range(x.shape[0]):
x[i] += 1.0

my_kernel(some_array)

obs = my_kernel._primal.src_ll_cache_observations
print(obs.cache_key_generated) # True if the cache key was computed
print(obs.cache_validated) # True if a cached entry was found and source hashes matched
print(obs.cache_loaded) # True if the compiled kernel was loaded from cache
print(obs.cache_stored) # True if the compiled kernel was stored to cache
```

On the first run you'll see `cache_stored=True` but `cache_loaded=False`. On the second run (after `qd.init`), `cache_loaded=True`.
1 change: 1 addition & 0 deletions docs/source/user_guide/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,7 @@ tile16
:maxdepth: 1
:titlesonly:

fastcache
graph
perf_dispatch
init_options
Expand Down
Loading