[Doc] Add user-facing fastcache documentation #597
Merged
45 commits
09edd30
[Doc] Add user-facing fastcache documentation
hughperkins c9aab5a
[Doc] Rewrite fastcache intro to focus on process-start cost
hughperkins ff39d7f
[Doc] Clarify "next run" to "next process run"
hughperkins 484046b
[Doc] Remove redundant hash mention from fastcache intro
hughperkins fca294f
[Doc] Trim fastcache intro sentence
hughperkins 10461bc
[Doc] Trim redundant phrase from fastcache intro
hughperkins 0c48cbd
[Doc] Remove implementation detail from fastcache intro
hughperkins 31eac7b
[Doc] Simplify fastcache benchmark table to match introduced concepts
hughperkins 334d444
[Doc] Simplify benchmark table row labels
hughperkins 2f1fcca
[Doc] Fix fastcache benchmark number to 0.3s
hughperkins ef1bc41
[Doc] Remove redundant "Why you want it" section
hughperkins 7457584
[Doc] Fix baseline to 4.6s (includes PTX cache)
hughperkins 2e69b45
[Doc] Trim fastcache intro
hughperkins 3a2e52d
[Doc] Minor grammar fix in fastcache intro
hughperkins f128429
[Doc] Remove redundant sentence before code example
hughperkins aa40b4b
Revert "[Doc] Remove redundant sentence before code example"
hughperkins 1732c7b
[Doc] Explain why fastcache is opt-in before the code example
hughperkins ef7f6fa
[Doc] Clarify fastcache applies to new process starts
hughperkins 3096be2
[Doc] Remove deprecated alternatives section
hughperkins 2d71816
[Doc] Move Diagnostics into Advanced section
hughperkins 5dc2c48
[Doc] Remove redundant "Fields not supported" section
hughperkins 05866ac
[Doc] Move cache invalidation to its own top-level section
hughperkins 833b8d7
[Doc] s/readable/available/ in constraint heading
hughperkins 7aecb4e
[Doc] Clarify data_oriented cache key includes primitive values
hughperkins 007d407
[Doc] Add section explaining add_value_to_cache_key for dataclass fields
hughperkins 212b142
[Doc] Clarify why type-only cache keys suffice for ndarrays
hughperkins b2e4e89
[Doc] Add qd.static example for add_value_to_cache_key motivation
hughperkins 8380891
[Doc] Move qd.static example into its own code block
hughperkins dc1a679
[Doc] Add loop unrolling clarification
hughperkins a6d4819
[Doc] Use concrete name instead of "that value"
hughperkins 35adbf4
[Doc] s/these/such/
hughperkins cef9f1b
[Doc] Fix incorrect claim about recompilation
hughperkins a37917e
[Doc] Clearer wording for missing cache key consequence
hughperkins 0acfa1f
[Doc] Add paragraph break before "Mark such fields"
hughperkins d0f06b2
[Doc] Add cross-reference to dataclass cached values section
hughperkins 309e9c9
[Doc] Add bool to non-template primitives row in fastcache table
c3621f0
[Doc] Rewrite cache invalidation as cache keying; clarify versions co…
37c0400
[Doc] Add purity exemptions table (enums, math/numpy, qd attrs)
c64ea07
[Doc] Fix enum exemption rationale
2aed791
[Doc] Soften enum rationale wording
875795e
[Doc] Soften math/numpy rationale wording
f8ec106
[Doc] Trim math/numpy rationale
d411723
[Doc] Remove 'Immutable' from enum rationale
9770e6a
[Doc] Soften quadrants attrs rationale wording
6f6b2dd
[Doc] Use Tile16x16.SIZE as quadrants attr example
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,174 @@ | ||
| # Fastcache | ||
|
|
||
| ## What it is | ||
|
|
||
| Fastcache reduces the time it takes to load cached kernels when a new Python process starts. | ||
|
|
||
| The standard [offline cache](init_options.md#offline_cache) already persists compiled kernels to disk. However, loading a cached kernel still requires parsing the kernel's Python AST and transforming it into IR. For applications with many kernels this front-end overhead alone can take several seconds. | ||
|
|
||
| Fastcache bypasses that front-end work. It computes a cheap cache key from the kernel source text, argument types, and compiler config, and uses it to load the compiled artifact directly. | ||
|
|
||
| On a Genesis simulator benchmark (`single_franka_envs.py`, Ubuntu 24.04, NVIDIA 5090): | ||
|
|
||
| | Configuration | Process-start kernel load time | | ||
| |---|---| | ||
| | Other caches (no fastcache) | 4.6 s | | ||
| | + fastcache | 0.3 s | | ||
|
|
||
| ## How to use it | ||
|
|
||
| ### Enabling fastcache on a kernel | ||
|
|
||
| Fastcache requires the kernel to be *pure*: all data it operates on must be passed as explicit parameters, with nothing captured from the enclosing Python scope (see [Constraints](#constraints) below). Because not all kernels satisfy this, fastcache is opt-in — you assert purity by adding `fastcache=True` to the `@qd.kernel` decorator: | ||
|
|
||
| ```python | ||
| import quadrants as qd | ||
|
|
||
| @qd.kernel(fastcache=True) | ||
| def my_kernel(a: qd.types.NDArray[qd.f32, 1], b: qd.types.NDArray[qd.f32, 1]) -> None: | ||
| for i in range(a.shape[0]): | ||
| b[i] = a[i] * 2.0 | ||
| ``` | ||
|
|
||
| That's it. On the first call, the kernel compiles normally and the fastcache entry is written to disk. When the next Python process starts, the cached artifact is loaded directly. | ||
|
|
||
| ### Runtime configuration | ||
|
|
||
| Fastcache requires the offline cache to be enabled (which it is by default). Two `qd.init` options are relevant: | ||
|
|
||
| | Option | Default | Effect | | ||
| |---|---|---| | ||
| | `src_ll_cache` | `True` | Master switch for fastcache. Set to `False` to disable it globally. | | ||
| | `print_non_pure` | `False` | When `True`, prints, at call time, the name of every kernel that is *not* marked `fastcache=True`. Useful for finding kernels you forgot to annotate. | | ||
|
|
||
| ```python | ||
| qd.init(arch=qd.gpu) | ||
| # Fastcache is on by default. To disable: | ||
| # qd.init(arch=qd.gpu, src_ll_cache=False) | ||
|
|
||
| # To find un-annotated kernels: | ||
| # qd.init(arch=qd.gpu, print_non_pure=True) | ||
| ``` | ||
|
|
||
| ## Dataclass fields with cached values | ||
|
|
||
| By default, for `dataclasses.dataclass` parameters, fastcache only includes the *types* of each field in the cache key, not their values. This is fine for fields like ndarrays, where the compiled kernel doesn't depend on the actual data, only the dtype and dimensionality. | ||
|
|
||
| However, some dataclass fields hold configuration values that get baked into the compiled kernel — typically values used with `qd.static()`, such as loop bounds or feature flags: | ||
|
|
||
| ```python | ||
| for i in qd.static(range(config.num_layers)): | ||
| ... | ||
| ``` | ||
|
|
||
| Here the value of `num_layers` is compiled into the kernel: the loop is unrolled at compile time. If `num_layers` changes, a different kernel must be compiled. | ||
|
|
||
| Mark such fields with `add_value_to_cache_key` so their values are included in the cache key: | ||
|
|
||
| ```python | ||
| import dataclasses | ||
| from quadrants.lang._fast_caching import FIELD_METADATA_CACHE_VALUE | ||
|
|
||
| @dataclasses.dataclass | ||
| class SimConfig: | ||
| num_envs: int = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True}) | ||
| dt: float = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True}) | ||
| use_gravity: bool = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True}) | ||
| ``` | ||
|
|
||
| With this annotation, changing `num_envs` from 100 to 200 produces a different cache key so the correct compiled kernel is looked up (or compiled if not yet cached). Without it, the wrong kernel could be loaded. | ||
|
|
||
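The difference between type-only and value-aware keying can be sketched with the standard library alone. This is an illustration under stated assumptions, not the Quadrants implementation: `CACHE_VALUE_FLAG` and `key_parts` are hypothetical names, and the flag string is a stand-in for whatever `FIELD_METADATA_CACHE_VALUE` actually holds.

```python
import dataclasses

# Hypothetical stand-in for the metadata key; the real constant is imported
# from quadrants.lang._fast_caching in the example above.
CACHE_VALUE_FLAG = "add_value_to_cache_key"


@dataclasses.dataclass
class SimConfig:
    num_envs: int = dataclasses.field(metadata={CACHE_VALUE_FLAG: True})
    dt: float = 0.01  # not flagged: only its type contributes to the key


def key_parts(obj):
    """Return one (name, type, value-or-None) tuple per dataclass field."""
    parts = []
    for f in dataclasses.fields(obj):
        value = getattr(obj, f.name)
        flagged = f.metadata.get(CACHE_VALUE_FLAG, False)
        # Flagged fields contribute their value; unflagged fields do not.
        parts.append((f.name, type(value).__name__, repr(value) if flagged else None))
    return tuple(parts)


parts_100 = key_parts(SimConfig(num_envs=100))
parts_200 = key_parts(SimConfig(num_envs=200))
parts_dt = key_parts(SimConfig(num_envs=100, dt=0.02))
```

In this sketch, changing the flagged `num_envs` changes the key parts, while changing the unflagged `dt` leaves them identical, which is exactly the situation where a stale kernel could be loaded if `dt` were baked into the compiled code.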
| Note: `@qd.data_oriented` objects and `qd.Template` parameters already include primitive values in the cache key automatically — this annotation is only needed for `dataclasses.dataclass` fields. | ||
|
|
||
| ## Constraints | ||
|
|
||
| A kernel is eligible for fastcache only if all of the following hold: | ||
|
|
||
| ### 1. All data flows through parameters | ||
|
|
||
| The kernel must receive every piece of data it operates on as an explicit parameter. It must **not** capture variables from the enclosing Python scope (closures over fields, ndarrays, or mutable globals). This is the core "purity" constraint — the compiled kernel's behavior must be fully determined by its arguments. | ||
|
|
||
| ```python | ||
| a = qd.ndarray(qd.f32, (10,)) | ||
|
|
||
| # Not eligible: captures `a` from enclosing scope | ||
| @qd.kernel(fastcache=True) | ||
| def bad_kernel() -> None: | ||
| for i in range(10): | ||
| a[i] = 0.0 # raises QuadrantsCompilationError | ||
|
|
||
| # Eligible: `a` is passed as a parameter | ||
| @qd.kernel(fastcache=True) | ||
| def good_kernel(a: qd.types.NDArray[qd.f32, 1]) -> None: | ||
| for i in range(a.shape[0]): | ||
| a[i] = 0.0 | ||
| ``` | ||
|
|
||
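To get a feel for what "captures from the enclosing scope" means, the same distinction can be reproduced on plain Python functions with `inspect.getclosurevars`. This is only an analogy: how Quadrants performs its purity check is not described here, and `captured_names`, `impure`, and `pure` are hypothetical names.

```python
import inspect

shared = [0.0] * 10  # module-level state, analogous to `a` above


def impure():
    for i in range(10):
        shared[i] = 0.0  # resolves `shared` from outside the function


def pure(a):
    for i in range(len(a)):
        a[i] = 0.0  # touches only its own parameter


def captured_names(fn):
    """Names fn resolves from enclosing or module scope (builtins excluded)."""
    cv = inspect.getclosurevars(fn)
    return sorted(set(cv.nonlocals) | set(cv.globals))


impure_captures = captured_names(impure)
pure_captures = captured_names(pure)
```

Here `impure_captures` lists `shared`, while `pure_captures` is empty because `range` and `len` are builtins rather than captured state.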
| Sub-functions called by the kernel are also checked — they must not capture external state either. | ||
|
duburcqa marked this conversation as resolved.
|
||
|
|
||
| **Exemptions:** The following may be accessed from the enclosing scope without violating purity: | ||
|
|
||
| | Allowed capture | Why | | ||
| |---|---| | ||
| | `enum.Enum` values (e.g. `MyEnum.VALUE`) | Named constants that are assumed not to vary between process runs. | | ||
| | `math` / `numpy` constants (e.g. `math.pi`) | Assumed stable across process runs. | | ||
| | Quadrants module attributes (e.g. `qd.simt.Tile16x16.SIZE`) | Part of the compiler's own API; assumed consistent with the Quadrants version hash. | | ||
|
|
||
| Other named constants (non-enum, non-module) captured from scope will raise a `QuadrantsCompilationError`, except for `UPPERCASE` names which emit a warning instead. | ||
|
|
||
| ### 2. Supported parameter types | ||
|
|
||
| Fastcache supports the following parameter types: | ||
|
|
||
| | Type | Supported | Cache key includes | | ||
| |---|---|---| | ||
| | `qd.types.NDArray` (scalar, vector, matrix) | Yes | dtype, ndim, layout | | ||
| | `torch.Tensor` | Yes | dtype, ndim | | ||
| | `numpy.ndarray` | Yes | dtype, ndim | | ||
| | `dataclasses.dataclass` | Yes | field types recursively; field values if annotated with `add_value_to_cache_key` (see [above](#dataclass-fields-with-cached-values)) | | ||
| | `@qd.data_oriented` objects | Yes | member types and primitive member values recursively | | ||
| | `qd.Template` primitives (int, float, bool) | Yes | type and value (baked into kernel) | | ||
| | Non-template primitives (int, float, bool) | Yes | type only | | ||
| | `enum.Enum` | Yes | name and value | | ||
| | `qd.field` / `ScalarField` / `MatrixField` | **No** | — | | ||
|
|
||
| If any parameter is of an unsupported type, fastcache is disabled for that call and the kernel falls back to normal compilation; no error is raised. A warning is logged at the `warn` level identifying the offending parameter. | ||
|
|
||
| ### 3. Source code must be available | ||
|
|
||
| Fastcache hashes the source code of the kernel and all sub-functions it calls. If the source file cannot be read at runtime (e.g. the kernel is defined in a frozen/compiled module, or the file has been deleted), fastcache cannot validate the cache and will fall back to normal compilation. | ||
|
|
||
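The fallback described above follows a common pattern: try to read the source, and treat failure as "no cache" rather than as an error. A minimal standard-library sketch, assuming SHA-256 as the hash (the document does not say which hash Quadrants uses) and using the hypothetical helper name `source_hash_or_none`:

```python
import hashlib
import inspect


def source_hash_or_none(fn):
    """SHA-256 of fn's source text, or None when the source cannot be read."""
    try:
        src = inspect.getsource(fn)
    except (OSError, TypeError):
        # OSError: the source file is missing or unreadable.
        # TypeError: the object has no Python source (e.g. a built-in).
        return None
    return hashlib.sha256(src.encode("utf-8")).hexdigest()


# Built-in functions have no retrievable Python source, so this yields None.
builtin_hash = source_hash_or_none(len)
```

A caller would skip the cache lookup whenever the result is `None` and compile normally instead.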
| ## Cache keying | ||
|
|
||
| Each compiled artifact is stored under a key derived from all of the following: | ||
|
|
||
| - The **Quadrants version** (`quadrants.__version__`). | ||
| - The **source code** of the kernel function and any `@qd.func` it calls. | ||
| - The **argument types** (e.g. switching from `f32` to `f64`, or changing ndarray dimensionality). | ||
| - The **compiler configuration** (e.g. `arch`, `debug`, `opt_level`, `fast_math`). | ||
| - **Template parameter values** (since they are baked into the compiled kernel). | ||
|
|
||
| When any of these changes, the resulting key is different, so a new compilation occurs and a new entry is stored. Previous entries remain on disk, so multiple cached versions coexist. You do not need to clear the cache manually after changing code: the key mismatch triggers a transparent recompilation. | ||
|
|
||
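The list above amounts to hashing a bundle of inputs, so that a change in any one of them yields a new key. The following sketch shows the idea with `hashlib` and `json`; it is not the Quadrants key format. `make_cache_key`, the SHA-256 choice, the version string, and the type strings are all illustrative assumptions.

```python
import hashlib
import json


def make_cache_key(version, sources, arg_types, compiler_config, template_values):
    """Combine every input that affects compilation into one hex digest."""
    payload = {
        "version": version,
        "sources": sources,                  # kernel source plus each called function
        "arg_types": arg_types,              # e.g. ["ndarray[f32,1]"]
        "compiler_config": compiler_config,  # e.g. {"arch": "cuda", "debug": False}
        "template_values": template_values,  # values baked into the kernel
    }
    # sort_keys makes the serialization independent of dict insertion order.
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


base = dict(
    version="1.0.0",  # placeholder version string
    sources=["def my_kernel(a, b): ..."],
    arg_types=["ndarray[f32,1]", "ndarray[f32,1]"],
    compiler_config={"arch": "cuda", "debug": False},
    template_values=[],
)
key_f32 = make_cache_key(**base)
key_f64 = make_cache_key(**{**base, "arg_types": ["ndarray[f64,1]", "ndarray[f64,1]"]})
```

Because the two keys differ, both entries can sit side by side in the cache, and switching back to `f32` arguments finds the earlier entry again.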
| ## Advanced | ||
|
|
||
| ### Diagnostics | ||
|
|
||
| You can inspect whether fastcache was used for a specific kernel via the `src_ll_cache_observations` attribute on the kernel's primal: | ||
|
|
||
| ```python | ||
| @qd.kernel(fastcache=True) | ||
| def my_kernel(x: qd.types.NDArray[qd.f32, 1]) -> None: | ||
| for i in range(x.shape[0]): | ||
| x[i] += 1.0 | ||
|
|
||
| my_kernel(some_array) | ||
|
|
||
| obs = my_kernel._primal.src_ll_cache_observations | ||
| print(obs.cache_key_generated) # True if the cache key was computed | ||
| print(obs.cache_validated) # True if a cached entry was found and source hashes matched | ||
| print(obs.cache_loaded) # True if the compiled kernel was loaded from cache | ||
| print(obs.cache_stored) # True if the compiled kernel was stored to cache | ||
| ``` | ||
|
|
||
| On the first run you'll see `cache_stored=True` but `cache_loaded=False`. On the second run (after `qd.init`), `cache_loaded=True`. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -54,6 +54,7 @@ tile16 | |
| :maxdepth: 1 | ||
| :titlesonly: | ||
|
|
||
| fastcache | ||
| graph | ||
| perf_dispatch | ||
| init_options | ||
|
|
||