[Misc] Add coverage report to PRs, including kernels #470
Enable pytest-cov in Linux CI and post coverage summary as a PR comment via MishaKav/pytest-coverage-comment. Uses only the built-in GITHUB_TOKEN — no external service or org permissions.
Use diff-cover to report percentage of changed/added lines that are covered by tests. Posts a sticky PR comment with both diff coverage and overall project coverage. No external services needed — uses only the built-in GITHUB_TOKEN.
Fail the Linux CI if less than 80% of changed/added Python lines are covered by tests. The coverage PR comment is posted before the check so numbers are always visible.
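The 80% gate amounts to a single ratio over the changed lines. A minimal sketch, assuming the gate is computed from a total and a missing count like the ones in the coverage comment on this PR (444 lines, 71 missing); the function names are hypothetical and are not part of diff-cover:

```python
def diff_coverage_percent(total_lines: int, missing_lines: int) -> float:
    """Percentage of changed lines that were executed by tests."""
    if total_lines == 0:
        # Assumption: an empty diff counts as fully covered.
        return 100.0
    return 100.0 * (total_lines - missing_lines) / total_lines


def passes_gate(total_lines: int, missing_lines: int, threshold: float = 80.0) -> bool:
    """True when diff coverage meets the threshold (80% in this PR)."""
    return diff_coverage_percent(total_lines, missing_lines) >= threshold


print(round(diff_coverage_percent(444, 71)))  # figures from the coverage comment
print(passes_gate(444, 71))
```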
Use --format markdown:file.md syntax instead of --format markdown with stdout redirect.
Code inside @qd.func / @qd.kernel (and @ti. variants) is JIT-compiled to GPU code, so Python coverage.py can never trace it. Exclude these blocks to avoid false-negative coverage on kernel-heavy files.
When QD_KERNEL_COVERAGE=1, rewrite kernel/func Python ASTs to insert coverage probes (field stores) before each statement. The probes execute on the GPU and record which source lines were actually reached, including runtime if/else branches — not just static ones. At process exit, probe data is written to .coverage.kernel which can be merged with pytest-cov data via `coverage combine`. Zero C++ changes. Zero impact on the normal runtime path — the coverage module is only imported when the env var is set.
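The probe idea can be illustrated in plain Python, without any GPU. This is a sketch only: a Python list stands in for the coverage field, `ProbeInserter` and `run_instrumented` are hypothetical names, and only function and if/else bodies are instrumented.

```python
import ast


class ProbeInserter(ast.NodeTransformer):
    """Insert `_probes[<id>] = 1` before each statement in function and if/else bodies."""

    def __init__(self, start_lineno: int) -> None:
        self.start_lineno = start_lineno
        self.probe_map: dict[int, int] = {}  # probe id -> absolute source line

    def _instrument(self, stmts):
        out = []
        for stmt in stmts:
            probe_id = len(self.probe_map)
            self.probe_map[probe_id] = stmt.lineno + self.start_lineno - 1
            probe = ast.parse(f"_probes[{probe_id}] = 1").body[0]
            # Give every node of the probe the location of the statement it guards.
            for node in ast.walk(probe):
                ast.copy_location(node, stmt)
            out.append(probe)
            out.append(self.visit(stmt))
        return out

    def visit_FunctionDef(self, node):
        node.body = self._instrument(node.body)
        return node

    def visit_If(self, node):
        node.body = self._instrument(node.body)
        if node.orelse:
            node.orelse = self._instrument(node.orelse)
        return node


SRC = """\
def f(x):
    if x > 5:
        y = 1
    else:
        y = 2
    return y
"""


def run_instrumented(x: int) -> set[int]:
    """Run the instrumented f(x) and return the source lines whose probes fired."""
    inserter = ProbeInserter(start_lineno=1)
    tree = ast.fix_missing_locations(inserter.visit(ast.parse(SRC)))
    probes = [0] * len(inserter.probe_map)
    namespace = {"_probes": probes}
    exec(compile(tree, "<sketch>", "exec"), namespace)
    namespace["f"](x)
    return {inserter.probe_map[i] for i, hit in enumerate(probes) if hit}


print(sorted(run_instrumented(10)))  # the else line (5) is absent
print(sorted(run_instrumented(1)))   # the if-body line (3) is absent
```

Because the probe sits inside the branch body, it only records a hit when that branch runs, which is what gives runtime rather than static branch information.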
Track which Program instance the coverage field belongs to. Re-allocate after qd.init() destroys the old SNode tree, preventing dangling field references with garbage dimensions.
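The ownership check can be sketched as an identity comparison against the current program object. This is an illustration under assumptions: `FieldCache` and its `allocate` callback are hypothetical stand-ins, not the module's API.

```python
class FieldCache:
    """Cache one field per program instance; re-allocate when the program changes."""

    def __init__(self, allocate) -> None:
        self._allocate = allocate  # hypothetical callback that creates a fresh field
        self._field = None
        self._owner = None  # the program instance that owns self._field

    def get(self, current_prog):
        # Identity (`is`), not equality: a re-initialized runtime is a new object.
        if self._field is None or self._owner is not current_prog:
            self._field = self._allocate()
            self._owner = current_prog
        return self._field


cache = FieldCache(allocate=object)
prog_a, prog_b = object(), object()
first = cache.get(prog_a)
print(cache.get(prog_a) is first)  # same program: field is reused
print(cache.get(prog_b) is first)  # new program: field is re-allocated
```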
start_lineno=10 + relative line 2 - 1 = 11, not 12.
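The off-by-one comes from both numbers being 1-based: relative line 1 is the `def` line itself, which sits at `start_lineno`. A small sketch of the arithmetic (the helper name is hypothetical):

```python
def absolute_lineno(start_lineno: int, rel_lineno: int) -> int:
    """Map a 1-based line within the function source to a 1-based line in the file."""
    return start_lineno + rel_lineno - 1


# A function starting at file line 10: its `def` is relative line 1,
# so the first body statement (relative line 2) is file line 11.
print(absolute_lineno(10, 1))
print(absolute_lineno(10, 2))
```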
…it() The old flush() tried to read the field at atexit, but by then the runtime was already destroyed (test framework calls qd.reset()), causing to_numpy() to fail silently. Now we harvest probe data into _accumulated_lines whenever ensure_field_allocated() detects a program change, preserving results across reinitializations.
…ction Instead of trying to read the coverage field after the runtime is destroyed (which hangs on CUDA), install a hook on clear() that harvests probe data while the field is still alive. This fixes the hang when switching architectures (e.g. x64 -> cuda) in tests.
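The hook pattern is a one-time monkey-patch that reads the data and then delegates to the original method. A minimal sketch with a dummy class; `Runtime`, `harvested`, and `install_clear_hook` are hypothetical stand-ins for illustration:

```python
class Runtime:
    """Hypothetical stand-in for the runtime object whose clear() destroys its fields."""

    def __init__(self) -> None:
        self.field = [0, 1, 1]  # probe slots; non-zero means the probe fired

    def clear(self) -> None:
        self.field = None  # after this point the data can no longer be read


harvested: list[int] = []
_hook_installed = False


def install_clear_hook() -> None:
    """Wrap Runtime.clear() once so probe data is read before destruction."""
    global _hook_installed
    if _hook_installed:
        return
    original_clear = Runtime.clear

    def hooked_clear(self) -> None:
        # Harvest while the field is still alive, then run the real clear().
        harvested.extend(i for i, hit in enumerate(self.field) if hit)
        original_clear(self)

    Runtime.clear = hooked_clear
    _hook_installed = True


install_clear_hook()
install_clear_hook()  # second call is a no-op, so the hook is not stacked
rt = Runtime()
rt.clear()
print(harvested)
```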
run_tests.py --coverage passes --cov-branch to pytest, producing branch/arc coverage data. Our .coverage.kernel was writing line-only data, causing "Can't combine branch coverage data with statement data". Now we detect branch mode and synthesize arcs from covered lines.
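One conservative way to synthesize arcs, matching the entry/exit approach visible in the `flush()` listing in the coverage report, is to emit only `(-1, line)` and `(line, -1)` per covered line. A sketch (the function name is hypothetical):

```python
def arcs_from_lines(lines: set[int]) -> list[tuple[int, int]]:
    """Build entry/exit arcs for each covered line.

    Only the set of executed lines is known, not the transitions between them,
    so no line-to-line arcs are generated.
    """
    arcs: list[tuple[int, int]] = []
    for line in sorted(lines):
        arcs.append((-1, line))  # entry arc into the line
        arcs.append((line, -1))  # exit arc out of the line
    return arcs


print(arcs_from_lines({5, 2}))
```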
_is_branch_coverage() checked config files but --cov-branch is a CLI flag not in config. Now reads the actual .coverage file written by pytest-cov to detect arc mode. Also removes stale .coverage.kernel from previous runs to avoid "no such table: meta" errors.
Uses portable subgroup.shuffle instead of CUDA-specific warp.shfl, and qd.gpu arch so it runs on both CUDA and Vulkan.
The kernel reads flag[0] (a runtime field value) to choose between two shuffle paths, verifying that coverage correctly tracks which branch executed and which didn't.
qd.gpu doesn't match QD_WANTED_ARCHS=cuda filtering in test_utils.
The _qd_cov field is injected into global_vars for coverage instrumentation. Pure kernels flag all global_vars accesses as violations, causing compilation errors. Exempt _qd_cov so coverage works on all kernels including pure ones.
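The exemption shown in the report for `ast_transformer_utils.py` is a prefix check on the variable name. A minimal sketch of that rule in isolation; the function name and the `global_vars` argument shape are assumptions for illustration:

```python
def violates_pure(name: str, global_vars: dict) -> bool:
    """A name violates purity if it resolves to a global, unless it is an internal `_qd_` name."""
    return name in global_vars and not name.startswith("_qd_")


global_vars = {"_qd_cov": object(), "table": [1, 2, 3]}
print(violates_pure("table", global_vars))    # ordinary global: flagged
print(violates_pure("_qd_cov", global_vars))  # injected coverage field: exempt
```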
Set QD_KERNEL_COVERAGE=1 in the test script so kernel probes are actually injected during CI runs, and add a coverage combine step to merge .coverage.kernel into the main .coverage before generating reports. Also fix flush() to accumulate kernel data across multiple test phases instead of overwriting.
# things, without doing full c++ build
build-backend = "setuptools.build_meta"

[tool.coverage.report]
what does this do?
Opus:
That tells coverage.py to treat lines matching those regex patterns as excluded from coverage reporting — as if they had # pragma: no cover.
Specifically, it excludes the decorator lines @qd.func, @qd.kernel, @ti.func, and @ti.kernel. Those lines are never "executed" in the normal Python sense because the decorated function body gets JIT-compiled rather than interpreted, so without this exclusion they'd show up as uncovered and drag down the coverage percentage.
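To make the regex matching concrete, here is a sketch of how such patterns select lines. The patterns are hypothetical, since the actual `exclude_lines` entries are not shown in this thread, and the helper only mimics a regex search over each source line:

```python
import re

# Hypothetical patterns; the real config entries are not visible in this thread.
EXCLUDE_PATTERNS = [r"@(qd|ti)\.func", r"@(qd|ti)\.kernel"]


def is_excluded(line: str) -> bool:
    """True if any exclusion regex matches somewhere in the line."""
    return any(re.search(pattern, line) for pattern in EXCLUDE_PATTERNS)


for line in ["@qd.kernel", "    @ti.func", "def step():", "x = qd.field(qd.i32, shape=4)"]:
    print(repr(line), is_excluded(line))
```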
this seems superfluous now we removed all the exceptions...
Guard the import behind the env var check at call sites so the module is never loaded in normal operation. This lets _kernel_coverage use top-level imports instead of scattered lazy imports.
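The call-site guard can be sketched as an env var check followed by an import inside the branch. This is an illustration only: `maybe_load` is a hypothetical helper, and the stdlib module `colorsys` stands in for `_kernel_coverage`.

```python
import importlib
import os


def kernel_coverage_enabled(environ=os.environ) -> bool:
    """Coverage is opt-in: only the exact value "1" enables it."""
    return environ.get("QD_KERNEL_COVERAGE") == "1"


def maybe_load(module_name: str, environ=os.environ):
    """Import the module only when coverage is enabled; otherwise never touch it."""
    if not kernel_coverage_enabled(environ):
        return None
    return importlib.import_module(module_name)


print(maybe_load("colorsys", environ={}))
print(maybe_load("colorsys", environ={"QD_KERNEL_COVERAGE": "1"}).__name__)
```

Keeping the check at the call site means the guarded module itself can use ordinary top-level imports, since it is only ever loaded when the feature is on.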
/python/test_env
/CHANGELOG.md
/.coverage
/.coverage.*
# Phase 2: install torch, run only torch tests
pip install torch --index-url https://download.pytorch.org/whl/cpu
python tests/run_tests.py -v -r 3 -m needs_torch
python tests/run_tests.py -v -r 3 -m needs_torch --coverage --cov-append
what does --cov-append do?
opus:
--cov-append tells pytest-cov not to erase the existing .coverage data file before the test run. Instead, the new coverage data is appended/merged into whatever's already there.
By default, pytest-cov wipes .coverage at the start of each run. With --cov-append, you can run multiple test sessions and accumulate coverage across all of them into a single combined report.
Typical use case: running tests in separate phases (e.g., field backend then ndarray backend) and merging the results into one coverage report at the end.
@claude review
Coverage Report (
| Metric | Value |
|---|---|
| Diff coverage (changed lines only) | 84% |
| Overall project coverage | 72% |
Total: 444 lines, 71 missing, 84% covered
🟢 python/quadrants/lang/_func_base.py (82%)
🔴 7 import os
25
🔴 26 def _kernel_coverage_enabled() -> bool:
🟢 27 return os.environ.get("QD_KERNEL_COVERAGE") == "1"
28
29
🟢 255 _kcov = None
🟢 256 if _kernel_coverage_enabled() and autodiff_mode == _qd_core.AutodiffMode.NONE:
🟢 257 from . import ( # pylint: disable=import-outside-toplevel
258 _kernel_coverage as _kcov,
259 )
260
🟢 261 tree = _kcov.rewrite_ast(tree, function_source_info.filepath, function_source_info.start_lineno)
262
🟢 266 if _kcov is not None:
🟢 267 cov_field = _kcov.get_field()
🟢 268 if cov_field is not None:
🟢 269 global_vars[_kcov.FIELD_VAR_NAME] = cov_field
🟢 python/quadrants/lang/_kernel_coverage.py (80%)
1 """Kernel code coverage via Python AST rewriting.
2
3 When enabled (QD_KERNEL_COVERAGE=1), this module rewrites kernel and func ASTs to insert coverage probes — field
4 stores that record which source lines actually execute on the GPU. At process exit, the collected data is written
5 to a .coverage file compatible with coverage.py / pytest-cov / diff-cover.
6
7 The probes are compiled as ordinary field stores by the existing pipeline, so no C++ changes are needed. When
8 disabled, this module is never imported and has zero impact on the normal runtime path.
9 """
10
🟢 11 import ast
🟢 12 import atexit
🟢 13 import logging
🟢 14 import os
🟢 15 import threading
🟢 16 import warnings
🟢 17 from typing import TYPE_CHECKING
18
🟢 19 from coverage import CoverageData # type: ignore[import-not-found]
20
🟢 21 import quadrants as qd
🟢 22 from quadrants.lang import impl
23
24 if TYPE_CHECKING:
25 from quadrants.lang.field import ScalarField
26
🟢 27 FIELD_VAR_NAME = "_qd_cov"
🟢 28 _MAX_PROBES = int(os.environ.get("QD_COVERAGE_MAX_PROBES", "100000"))
29
🟢 30 _lock = threading.Lock()
🟢 31 _cov_field: "ScalarField | None" = None
🟢 32 _cov_field_prog: object | None = None # tracks which Program instance owns _cov_field
🟢 33 _probe_counter: int = 0
34 # {probe_id: (filepath, absolute_lineno)}
🟢 35 _probe_map: dict[int, tuple[str, int]] = {}
36 # Accumulated coverage lines surviving across qd.init() resets
🟢 37 _accumulated_lines: dict[str, set[int]] = {}
🟢 38 _reset_hook_installed: bool = False
39 # Directory for .coverage and _qd_kcov.* files, captured when coverage is first enabled
🟢 40 _coverage_dir: str | None = None
41
42
🟢 43 def _harvest_field() -> None:
44 """Read probe data from the current field into _accumulated_lines.
45
46 Must be called while the runtime is still alive (before clear()).
47 """
48 global _cov_field, _cov_field_prog
🟢 49 with _lock:
🟢 50 if _cov_field is None or not _probe_map:
🟢 51 return
🟢 52 field_ref = _cov_field
🟢 53 probe_snapshot = dict(_probe_map)
🟢 54 _cov_field = None
🟢 55 _cov_field_prog = None
🟢 56 try:
🟢 57 arr = field_ref.to_numpy()
🟢 58 except Exception:
🟢 59 logging.warning("Failed to read coverage field, coverage data for this session will be lost", exc_info=True)
🟢 60 return
🟢 61 with _lock:
🟢 62 for probe_id, (filepath, lineno) in probe_snapshot.items():
🟢 63 if probe_id < len(arr) and arr[probe_id] != 0:
🟢 64 _accumulated_lines.setdefault(filepath, set()).add(lineno)
65
66
🟢 67 def _install_reset_hook() -> None:
68 """Monkey-patch PyQuadrants.clear() to harvest probes before destruction."""
69 global _reset_hook_installed
🟢 70 if _reset_hook_installed:
🟢 71 return
🟢 72 _original_clear = impl.PyQuadrants.clear
73
🟢 74 def _hooked_clear(self) -> None:
🟢 75 _harvest_field()
🟢 76 _original_clear(self)
77
🟢 78 impl.PyQuadrants.clear = _hooked_clear # type: ignore[assignment]
🟢 79 _reset_hook_installed = True
80
81
🟢 82 def ensure_field_allocated() -> None:
83 """Allocate (or re-allocate after qd.init()) the global coverage field."""
84 global _cov_field, _cov_field_prog, _coverage_dir
🟢 85 _install_reset_hook()
🟢 86 if _coverage_dir is None:
🟢 87 _coverage_dir = os.getcwd()
🟢 88 current_prog = impl.get_runtime()._prog
🟢 89 if _cov_field is not None and _cov_field_prog is current_prog:
🟢 90 return
🟢 91 with _lock:
🟢 92 current_prog = impl.get_runtime()._prog
🟢 93 if _cov_field is not None and _cov_field_prog is current_prog:
🔴 94 return
🟢 95 _cov_field = qd.field(dtype=qd.i32, shape=(_MAX_PROBES,)) # type: ignore[assignment]
🟢 96 _cov_field_prog = current_prog
97
98
🟢 99 def get_field() -> "ScalarField | None":
🟢 100 with _lock:
🟢 101 if _cov_field_prog is not impl.get_runtime()._prog:
🔴 102 return None
🟢 103 return _cov_field
104
105
🟢 106 def rewrite_ast(tree: ast.Module, filepath: str, start_lineno: int) -> ast.Module:
107 """Rewrite a kernel/func AST to insert coverage probes.
108
109 Each executable statement at a new source line gets a probe: ``_qd_cov[<probe_id>] = 1``.
110 Probes inside if/else bodies only fire when that branch is taken, giving true runtime branch coverage.
111 """
112 global _probe_counter
🟢 113 with _lock:
🟢 114 rewriter = _CoverageASTRewriter(
115 field_name=FIELD_VAR_NAME,
116 filepath=filepath,
117 start_lineno=start_lineno,
118 probe_id_start=_probe_counter,
119 )
🟢 120 tree = rewriter.visit(tree)
🟢 121 ast.fix_missing_locations(tree)
🟢 122 _probe_counter = rewriter.next_probe_id
🟢 123 _probe_map.update(rewriter.probe_map)
🟢 124 return tree
125
126
🟢 127 def _detect_arc_mode() -> bool:
128 """Detect whether pytest-cov wrote branch (arc) data by reading .coverage.
129
130 Defaults to True (arc mode) when .coverage doesn't exist or is empty, since run_tests.py --coverage always
131 enables --cov-branch.
132 """
🔴 133 try:
🔴 134 cov_path = os.path.join(_coverage_dir, ".coverage") if _coverage_dir else ".coverage"
🔴 135 cd = CoverageData(basename=cov_path)
🔴 136 cd.read()
🔴 137 if not cd.measured_files():
🔴 138 return True
🔴 139 return cd.has_arcs()
🔴 140 except Exception:
🔴 141 logging.debug("Failed to detect arc mode from .coverage file, defaulting to arc mode", exc_info=True)
🔴 142 return True
143
144
🟢 145 def flush() -> None:
146 """Harvest any remaining field data and write all results to a .coverage file.
147
148 If .coverage.kernel already exists (e.g. from a prior test phase), the new data is merged into it so nothing
149 is lost across multiple invocations.
150 """
🔴 151 _harvest_field()
152
🔴 153 with _lock:
🔴 154 if not _accumulated_lines:
🔴 155 return
🔴 156 snapshot = {f: set(lines) for f, lines in _accumulated_lines.items()}
157
🔴 158 base_dir = _coverage_dir or os.getcwd()
🔴 159 kernel_path = os.path.join(base_dir, f"_qd_kcov.{os.getpid()}")
🔴 160 use_arcs = _detect_arc_mode()
161
🔴 162 cov = CoverageData(basename=kernel_path)
🔴 163 if use_arcs:
🔴 164 arcs_by_file: dict[str, list[tuple[int, int]]] = {}
🔴 165 for filepath, lines in snapshot.items():
166 # Emit only entry/exit arcs per line — we know which lines ran but not the actual transitions
167 # between them, so we avoid fabricating inter-line arcs that would misrepresent branch coverage.
🔴 168 arcs = []
🔴 169 for line in sorted(lines):
🔴 170 arcs.append((-1, line))
🔴 171 arcs.append((line, -1))
🔴 172 arcs_by_file[filepath] = arcs
🔴 173 cov.add_arcs(arcs_by_file)
174 else:
🔴 175 cov.add_lines({f: sorted(lines) for f, lines in snapshot.items()})
🔴 176 cov.write()
177
178
🟢 179 _capacity_warning_emitted = False
180
181
🟢 182 class _CoverageASTRewriter(ast.NodeTransformer):
183 """Insert coverage probes before each statement at a new source line."""
184
🟢 185 def __init__(self, field_name: str, filepath: str, start_lineno: int, probe_id_start: int) -> None:
🟢 186 self._field_name = field_name
🟢 187 self._filepath = filepath
🟢 188 self._start_lineno = start_lineno
🟢 189 self.next_probe_id = probe_id_start
🟢 190 self._seen_lines: set[int] = set()
🟢 191 self.probe_map: dict[int, tuple[str, int]] = {}
192
🟢 193 def _make_probe(self, abs_lineno: int, rel_lineno: int, col_offset: int) -> ast.Assign | None:
194 global _capacity_warning_emitted
🟢 195 probe_id = self.next_probe_id
🟢 196 if probe_id >= _MAX_PROBES:
🟢 197 if not _capacity_warning_emitted:
🟢 198 warnings.warn(
199 f"Kernel coverage probe capacity ({_MAX_PROBES}) exceeded. "
200 f"Additional kernel lines will not be tracked. "
201 f"Set QD_COVERAGE_MAX_PROBES to a higher value.",
202 stacklevel=2,
203 )
🟢 204 _capacity_warning_emitted = True
🟢 205 return None
🟢 206 self.probe_map[probe_id] = (self._filepath, abs_lineno)
🟢 207 self.next_probe_id += 1
🟢 208 node = ast.Assign(
209 targets=[
210 ast.Subscript(
211 value=ast.Name(id=self._field_name, ctx=ast.Load()),
212 slice=ast.Constant(value=probe_id),
213 ctx=ast.Store(),
214 )
215 ],
216 value=ast.Constant(value=1),
217 lineno=rel_lineno,
218 col_offset=col_offset,
219 end_lineno=rel_lineno,
220 end_col_offset=col_offset,
221 )
🟢 222 return node
223
🟢 224 def _instrument_body(self, stmts: list[ast.stmt]) -> list[ast.stmt]:
🟢 225 result: list[ast.stmt] = []
🟢 226 for stmt in stmts:
🟢 227 rel_lineno = getattr(stmt, "lineno", None)
🟢 228 if rel_lineno is not None:
🟢 229 abs_lineno = rel_lineno + self._start_lineno - 1
🟢 230 if abs_lineno not in self._seen_lines:
🟢 231 self._seen_lines.add(abs_lineno)
🟢 232 col = getattr(stmt, "col_offset", 0)
🟢 233 probe = self._make_probe(abs_lineno, rel_lineno, col)
🟢 234 if probe is not None:
🟢 235 result.append(probe)
🟢 236 result.append(self.visit(stmt))
🟢 237 return result
238
🟢 239 def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
🟢 240 node.body = self._instrument_body(node.body)
🟢 241 return node
242
🟢 243 def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> ast.AsyncFunctionDef:
🔴 244 node.body = self._instrument_body(node.body)
🔴 245 return node
246
🟢 247 def visit_If(self, node: ast.If) -> ast.If:
🟢 248 node.body = self._instrument_body(node.body)
🟢 249 if node.orelse:
🟢 250 node.orelse = self._instrument_body(node.orelse)
🟢 251 return node
252
🟢 253 def visit_For(self, node: ast.For) -> ast.For:
🟢 254 node.body = self._instrument_body(node.body)
🟢 255 if node.orelse:
🟢 256 node.orelse = self._instrument_body(node.orelse)
🟢 257 return node
258
🟢 259 def visit_While(self, node: ast.While) -> ast.While:
🟢 260 node.body = self._instrument_body(node.body)
🟢 261 if node.orelse:
🟢 262 node.orelse = self._instrument_body(node.orelse)
🟢 263 return node
264
🟢 265 def visit_With(self, node: ast.With) -> ast.With:
🟢 266 node.body = self._instrument_body(node.body)
🟢 267 return node
268
🟢 269 def visit_Try(self, node: ast.Try) -> ast.Try:
🟢 270 node.body = self._instrument_body(node.body)
🟢 271 for handler in node.handlers:
🟢 272 handler.body = self._instrument_body(handler.body)
🟢 273 if node.orelse:
🟢 274 node.orelse = self._instrument_body(node.orelse)
🟢 275 if node.finalbody:
🟢 276 node.finalbody = self._instrument_body(node.finalbody)
🟢 277 return node
278
279
🟢 280 atexit.register(flush)
🟢 python/quadrants/lang/ast/ast_transformer_utils.py (100%)
🟢 335 if not name.startswith("_qd_"):
🟢 336 reason = f"{name} is in global vars, therefore violates pure"
🟢 337 violates_pure = True
🟢 python/quadrants/lang/kernel.py (80%)
19
🔴 20 def _kernel_coverage_enabled() -> bool:
🟢 21 return os.environ.get("QD_KERNEL_COVERAGE") == "1"
22
23
🟢 382 if _kernel_coverage_enabled():
🟢 383 from . import _kernel_coverage # pylint: disable=import-outside-toplevel
384
🟢 385 _kernel_coverage.ensure_field_allocated()
386
🟢 python/quadrants/lang/misc.py (100%)
496
🟢 497 if os.environ.get("QD_KERNEL_COVERAGE") == "1":
🟢 498 from . import _kernel_coverage # pylint: disable=import-outside-toplevel
499
🟢 500 _kernel_coverage.ensure_field_allocated()
501
🔴 python/quadrants/pytest_plugin.py (50%)
1 """Pytest plugin that auto-enables kernel coverage when pytest-cov is active.
2
3 Registered via the ``pytest11`` entry point so it loads automatically when quadrants is installed.
4 Opt out by setting ``QD_KERNEL_COVERAGE=0`` explicitly.
5 """
6
🔴 7 import os
8
9
🔴 10 def pytest_configure(config):
🟢 11 if config.pluginmanager.hasplugin("_cov"):
🟢 12 os.environ.setdefault("QD_KERNEL_COVERAGE", "1")
13 # Kernel coverage always writes arc-format data; ensure pytest-cov matches to avoid
14 # "Can not mix line and arc data" errors during coverage combine.
🟢 15 if not config.option.__dict__.get("cov_branch", False):
🔴 16 config.option.cov_branch = True
🟢 tests/python/quadrants/lang/fast_caching/test_src_ll_cache.py (100%)
11
🟢 12 _KERNEL_COVERAGE = os.environ.get("QD_KERNEL_COVERAGE") == "1"
🟢 67 @pytest.mark.skipif(
68 _KERNEL_COVERAGE,
69 reason="Coverage probes change LLVM IR addresses after reinit, breaking recompile comparison",
70 )
🟢 tests/python/quadrants/lang/test_kernel_impl.py (100%)
🟢 1 import os
🟢 11 _KERNEL_COVERAGE = os.environ.get("QD_KERNEL_COVERAGE") == "1"
13
🟢 14 @pytest.mark.skipif(
15 _KERNEL_COVERAGE,
16 reason="Coverage probes change the kernel AST, preventing FE-LL cache hits after reinit",
17 )
🟢 tests/python/test_api.py (100%)
🟢 438 actual = sorted([s for s in dir(src) if not s.startswith(("_", "@")) and s != "pytest_plugin"])
🟢 tests/python/test_intrinsics.py (100%)
55 (thread i does (i+1)*200000). Asserts strict monotonicity across threads and that
🟢 91 assert a[i - 1] < a[i] < a[i + 1]
🟢 tests/python/test_kernel_coverage.py (87%)
1 """Tests for kernel code coverage instrumentation.
2
3 These tests verify that the AST rewriter correctly inserts coverage probes and that the probes fire when kernel
4 code executes on the device.
5 """
6
🟢 7 import ast
🟢 8 import os
🟢 9 import textwrap
10
🟢 11 import pytest
12
🟢 13 import quadrants as qd
14
🟢 15 from tests import test_utils
16
17 # These tests only run when QD_KERNEL_COVERAGE=1
🟢 18 pytestmark = pytest.mark.skipif(
19 os.environ.get("QD_KERNEL_COVERAGE", "") != "1",
20 reason="QD_KERNEL_COVERAGE=1 not set",
21 )
22
23
24 # ---------------------------------------------------------------------------
25 # AST rewriter unit tests
26 # ---------------------------------------------------------------------------
27
🟢 28 _AST_REWRITER_CASES = [
29 pytest.param(
30 """\
31 def f():
32 x = 1
33 y = 2
34 return x + y
35 """,
36 {11, 12, 13},
37 10,
38 id="straight_line",
39 ),
40 pytest.param(
41 """\
42 def f():
43 if x > 0:
44 a = 1
45 else:
46 b = 2
47 """,
48 {2, 3, 5},
49 1,
50 id="if_else",
51 ),
52 pytest.param(
53 """\
54 def f():
55 for i in range(10):
56 x = i
57 """,
58 {2, 3},
59 1,
60 id="for_loop",
61 ),
62 pytest.param(
63 """\
64 def f():
65 while x > 0:
66 x = x - 1
67 else:
68 y = 0
69 """,
70 {2, 3, 5},
71 1,
72 id="while_loop_else",
73 ),
74 pytest.param(
75 """\
76 def f():
77 with ctx:
78 a = 1
79 b = 2
80 """,
81 {2, 3, 4},
82 1,
83 id="with_statement",
84 ),
85 pytest.param(
86 """\
87 def f():
88 try:
89 a = 1
90 except:
91 b = 2
92 else:
93 c = 3
94 finally:
95 d = 4
96 """,
97 {3, 5, 7, 9},
98 1,
99 id="try_except_finally",
100 ),
101 ]
102
103
🟢 104 @pytest.mark.parametrize("src,expected_lines,start_lineno", _AST_REWRITER_CASES)
🟢 105 def test_ast_rewriter(src, expected_lines, start_lineno):
106 """Verify the AST rewriter inserts probes at the expected source lines."""
🟢 107 from quadrants.lang._kernel_coverage import _CoverageASTRewriter
108
🟢 109 tree = ast.parse(textwrap.dedent(src))
🟢 110 rewriter = _CoverageASTRewriter(
111 field_name="_qd_cov", filepath="test.py", start_lineno=start_lineno, probe_id_start=0
112 )
🟢 113 rewriter.visit(tree)
114
🟢 115 covered_lines = {lineno for _, (_, lineno) in rewriter.probe_map.items()}
🟢 116 assert expected_lines.issubset(covered_lines), f"Expected lines {expected_lines} to be probed, got {covered_lines}"
117
118
🟢 119 def test_ast_rewriter_capacity_limit():
120 """Verify that probes stop being inserted when the capacity limit is hit."""
🟢 121 import warnings
122
🟢 123 import quadrants.lang._kernel_coverage as kcov
🟢 124 from quadrants.lang._kernel_coverage import _CoverageASTRewriter
125
🟢 126 src = textwrap.dedent(
127 """\
128 def f():
129 a = 1
130 b = 2
131 c = 3
132 """
133 )
🟢 134 tree = ast.parse(src)
🟢 135 old_warning_state = kcov._capacity_warning_emitted
🟢 136 kcov._capacity_warning_emitted = False
🟢 137 try:
🟢 138 with warnings.catch_warnings(record=True) as w:
🟢 139 warnings.simplefilter("always")
🟢 140 rewriter = _CoverageASTRewriter(
141 field_name="_qd_cov", filepath="test.py", start_lineno=1, probe_id_start=kcov._MAX_PROBES - 1
142 )
🟢 143 rewriter.visit(tree)
144
🟢 145 assert rewriter.next_probe_id == kcov._MAX_PROBES
🟢 146 assert len(rewriter.probe_map) == 1, f"Only 1 probe should fit, got {len(rewriter.probe_map)}"
🟢 147 assert len(w) == 1
🟢 148 assert "exceeded" in str(w[0].message).lower()
149 finally:
🟢 150 kcov._capacity_warning_emitted = old_warning_state
151
152
🟢 153 def test_ast_rewriter_deduplicates_same_line():
154 """Verify that two statements on the same source line get only one probe."""
🟢 155 from quadrants.lang._kernel_coverage import _CoverageASTRewriter
156
🟢 157 src = "def f():\n a = 1; b = 2\n"
🟢 158 tree = ast.parse(src)
🟢 159 rewriter = _CoverageASTRewriter(field_name="_qd_cov", filepath="test.py", start_lineno=1, probe_id_start=0)
🟢 160 rewriter.visit(tree)
161
🟢 162 abs_lines = [lineno for _, (_, lineno) in rewriter.probe_map.items()]
🟢 163 assert abs_lines.count(2) == 1, f"Line 2 should have exactly one probe, got {abs_lines.count(2)}"
164
165
🟢 166 def test_env_var_max_probes():
167 """Verify that QD_COVERAGE_MAX_PROBES env var is read at import time."""
🟢 168 import quadrants.lang._kernel_coverage as kcov
169
🟢 170 assert kcov._MAX_PROBES == int(os.environ.get("QD_COVERAGE_MAX_PROBES", "100000"))
171
172
🟢 173 def test_harvest_field_exception_path():
174 """Verify that _harvest_field handles to_numpy() failure gracefully."""
🟢 175 from unittest.mock import MagicMock
176
🟢 177 import quadrants.lang._kernel_coverage as kcov
178
🟢 179 old_field = kcov._cov_field
🟢 180 old_prog = kcov._cov_field_prog
🟢 181 old_map = kcov._probe_map.copy()
🟢 182 try:
🟢 183 mock_field = MagicMock()
🟢 184 mock_field.to_numpy.side_effect = RuntimeError("runtime destroyed")
🟢 185 kcov._cov_field = mock_field
🟢 186 kcov._cov_field_prog = object()
🟢 187 kcov._probe_map[999999] = ("fake.py", 1)
188
189 # Should not raise — the exception is caught and logged
🟢 190 kcov._harvest_field()
191
🟢 192 assert kcov._cov_field is None, "Field should be cleared after failure"
🟢 193 assert kcov._cov_field_prog is None, "Field prog should be cleared after failure"
194 finally:
🟢 195 kcov._cov_field = old_field
🟢 196 kcov._cov_field_prog = old_prog
🟢 197 kcov._probe_map = old_map
198
199
200 # ---------------------------------------------------------------------------
201 # End-to-end tests
202 # ---------------------------------------------------------------------------
203
204
🟢 205 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 206 def test_kernel_coverage_branches_e2e():
207 """Verify that only the taken branch has its probe fired."""
🟢 208 from quadrants.lang import _kernel_coverage
209
🟢 210 _kernel_coverage.ensure_field_allocated()
211
🟢 212 probe_count_before = _kernel_coverage._probe_counter
🟢 213 out = qd.field(dtype=qd.i32, shape=(1,))
214
🟢 215 @qd.kernel
🟢 216 def branching_kernel():
🟢 217 x = 10
🟢 218 if x > 5:
🟢 219 out[0] = 1
220 else:
🔴 221 out[0] = 2
222
🟢 223 branching_kernel()
224
🟢 225 assert out[0] == 1
226
🟢 227 cov_field = _kernel_coverage.get_field()
🟢 228 arr = cov_field.to_numpy()
229
🟢 230 probes_for_kernel = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
231
🟢 232 taken_probes = {pid for pid, loc in probes_for_kernel.items() if arr[pid] != 0}
🟢 233 not_taken_probes = {pid for pid, loc in probes_for_kernel.items() if arr[pid] == 0}
234
🟢 235 assert len(taken_probes) > 0, "At least some probes should have fired"
🟢 236 assert len(not_taken_probes) > 0, "The else branch should not have been reached"
237
238
🟢 239 @test_utils.test(arch=qd.gpu)
🟢 240 def test_kernel_coverage_simt_e2e():
241 """Verify coverage probes track branches with block.sync() and subgroup shuffle.
242
243 The if/else is based on a runtime value read from a field, so the compiler cannot constant-fold it away.
244 Only the taken branch's shuffle probe should fire.
245 """
🔴 246 from quadrants.lang import _kernel_coverage
🔴 247 from quadrants.lang.simt import subgroup
248
🔴 249 _kernel_coverage.ensure_field_allocated()
250
🔴 251 N = 64
🔴 252 probe_count_before = _kernel_coverage._probe_counter
🔴 253 flag = qd.field(dtype=qd.i32, shape=(1,))
🔴 254 a = qd.field(dtype=qd.i32, shape=(N,))
🔴 255 out = qd.field(dtype=qd.i32, shape=(N,))
256
🔴 257 flag[0] = 1 # runtime value: take the if-branch
258
🔴 259 @qd.kernel
🔴 260 def simt_kernel():
🔴 261 qd.loop_config(block_dim=N)
🔴 262 for i in range(N):
🔴 263 a[i] = i + 1
🔴 264 qd.simt.block.sync()
🔴 265 if flag[0] > 0:
🔴 266 val = subgroup.shuffle(a[i], qd.u32(0))
🔴 267 out[i] = val
268 else:
🔴 269 val = subgroup.shuffle(a[i], qd.u32(1))
🔴 270 out[i] = val + 100
271
🔴 272 simt_kernel()
273
🔴 274 for i in range(4):
🔴 275 assert out[i] == 1, f"Expected 1 at index {i}, got {out[i]}"
276
🔴 277 cov_field = _kernel_coverage.get_field()
🔴 278 arr = cov_field.to_numpy()
279
🔴 280 probes_for_kernel = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
281
🔴 282 fired = {pid for pid in probes_for_kernel if arr[pid] != 0}
🔴 283 not_fired = {pid for pid in probes_for_kernel if arr[pid] == 0}
🔴 284 assert len(fired) >= 4, f"Expected at least 4 probes to fire, got {len(fired)}"
🔴 285 assert len(not_fired) >= 2, "The else branch should not have been reached"
286
287
🟢 288 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 289 def test_kernel_coverage_survives_reinit():
290 """Verify that coverage data accumulated before qd.init() reset is preserved.
291
292 Runs a kernel, then resets via qd.reset()/qd.init() (which triggers the _hooked_clear harvest), runs another
293 kernel, harvests again, and checks that _accumulated_lines contains data from both sessions.
294 """
🟢 295 from quadrants.lang import _kernel_coverage, impl
296
🟢 297 current_arch = impl.get_runtime()._arch
🟢 298 _kernel_coverage.ensure_field_allocated()
299
🟢 300 probe_count_before = _kernel_coverage._probe_counter
🟢 301 out1 = qd.field(dtype=qd.i32, shape=(1,))
302
🟢 303 @qd.kernel
🟢 304 def kernel_before_reset():
🟢 305 out1[0] = 1
306
🟢 307 kernel_before_reset()
308
🟢 309 cov_field = _kernel_coverage.get_field()
🟢 310 assert cov_field is not None
🟢 311 arr = cov_field.to_numpy()
🟢 312 probes_first = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
🟢 313 fired_first = {pid for pid in probes_first if arr[pid] != 0}
🟢 314 assert len(fired_first) > 0, "Probes from first kernel should have fired"
315
316 # Don't call _harvest_field() manually — let qd.reset() trigger it via the _hooked_clear hook
🟢 317 qd.reset()
318
319 # Verify the hook harvested data from the first session
🟢 320 files_before = set(_kernel_coverage._accumulated_lines.keys())
🟢 321 assert len(files_before) > 0, "Hook should have harvested data during reset"
🟢 322 lines_before = {}
🟢 323 for f, lines in _kernel_coverage._accumulated_lines.items():
🟢 324 lines_before[f] = set(lines)
325
🟢 326 qd.init(arch=current_arch)
327
🟢 328 _kernel_coverage.ensure_field_allocated()
329
🟢 330 probe_count_mid = _kernel_coverage._probe_counter
🟢 331 out2 = qd.field(dtype=qd.i32, shape=(1,))
332
🟢 333 @qd.kernel
🟢 334 def kernel_after_reset():
🟢 335 out2[0] = 2
336
🟢 337 kernel_after_reset()
338
🟢 339 _kernel_coverage._harvest_field()
340
🟢 341 for f in files_before:
🟢 342 assert (
343 f in _kernel_coverage._accumulated_lines
344 ), f"File {f} from before reset should still be in _accumulated_lines"
🟢 345 assert lines_before[f].issubset(
346 _kernel_coverage._accumulated_lines[f]
347 ), "Lines from before reset should be preserved"
348
🟢 349 probes_second = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_mid}
🟢 350 second_files = {loc[0] for loc in probes_second.values()}
🟢 351 for f in second_files:
🟢 352 assert f in _kernel_coverage._accumulated_lines, f"File {f} from second kernel should be in _accumulated_lines"
353
354
🟢 355 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 356 def test_kernel_coverage_autodiff():
357 """Verify that autodiff forward pass produces probes but backward does not.
358
359 The forward compilation (AutodiffMode.NONE) should insert probes that fire. The backward compilation
360 (AutodiffMode.REVERSE) should not add any probes.
361 """
🟢 362 from quadrants.lang import _kernel_coverage
363
🟢 364 _kernel_coverage.ensure_field_allocated()
365
🟢 366 x = qd.field(dtype=qd.f32, shape=(), needs_grad=True)
🟢 367 loss = qd.field(dtype=qd.f32, shape=(), needs_grad=True)
368
🟢 369 @qd.kernel
🟢 370 def compute():
🟢 371 loss[None] = x[None] * x[None]
372
🟢 373 x[None] = 5.0
374
🟢 375 probe_count_before = _kernel_coverage._probe_counter
376
🟢 377 with qd.ad.Tape(loss):
🟢 378 compute()
379
🟢 380 probe_count_after_tape = _kernel_coverage._probe_counter
🟢 381 forward_probes = probe_count_after_tape - probe_count_before
🟢 382 assert forward_probes > 0, "Forward compilation should have inserted probes"
383
384 # Verify forward probes actually fired
🟢 385 cov_field = _kernel_coverage.get_field()
🟢 386 assert cov_field is not None
🟢 387 arr = cov_field.to_numpy()
🟢 388 probes = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
🟢 389 fired = {pid for pid in probes if arr[pid] != 0}
🟢 390 assert len(fired) > 0, "Forward pass inside Tape should produce fired coverage probes"
391
392 # Verify backward pass computes correct gradients
🟢 393 assert loss[None] == pytest.approx(25.0)
🟢 394 assert x.grad[None] == pytest.approx(10.0)
395
396
🟢 397 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 398 def test_kernel_coverage_qd_func():
399 """Verify that probes fire inside a @qd.func called from a kernel."""
🟢 400 from quadrants.lang import _kernel_coverage
401
🟢 402 _kernel_coverage.ensure_field_allocated()
403
🟢 404 probe_count_before = _kernel_coverage._probe_counter
🟢 405 out = qd.field(dtype=qd.i32, shape=(1,))
406
🟢 407 @qd.func
🟢 408 def helper():
🟢 409 out[0] = 99
410
🟢 411 @qd.kernel
🟢 412 def caller():
🟢 413 helper()
414
🟢 415 caller()
416
🟢 417 assert out[0] == 99
418
🟢 419 cov_field = _kernel_coverage.get_field()
🟢 420 assert cov_field is not None
🟢 421 arr = cov_field.to_numpy()
422
🟢 423 probes = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
🟢 424 fired = {pid for pid in probes if arr[pid] != 0}
425 # The kernel body has one statement (helper()), and the func body has one (out[0] = 99).
426 # Both should produce probes that fire.
🟢 427 assert (
428 len(fired) >= 2
429 ), f"Expected probes from both kernel and func to fire, got {len(fired)} fired out of {len(probes)}"
430
431
🟢 432 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 433 def test_kernel_coverage_multiple_kernels_same_session():
434 """Verify that probes from two different kernels both fire in the same session."""
🟢 435 from quadrants.lang import _kernel_coverage
436
🟢 437 _kernel_coverage.ensure_field_allocated()
438
🟢 439 probe_count_before = _kernel_coverage._probe_counter
🟢 440 a = qd.field(dtype=qd.i32, shape=(1,))
🟢 441 b = qd.field(dtype=qd.i32, shape=(1,))
442
🟢 443 @qd.kernel
🟢 444 def kernel_a():
🟢 445 a[0] = 10
446
🟢 447 @qd.kernel
🟢 448 def kernel_b():
🟢 449 b[0] = 20
450
🟢 451 kernel_a()
🟢 452 probe_count_after_a = _kernel_coverage._probe_counter
🟢 453 kernel_b()
454
🟢 455 assert a[0] == 10
🟢 456 assert b[0] == 20
457
🟢 458 cov_field = _kernel_coverage.get_field()
🟢 459 arr = cov_field.to_numpy()
460
🟢 461 probes_a = {
462 pid: loc for pid, loc in _kernel_coverage._probe_map.items() if probe_count_before <= pid < probe_count_after_a
463 }
🟢 464 probes_b = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_after_a}
465
🟢 466 fired_a = {pid for pid in probes_a if arr[pid] != 0}
🟢 467 fired_b = {pid for pid in probes_b if arr[pid] != 0}
468
🟢 469 assert len(fired_a) > 0, "Probes from kernel_a should have fired"
🟢 470 assert len(fired_b) > 0, "Probes from kernel_b should have fired"
471
472
🟢 473 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 474 def test_qd_prefix_exemption_pure_kernel():
475 """Verify that _qd_-prefixed globals don't violate pure kernel checks.
476
477 With kernel coverage enabled, _qd_cov is injected as a global. This test verifies that a pure (fastcache)
478 kernel still compiles without error. The kernel uses ndarray arguments (not global fields) because pure
479 kernels prohibit non-_qd_ globals.
480 """
🟢 481 a = qd.ndarray(qd.i32, (1,))
482
🟢 483 @qd.kernel(fastcache=True)
🟢 484 def pure_kernel(arr: qd.types.NDArray) -> None:
🟢 485 arr[0] = 42
486
🟢 487 pure_kernel(a)
🟢 488 assert a[0] == 42
🟢 tests/python/test_offline_cache.py (100%)
16 # Coverage field allocation creates internal fill kernels that change cache file counts.
17 # CI runs these tests in a separate phase without QD_KERNEL_COVERAGE (see 4_test.sh).
🟢 18 pytestmark = pytest.mark.skipif(
19 os.environ.get("QD_KERNEL_COVERAGE") == "1",
20 reason="Kernel coverage adds internal kernels that invalidate cache file count assertions",
21 )
22
🟢 tests/python/test_snode_layout_inspection.py (100%)
🟢 1 import os
2
🟢 3 import pytest
4
🟢 10 @pytest.mark.skipif(
11 os.environ.get("QD_KERNEL_COVERAGE") == "1",
12 reason="Kernel coverage field on root shifts offset assertions",
13 )
run_tests.py auto-prepends "test_" to filenames that don't start with it, so passing the full path "tests/python/test_kernel_coverage.py" was mangled to "test_tests/python/test_kernel_coverage.py". Made-with: Cursor
Without this, toggling coverage between runs serves a stale cached kernel: coverage-off cached kernel gives 0% coverage when re-enabled, and coverage-on cached kernel causes NameError on _qd_cov when disabled.
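As a rough illustration of why the coverage flag has to be part of the cache key, the sketch below hashes a kernel's source together with a coverage marker. `make_cache_key` is a hypothetical helper and does not reflect the real `src_hasher.create_cache_key` signature; only the "kcov" marker and the `QD_KERNEL_COVERAGE` variable are taken from this PR.

```python
import hashlib


def make_cache_key(kernel_source: str, env: dict) -> str:
    """Hypothetical stand-in for a fastcache key; not the real src_hasher API."""
    # Mirrors the PR's idea: add a "kcov" marker only when coverage is enabled.
    marker = "kcov" if env.get("QD_KERNEL_COVERAGE") == "1" else ""
    digest = hashlib.sha256()
    for part in (kernel_source, marker):
        encoded = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(len(encoded).to_bytes(8, "little"))
        digest.update(encoded)
    return digest.hexdigest()


source = "def k():\n    out[0] = 1\n"
key_off = make_cache_key(source, {})
key_on = make_cache_key(source, {"QD_KERNEL_COVERAGE": "1"})
```

With the marker in the key, an instrumented kernel and an uninstrumented kernel built from the same source can no longer be served from the same cache entry.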
The previous approach of setting config.option.cov_branch in pytest_configure was too late -- pytest-cov had already initialized its Coverage object in an earlier hook. Instead, the plugin now reads the cov_branch option and sets _QD_KCOV_ARC env var, which _detect_arc_mode() checks first. This avoids "Can not mix line and arc data" errors when users run pytest --cov without --cov-branch. Also change the default from arc to line mode when nothing is known, since pytest --cov without --cov-branch is more common.
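The precedence described in this commit message can be sketched roughly as follows. This is a simplified illustration, not the module's code: `arc_flag_for` and the `data_file_has_arcs` parameter are hypothetical names, and the environment is passed in as a plain dict so the logic can be exercised without pytest-cov.

```python
from typing import Optional


def arc_flag_for(cov_branch: Optional[bool]) -> str:
    """Value a plugin could export as _QD_KCOV_ARC for a given --cov-branch setting."""
    return "1" if cov_branch else "0"


def detect_arc_mode(env: dict, data_file_has_arcs: Optional[bool] = None) -> bool:
    arc_env = env.get("_QD_KCOV_ARC")
    if arc_env is not None:
        return arc_env == "1"  # explicit signal from the plugin wins
    if data_file_has_arcs is not None:
        return data_file_has_arcs  # fall back to what an existing data file says
    return False  # nothing known: default to line mode


# pytest --cov without --cov-branch: the option is unset, so line mode is used.
env = {"_QD_KCOV_ARC": arc_flag_for(None)}
```

Because the environment variable is consulted first, a stale data file written in arc mode cannot override what the current pytest invocation asked for.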
@claude review

    def rewrite_ast(tree: ast.Module, filepath: str, start_lineno: int) -> ast.Module:
        """Rewrite a kernel/func AST to insert coverage probes.

        Each executable statement at a new source line gets a probe: ``_qd_cov[<probe_id>] = 1``.
        Probes inside if/else bodies only fire when that branch is taken, giving true runtime branch coverage.
        """
        global _probe_counter
        with _lock:
            rewriter = _CoverageASTRewriter(
                field_name=FIELD_VAR_NAME,
                filepath=filepath,
                start_lineno=start_lineno,
                probe_id_start=_probe_counter,
            )
            tree = rewriter.visit(tree)
            ast.fix_missing_locations(tree)
            _probe_counter = rewriter.next_probe_id
            _probe_map.update(rewriter.probe_map)
        return tree
🔴 Cached kernel IR includes baked-in _qd_cov[N] = 1 stores where N comes from the per-process monotonic _probe_counter, so probe_ids are not stable across processes. The new kcov cache-key partition in src_hasher.create_cache_key (src_hasher.py:53) correctly separates on/off, but within kcov=on two processes with identical source produce IR whose literal probe_ids reflect the first process's compilation order; when Process B loads Process A's cached IR with @qd.kernel(fastcache=True) + offline_cache=True, launches write to Process B's _probe_map[0..N] slots that may belong to a completely different kernel — producing silent false positives (another kernel's lines marked covered) and false negatives (the running kernel's lines marked uncovered). The same mechanism means QD_COVERAGE_MAX_PROBES is also absent from the cache key, so shrinking the cap between runs can load IR whose baked-in probe_id exceeds the new field size, producing OOB device writes. Fix: derive probe_ids deterministically from hash((filepath, abs_lineno)) with a module-level (filepath, lineno) -> probe_id dedup dict, and append str(_MAX_PROBES) to the hash_iterable_strings tuple. CI misses this because test_offline_cache.py is skipped under QD_KERNEL_COVERAGE=1.
Extended reasoning...
What the bug is and how it manifests
_CoverageASTRewriter._make_probe (python/quadrants/lang/_kernel_coverage.py:193-222) produces an ast.Assign node whose subscript is ast.Constant(value=probe_id). The compiled kernel IR therefore contains the literal probe_id value from whichever process first rewrote the AST. Probe IDs come from the module-level _probe_counter (lines 33, 112-123), which is per-process and monotonically increasing: the same kernel gets different probe_ids depending on which kernels materialized first in this process.
On top of that, src_hasher.create_cache_key (src_hasher.py:44-55) now appends "kcov" when QD_KERNEL_COVERAGE=1, correctly partitioning the cache between coverage-on and coverage-off — but the tuple contains no component reflecting the current probe_counter state, _probe_map contents, or _MAX_PROBES. Two processes with identical source + same kcov flag produce the same fastcache key, so Process B can load an IR that was compiled by Process A with Process A-specific probe_id literals.
The specific code path that triggers it
- Process A compiles kernels K1, K2 in that order with QD_KERNEL_COVERAGE=1. rewrite_ast assigns probe_ids [0, 1, 2] to K1 and [3, 4, 5] to K2. The C++ IR cache (via offline_cache + offline_cache_file_path) stores the compiled kernels keyed by fastcache key under kernel_compilation_manager.cpp:277-280 (MemAndDiskCache mode). Python-side src_hasher.store writes (src_key → frontend_cache_key).
- Process B starts with the same sources. A test happens to materialize K2 first (e.g. different pytest-xdist sharding, different test selection). rewrite_ast assigns probe_ids [0, 1, 2] to K2 and [3, 4, 5] to K1 in Process B's _probe_map.
- Kernel.materialize() calls _try_load_fastcache (kernel.py:335-373) for K1. create_cache_key produces the same key as Process A (since kcov marker, source, args, config all match). load returns Process A's frontend_cache_key, and prog.load_fast_cache fetches the IR compiled in Process A, with _qd_cov[0] = 1, _qd_cov[1] = 1, _qd_cov[2] = 1 baked in.
- materialize still calls get_tree_and_ctx, which at _func_base.py:249-257 unconditionally calls _kcov.rewrite_ast and bumps _probe_counter / _probe_map based on Process B's state. But at kernel.py:420, only_parse_function_def=self.compiled_kernel_data_by_key.get(key) is not None is True on a cache hit, and function_def_transformer.py returns before processing the body, so the freshly rewritten AST is discarded and the cached IR is what runs at launch.
- At K1's launch in Process B, the cached IR writes _qd_cov[0] = 1, _qd_cov[1] = 1, _qd_cov[2] = 1. In Process B's _probe_map, indices [0, 1, 2] point to K2's source lines, not K1's.
- At harvest:
  - arr[0..2] are nonzero → K2's lines marked covered even though K2 may not have run (false positives).
  - arr[3..5] are zero → K1's lines marked uncovered even though K1 actually ran (false negatives).
Step-by-step proof with a concrete example
Suppose the codebase has two kernels, K1 (3 instrumented lines at file.py:10-12) and K2 (3 instrumented lines at file.py:20-22):
| Process | Compile order | probe_ids for K1 | probe_ids for K2 | _probe_map |
|---|---|---|---|---|
| A | K1, K2 | 0, 1, 2 | 3, 4, 5 | 0→L10, 1→L11, 2→L12, 3→L20, 4→L21, 5→L22 |
| B | K2, K1 | 3, 4, 5 | 0, 1, 2 | 0→L20, 1→L21, 2→L22, 3→L10, 4→L11, 5→L12 |
Process A writes the offline cache. Process B opens it, and when K1 is materialized:
- create_cache_key(K1) → same key as Process A → load returns Process A's frontend_cache_key → load_fast_cache fetches IR with _qd_cov[0] = 1, _qd_cov[1] = 1, _qd_cov[2] = 1 baked in.
- only_parse_function_def=True at kernel.py:420 discards Process B's rewritten AST (which would have written to 3/4/5).
- Launching K1 fires slots 0, 1, 2 in Process B's field.
- Harvest:
  - _probe_map[0..2] → L20, L21, L22 (K2's lines) marked covered. K2 may never have executed in Process B: false positive.
  - _probe_map[3..5] → L10, L11, L12 (K1's lines) are all zero: false negative even though K1 did run.
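The table and steps above can be reproduced with a small pure-Python simulation. It models only the bookkeeping (a per-process counter, a probe map, and a list standing in for the coverage field); `assign_probe_ids` is a hypothetical helper and no quadrants APIs are involved.

```python
def assign_probe_ids(compile_order, lines_by_kernel):
    """Assign monotonically increasing probe ids in compile order (one process)."""
    probe_map = {}  # probe_id -> source line
    ids_by_kernel = {}
    counter = 0
    for kernel in compile_order:
        ids = []
        for line in lines_by_kernel[kernel]:
            probe_map[counter] = line
            ids.append(counter)
            counter += 1
        ids_by_kernel[kernel] = ids
    return probe_map, ids_by_kernel


lines_by_kernel = {"K1": [10, 11, 12], "K2": [20, 21, 22]}

# Process A compiles K1 then K2 and caches K1's IR with its probe ids baked in.
_, ids_a = assign_probe_ids(["K1", "K2"], lines_by_kernel)
cached_k1_writes = ids_a["K1"]

# Process B compiles K2 first, then loads K1's IR from the cache and runs only K1.
probe_map_b, _ = assign_probe_ids(["K2", "K1"], lines_by_kernel)
field_b = [0] * len(probe_map_b)
for probe_id in cached_k1_writes:
    field_b[probe_id] = 1

reported = sorted(line for pid, line in probe_map_b.items() if field_b[pid])
print(reported)  # [20, 21, 22]
```

Only K1 ran in the simulated Process B, yet the lines reported as covered are K2's, which is the false-positive/false-negative pair described above.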
Why existing code does not prevent it
- The PR added "kcov" to create_cache_key, demonstrating awareness of coverage-related cache partitioning, but only handles the on/off dimension. The probe_id-baked-in problem operates entirely within the kcov=on bucket.
- rewrite_ast runs unconditionally in _func_base.py:254; there is no check for whether the subsequent fastcache lookup will hit and discard its output. So _probe_counter advances and _probe_map is populated per-process regardless of what IR actually runs.
- only_parse_function_def=True is by design: the PR's explicit support for @qd.kernel(fastcache=True) + coverage (via the _qd_ prefix exemption in ast_transformer_utils.py:335) means cached pure kernels are expected to carry _qd_cov writes, but nothing keeps those writes consistent with the loading process's _probe_map.
- CI does not catch this: tests/python/test_offline_cache.py was skipped under QD_KERNEL_COVERAGE=1 in this same PR, and test_src_ll_cache.py was skipped for similar reasons, so no CI test exercises the kcov + offline_cache + cross-process combination.
What the impact would be
Silent correctness failure of coverage data when a user combines QD_KERNEL_COVERAGE=1, offline_cache=True, a persistent offline_cache_file_path, and @qd.kernel(fastcache=True) across multiple processes (pytest-xdist workers, sequential CI runs, or just two developers running the same test suite). Compilation order differences — which are common due to test parallelism, test discovery order, fixture setup, selective test runs — cause coverage reports with both false positives and false negatives. Nothing warns the user. The documented "automatic with pytest-cov" path explicitly opts into this combination.
Separately but relatedly, because QD_COVERAGE_MAX_PROBES is also not in the cache key, shrinking it between runs can cause a cached IR with baked-in _qd_cov[5000] = 1 to be launched against a freshly allocated (500,)-shaped field — a device-side out-of-bounds write (silent memory corruption on most backends). Narrower scenario, but the fix is the same one-line cache-key addition.
How to fix it
Two complementary changes in _kernel_coverage.py and src_hasher.py:
- Make probe_ids deterministic across processes. Replace the monotonic counter with a stable hash:

      _probe_id_cache: dict[tuple[str, int], int] = {}  # (filepath, abs_lineno) -> probe_id

      def _probe_id_for(filepath: str, abs_lineno: int) -> int | None:
          key = (filepath, abs_lineno)
          with _lock:
              if key in _probe_id_cache:
                  return _probe_id_cache[key]
              if len(_probe_id_cache) >= _MAX_PROBES:
                  return None  # full; checked first so the loop below always terminates
              # derive from hash; resolve collisions within existing dict
              # NOTE: built-in hash() of a str is salted per process unless PYTHONHASHSEED is fixed
              candidate = hash(key) % _MAX_PROBES
              while candidate in _probe_map and _probe_map[candidate] != key:
                  candidate = (candidate + 1) % _MAX_PROBES
              _probe_id_cache[key] = candidate
              _probe_map[candidate] = key
              return candidate

  Provided the hash itself is stable across processes, this makes the IR byte-identical across processes for the same source, so the cached IR's _qd_cov[N] = 1 writes land in slots whose _probe_map entries are the same in every process.
- Include _MAX_PROBES in the fastcache key. In src_hasher.create_cache_key, append str(_MAX_PROBES) (or os.environ.get("QD_COVERAGE_MAX_PROBES", "100000")) to the hash_iterable_strings tuple. This prevents cache reuse across runs with different probe caps.
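One way to obtain a process-stable id is sketched below, under the assumption that a CRC of the location string is an acceptable hash. `stable_probe_id` and `build_probe_map` are hypothetical names, not part of the PR. Built-in `hash()` on strings is randomized per process unless `PYTHONHASHSEED` is fixed, so this sketch uses `zlib.crc32` instead, and it records collisions rather than resolving them by insertion order, since order-dependent resolution would reintroduce the cross-process mismatch.

```python
import zlib


def stable_probe_id(filepath: str, abs_lineno: int, max_probes: int) -> int:
    """Map a source location to a slot using a hash that is stable across processes."""
    key = f"{filepath}:{abs_lineno}".encode("utf-8")
    return zlib.crc32(key) % max_probes


def build_probe_map(locations, max_probes):
    """Return {slot: set of locations}; a slot holding several locations is ambiguous."""
    probe_map = {}
    for filepath, lineno in locations:
        slot = stable_probe_id(filepath, lineno, max_probes)
        probe_map.setdefault(slot, set()).add((filepath, lineno))
    return probe_map


locations = [("file.py", n) for n in (10, 11, 12, 20, 21, 22)]
# Two "processes" that encounter the same locations in opposite orders.
map_a = build_probe_map(locations, max_probes=100_000)
map_b = build_probe_map(list(reversed(locations)), max_probes=100_000)
```

Both maps come out identical regardless of encounter order. Two caveats apply: the file path string must itself be identical across processes (absolute paths differ between checkouts), and a real implementation would still need a policy for slots shared by more than one location.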
Additionally, test_offline_cache.py should have a non-skipped variant (or a dedicated test) that exercises the kcov=on + offline_cache path across two subprocesses with different compile orders, to catch future regressions.
Coverage Report (
| Metric | Value |
|---|---|
| Diff coverage (changed lines only) | 90% |
| Overall project coverage | 72% |
Total: 451 lines, 47 missing, 90% covered
🔴 python/quadrants/lang/_fast_caching/src_hasher.py (0%)
🔴 2 import os
53 "kcov" if os.environ.get("QD_KERNEL_COVERAGE") == "1" else "",
🟢 python/quadrants/lang/_func_base.py (82%)
🔴 7 import os
25
🔴 26 def _kernel_coverage_enabled() -> bool:
🟢 27 return os.environ.get("QD_KERNEL_COVERAGE") == "1"
28
29
🟢 255 _kcov = None
🟢 256 if _kernel_coverage_enabled() and autodiff_mode == _qd_core.AutodiffMode.NONE:
🟢 257 from . import ( # pylint: disable=import-outside-toplevel
258 _kernel_coverage as _kcov,
259 )
260
🟢 261 tree = _kcov.rewrite_ast(tree, function_source_info.filepath, function_source_info.start_lineno)
262
🟢 266 if _kcov is not None:
🟢 267 cov_field = _kcov.get_field()
🟢 268 if cov_field is not None:
🟢 269 global_vars[_kcov.FIELD_VAR_NAME] = cov_field
🔴 python/quadrants/lang/_kernel_coverage.py (79%)
1 """Kernel code coverage via Python AST rewriting.
2
3 When enabled (QD_KERNEL_COVERAGE=1), this module rewrites kernel and func ASTs to insert coverage probes — field
4 stores that record which source lines actually execute on the GPU. At process exit, the collected data is written
5 to a .coverage file compatible with coverage.py / pytest-cov / diff-cover.
6
7 The probes are compiled as ordinary field stores by the existing pipeline, so no C++ changes are needed. When
8 disabled, this module is never imported and has zero impact on the normal runtime path.
9 """
10
🟢 11 import ast
🟢 12 import atexit
🟢 13 import logging
🟢 14 import os
🟢 15 import threading
🟢 16 import warnings
🟢 17 from typing import TYPE_CHECKING
18
🟢 19 from coverage import CoverageData # type: ignore[import-not-found]
20
🟢 21 import quadrants as qd
🟢 22 from quadrants.lang import impl
23
24 if TYPE_CHECKING:
25 from quadrants.lang.field import ScalarField
26
🟢 27 FIELD_VAR_NAME = "_qd_cov"
🟢 28 _MAX_PROBES = int(os.environ.get("QD_COVERAGE_MAX_PROBES", "100000"))
29
🟢 30 _lock = threading.Lock()
🟢 31 _cov_field: "ScalarField | None" = None
🟢 32 _cov_field_prog: object | None = None # tracks which Program instance owns _cov_field
🟢 33 _probe_counter: int = 0
34 # {probe_id: (filepath, absolute_lineno)}
🟢 35 _probe_map: dict[int, tuple[str, int]] = {}
36 # Accumulated coverage lines surviving across qd.init() resets
🟢 37 _accumulated_lines: dict[str, set[int]] = {}
🟢 38 _reset_hook_installed: bool = False
39 # Directory for .coverage and _qd_kcov.* files, captured when coverage is first enabled
🟢 40 _coverage_dir: str | None = None
41
42
🟢 43 def _harvest_field() -> None:
44 """Read probe data from the current field into _accumulated_lines.
45
46 Must be called while the runtime is still alive (before clear()).
47 """
48 global _cov_field, _cov_field_prog
🟢 49 with _lock:
🟢 50 if _cov_field is None or not _probe_map:
🟢 51 return
🟢 52 field_ref = _cov_field
🟢 53 probe_snapshot = dict(_probe_map)
🟢 54 _cov_field = None
🟢 55 _cov_field_prog = None
🟢 56 try:
🟢 57 arr = field_ref.to_numpy()
🟢 58 except Exception:
🟢 59 logging.warning("Failed to read coverage field, coverage data for this session will be lost", exc_info=True)
🟢 60 return
🟢 61 with _lock:
🟢 62 for probe_id, (filepath, lineno) in probe_snapshot.items():
🟢 63 if probe_id < len(arr) and arr[probe_id] != 0:
🟢 64 _accumulated_lines.setdefault(filepath, set()).add(lineno)
65
66
🟢 67 def _install_reset_hook() -> None:
68 """Monkey-patch PyQuadrants.clear() to harvest probes before destruction."""
69 global _reset_hook_installed
🟢 70 if _reset_hook_installed:
🟢 71 return
🟢 72 _original_clear = impl.PyQuadrants.clear
73
🟢 74 def _hooked_clear(self) -> None:
🟢 75 _harvest_field()
🟢 76 _original_clear(self)
77
🟢 78 impl.PyQuadrants.clear = _hooked_clear # type: ignore[assignment]
🟢 79 _reset_hook_installed = True
80
81
🟢 82 def ensure_field_allocated() -> None:
83 """Allocate (or re-allocate after qd.init()) the global coverage field."""
84 global _cov_field, _cov_field_prog, _coverage_dir
🟢 85 _install_reset_hook()
🟢 86 if _coverage_dir is None:
🟢 87 _coverage_dir = os.getcwd()
🟢 88 current_prog = impl.get_runtime()._prog
🟢 89 if _cov_field is not None and _cov_field_prog is current_prog:
🟢 90 return
🟢 91 with _lock:
🟢 92 current_prog = impl.get_runtime()._prog
🟢 93 if _cov_field is not None and _cov_field_prog is current_prog:
🔴 94 return
🟢 95 _cov_field = qd.field(dtype=qd.i32, shape=(_MAX_PROBES,)) # type: ignore[assignment]
🟢 96 _cov_field_prog = current_prog
97
98
🟢 99 def get_field() -> "ScalarField | None":
🟢 100 with _lock:
🟢 101 if _cov_field_prog is not impl.get_runtime()._prog:
🔴 102 return None
🟢 103 return _cov_field
104
105
🟢 106 def rewrite_ast(tree: ast.Module, filepath: str, start_lineno: int) -> ast.Module:
107 """Rewrite a kernel/func AST to insert coverage probes.
108
109 Each executable statement at a new source line gets a probe: ``_qd_cov[<probe_id>] = 1``.
110 Probes inside if/else bodies only fire when that branch is taken, giving true runtime branch coverage.
111 """
112 global _probe_counter
🟢 113 with _lock:
🟢 114 rewriter = _CoverageASTRewriter(
115 field_name=FIELD_VAR_NAME,
116 filepath=filepath,
117 start_lineno=start_lineno,
118 probe_id_start=_probe_counter,
119 )
🟢 120 tree = rewriter.visit(tree)
🟢 121 ast.fix_missing_locations(tree)
🟢 122 _probe_counter = rewriter.next_probe_id
🟢 123 _probe_map.update(rewriter.probe_map)
🟢 124 return tree
125
126
🟢 127 def _detect_arc_mode() -> bool:
128 """Detect whether pytest-cov is running in branch (arc) mode.
129
130 Checks _QD_KCOV_ARC env var first (set by the pytest plugin), then falls back to reading .coverage.
131 Defaults to False (line mode) when nothing is known, since ``pytest --cov`` without ``--cov-branch``
132 is the more common invocation.
133 """
🔴 134 arc_env = os.environ.get("_QD_KCOV_ARC")
🔴 135 if arc_env is not None:
🔴 136 return arc_env == "1"
🔴 137 try:
🔴 138 cov_path = os.path.join(_coverage_dir, ".coverage") if _coverage_dir else ".coverage"
🔴 139 cd = CoverageData(basename=cov_path)
🔴 140 cd.read()
🔴 141 if not cd.measured_files():
🔴 142 return False
🔴 143 return cd.has_arcs()
🔴 144 except Exception:
🔴 145 logging.debug("Failed to detect arc mode from .coverage file, defaulting to line mode", exc_info=True)
🔴 146 return False
147
148
🟢 149 def flush() -> None:
150 """Harvest any remaining field data and write all results to a .coverage file.
151
152 If .coverage.kernel already exists (e.g. from a prior test phase), the new data is merged into it so nothing
153 is lost across multiple invocations.
154 """
🔴 155 _harvest_field()
156
🔴 157 with _lock:
🔴 158 if not _accumulated_lines:
🔴 159 return
🔴 160 snapshot = {f: set(lines) for f, lines in _accumulated_lines.items()}
161
🔴 162 base_dir = _coverage_dir or os.getcwd()
🔴 163 kernel_path = os.path.join(base_dir, f"_qd_kcov.{os.getpid()}")
🔴 164 use_arcs = _detect_arc_mode()
165
🔴 166 cov = CoverageData(basename=kernel_path)
🔴 167 if use_arcs:
🔴 168 arcs_by_file: dict[str, list[tuple[int, int]]] = {}
🔴 169 for filepath, lines in snapshot.items():
170 # Emit only entry/exit arcs per line — we know which lines ran but not the actual transitions
171 # between them, so we avoid fabricating inter-line arcs that would misrepresent branch coverage.
🔴 172 arcs = []
🔴 173 for line in sorted(lines):
🔴 174 arcs.append((-1, line))
🔴 175 arcs.append((line, -1))
🔴 176 arcs_by_file[filepath] = arcs
🔴 177 cov.add_arcs(arcs_by_file)
178 else:
🔴 179 cov.add_lines({f: sorted(lines) for f, lines in snapshot.items()})
🔴 180 cov.write()
181
182
🟢 183 _capacity_warning_emitted = False
184
185
🟢 186 class _CoverageASTRewriter(ast.NodeTransformer):
187 """Insert coverage probes before each statement at a new source line."""
188
🟢 189 def __init__(self, field_name: str, filepath: str, start_lineno: int, probe_id_start: int) -> None:
🟢 190 self._field_name = field_name
🟢 191 self._filepath = filepath
🟢 192 self._start_lineno = start_lineno
🟢 193 self.next_probe_id = probe_id_start
🟢 194 self._seen_lines: set[int] = set()
🟢 195 self.probe_map: dict[int, tuple[str, int]] = {}
196
🟢 197 def _make_probe(self, abs_lineno: int, rel_lineno: int, col_offset: int) -> ast.Assign | None:
198 global _capacity_warning_emitted
🟢 199 probe_id = self.next_probe_id
🟢 200 if probe_id >= _MAX_PROBES:
🟢 201 if not _capacity_warning_emitted:
🟢 202 warnings.warn(
203 f"Kernel coverage probe capacity ({_MAX_PROBES}) exceeded. "
204 f"Additional kernel lines will not be tracked. "
205 f"Set QD_COVERAGE_MAX_PROBES to a higher value.",
206 stacklevel=2,
207 )
🟢 208 _capacity_warning_emitted = True
🟢 209 return None
🟢 210 self.probe_map[probe_id] = (self._filepath, abs_lineno)
🟢 211 self.next_probe_id += 1
🟢 212 node = ast.Assign(
213 targets=[
214 ast.Subscript(
215 value=ast.Name(id=self._field_name, ctx=ast.Load()),
216 slice=ast.Constant(value=probe_id),
217 ctx=ast.Store(),
218 )
219 ],
220 value=ast.Constant(value=1),
221 lineno=rel_lineno,
222 col_offset=col_offset,
223 end_lineno=rel_lineno,
224 end_col_offset=col_offset,
225 )
🟢 226 return node
227
🟢 228 def _instrument_body(self, stmts: list[ast.stmt]) -> list[ast.stmt]:
🟢 229 result: list[ast.stmt] = []
🟢 230 for stmt in stmts:
🟢 231 rel_lineno = getattr(stmt, "lineno", None)
🟢 232 if rel_lineno is not None:
🟢 233 abs_lineno = rel_lineno + self._start_lineno - 1
🟢 234 if abs_lineno not in self._seen_lines:
🟢 235 self._seen_lines.add(abs_lineno)
🟢 236 col = getattr(stmt, "col_offset", 0)
🟢 237 probe = self._make_probe(abs_lineno, rel_lineno, col)
🟢 238 if probe is not None:
🟢 239 result.append(probe)
🟢 240 result.append(self.visit(stmt))
🟢 241 return result
242
🟢 243 def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
🟢 244 node.body = self._instrument_body(node.body)
🟢 245 return node
246
🟢 247 def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> ast.AsyncFunctionDef:
🔴 248 node.body = self._instrument_body(node.body)
🔴 249 return node
250
🟢 251 def visit_If(self, node: ast.If) -> ast.If:
🟢 252 node.body = self._instrument_body(node.body)
🟢 253 if node.orelse:
🟢 254 node.orelse = self._instrument_body(node.orelse)
🟢 255 return node
256
🟢 257 def visit_For(self, node: ast.For) -> ast.For:
🟢 258 node.body = self._instrument_body(node.body)
🟢 259 if node.orelse:
🟢 260 node.orelse = self._instrument_body(node.orelse)
🟢 261 return node
262
🟢 263 def visit_While(self, node: ast.While) -> ast.While:
🟢 264 node.body = self._instrument_body(node.body)
🟢 265 if node.orelse:
🟢 266 node.orelse = self._instrument_body(node.orelse)
🟢 267 return node
268
🟢 269 def visit_With(self, node: ast.With) -> ast.With:
🟢 270 node.body = self._instrument_body(node.body)
🟢 271 return node
272
🟢 273 def visit_Try(self, node: ast.Try) -> ast.Try:
🟢 274 node.body = self._instrument_body(node.body)
🟢 275 for handler in node.handlers:
🟢 276 handler.body = self._instrument_body(handler.body)
🟢 277 if node.orelse:
🟢 278 node.orelse = self._instrument_body(node.orelse)
🟢 279 if node.finalbody:
🟢 280 node.finalbody = self._instrument_body(node.finalbody)
🟢 281 return node
282
283
🟢 284 atexit.register(flush)
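The probe-insertion technique used by this module can be tried out on the CPU with nothing but the standard library. The sketch below is a simplified stand-in, not the PR's `_CoverageASTRewriter`: `ProbeInserter` is a hypothetical class, it handles only function bodies and if/else, and a plain Python list named `_cov` plays the role of the device-side field.

```python
import ast


class ProbeInserter(ast.NodeTransformer):
    """Simplified stand-in for the PR's rewriter: one probe per new source line."""

    def __init__(self, field_name: str) -> None:
        self.field_name = field_name
        self.probe_lines = []  # index = probe id, value = source line
        self._seen = set()

    def _probe(self, probe_id: int) -> ast.stmt:
        # Parse `<field>[<id>] = 1` rather than building the nodes by hand.
        return ast.parse(f"{self.field_name}[{probe_id}] = 1").body[0]

    def _instrument(self, stmts):
        out = []
        for stmt in stmts:
            if stmt.lineno not in self._seen:
                self._seen.add(stmt.lineno)
                out.append(self._probe(len(self.probe_lines)))
                self.probe_lines.append(stmt.lineno)
            out.append(self.visit(stmt))
        return out

    def visit_FunctionDef(self, node):
        node.body = self._instrument(node.body)
        return node

    def visit_If(self, node):
        node.body = self._instrument(node.body)
        node.orelse = self._instrument(node.orelse)
        return node


source = """\
def f(x):
    if x > 5:
        y = 1
    else:
        y = 2
    return y
"""

rewriter = ProbeInserter("_cov")
tree = ast.fix_missing_locations(rewriter.visit(ast.parse(source)))

cov = [0] * 16  # plain list standing in for the coverage field
namespace = {"_cov": cov}
exec(compile(tree, "<sketch>", "exec"), namespace)
namespace["f"](10)

hit_lines = sorted(line for pid, line in enumerate(rewriter.probe_lines) if cov[pid])
print(hit_lines)  # [2, 3, 6]
```

Line 5 (the else branch) receives a probe but never fires for this input, which is the runtime branch behaviour the module docstring describes.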
🟢 python/quadrants/lang/ast/ast_transformer_utils.py (100%)
🟢 335 if not name.startswith("_qd_"):
🟢 336 reason = f"{name} is in global vars, therefore violates pure"
🟢 337 violates_pure = True
🟢 python/quadrants/lang/kernel.py (80%)
19
🔴 20 def _kernel_coverage_enabled() -> bool:
🟢 21 return os.environ.get("QD_KERNEL_COVERAGE") == "1"
22
23
🟢 382 if _kernel_coverage_enabled():
🟢 383 from . import _kernel_coverage # pylint: disable=import-outside-toplevel
384
🟢 385 _kernel_coverage.ensure_field_allocated()
386
🟢 python/quadrants/lang/misc.py (100%)
496
🟢 497 if os.environ.get("QD_KERNEL_COVERAGE") == "1":
🟢 498 from . import _kernel_coverage # pylint: disable=import-outside-toplevel
499
🟢 500 _kernel_coverage.ensure_field_allocated()
501
🔴 python/quadrants/pytest_plugin.py (67%)
1 """Pytest plugin that auto-enables kernel coverage when pytest-cov is active.
2
3 Registered via the ``pytest11`` entry point so it loads automatically when quadrants is installed.
4 Opt out by setting ``QD_KERNEL_COVERAGE=0`` explicitly.
5 """
6
🔴 7 import os
8
9
🔴 10 def pytest_configure(config):
🟢 11 if not config.pluginmanager.hasplugin("_cov"):
🔴 12 return
🟢 13 os.environ.setdefault("QD_KERNEL_COVERAGE", "1")
🟢 14 if os.environ.get("QD_KERNEL_COVERAGE") != "1":
🟢 15 return
16 # Tell the kernel coverage module whether pytest-cov is running in branch (arc) mode,
17 # so it writes the matching format and avoids "Can not mix line and arc data" at combine time.
18 # We read config.option.cov_branch which pytest-cov has already populated by this point.
🟢 19 cov_branch = getattr(config.option, "cov_branch", False) or False
🟢 20 os.environ["_QD_KCOV_ARC"] = "1" if cov_branch else "0"
🟢 tests/python/quadrants/lang/fast_caching/test_src_ll_cache.py (100%)
11
🟢 12 _KERNEL_COVERAGE = os.environ.get("QD_KERNEL_COVERAGE") == "1"
🟢 67 @pytest.mark.skipif(
68 _KERNEL_COVERAGE,
69 reason="Coverage probes change LLVM IR addresses after reinit, breaking recompile comparison",
70 )
🟢 tests/python/quadrants/lang/test_kernel_impl.py (100%)
🟢 1 import os
🟢 11 _KERNEL_COVERAGE = os.environ.get("QD_KERNEL_COVERAGE") == "1"
13
🟢 14 @pytest.mark.skipif(
15 _KERNEL_COVERAGE,
16 reason="Coverage probes change the kernel AST, preventing FE-LL cache hits after reinit",
17 )
🟢 tests/python/test_api.py (100%)
🟢 438 actual = sorted([s for s in dir(src) if not s.startswith(("_", "@")) and s != "pytest_plugin"])
🟢 tests/python/test_intrinsics.py (100%)
55 (thread i does (i+1)*200000). Asserts strict monotonicity across threads and that
🟢 91 assert a[i - 1] < a[i] < a[i + 1]
🟢 tests/python/test_kernel_coverage.py (99%)
1 """Tests for kernel code coverage instrumentation.
2
3 These tests verify that the AST rewriter correctly inserts coverage probes and that the probes fire when kernel
4 code executes on the device.
5 """
6
🟢 7 import ast
🟢 8 import os
🟢 9 import textwrap
10
🟢 11 import pytest
12
🟢 13 import quadrants as qd
14
🟢 15 from tests import test_utils
16
17 # These tests only run when QD_KERNEL_COVERAGE=1
🟢 18 pytestmark = pytest.mark.skipif(
19 os.environ.get("QD_KERNEL_COVERAGE", "") != "1",
20 reason="QD_KERNEL_COVERAGE=1 not set",
21 )
22
23
24 # ---------------------------------------------------------------------------
25 # AST rewriter unit tests
26 # ---------------------------------------------------------------------------
27
🟢 28 _AST_REWRITER_CASES = [
29 pytest.param(
30 """\
31 def f():
32 x = 1
33 y = 2
34 return x + y
35 """,
36 {11, 12, 13},
37 10,
38 id="straight_line",
39 ),
40 pytest.param(
41 """\
42 def f():
43 if x > 0:
44 a = 1
45 else:
46 b = 2
47 """,
48 {2, 3, 5},
49 1,
50 id="if_else",
51 ),
52 pytest.param(
53 """\
54 def f():
55 for i in range(10):
56 x = i
57 """,
58 {2, 3},
59 1,
60 id="for_loop",
61 ),
62 pytest.param(
63 """\
64 def f():
65 while x > 0:
66 x = x - 1
67 else:
68 y = 0
69 """,
70 {2, 3, 5},
71 1,
72 id="while_loop_else",
73 ),
74 pytest.param(
75 """\
76 def f():
77 with ctx:
78 a = 1
79 b = 2
80 """,
81 {2, 3, 4},
82 1,
83 id="with_statement",
84 ),
85 pytest.param(
86 """\
87 def f():
88 try:
89 a = 1
90 except:
91 b = 2
92 else:
93 c = 3
94 finally:
95 d = 4
96 """,
97 {3, 5, 7, 9},
98 1,
99 id="try_except_finally",
100 ),
101 ]
102
103
🟢 104 @pytest.mark.parametrize("src,expected_lines,start_lineno", _AST_REWRITER_CASES)
🟢 105 def test_ast_rewriter(src, expected_lines, start_lineno):
106 """Verify the AST rewriter inserts probes at the expected source lines."""
🟢 107 from quadrants.lang._kernel_coverage import _CoverageASTRewriter
108
🟢 109 tree = ast.parse(textwrap.dedent(src))
🟢 110 rewriter = _CoverageASTRewriter(
111 field_name="_qd_cov", filepath="test.py", start_lineno=start_lineno, probe_id_start=0
112 )
🟢 113 rewriter.visit(tree)
114
🟢 115 covered_lines = {lineno for _, (_, lineno) in rewriter.probe_map.items()}
🟢 116 assert expected_lines.issubset(covered_lines), f"Expected lines {expected_lines} to be probed, got {covered_lines}"
117
118
🟢 119 def test_ast_rewriter_capacity_limit():
120 """Verify that probes stop being inserted when the capacity limit is hit."""
🟢 121 import warnings
122
🟢 123 import quadrants.lang._kernel_coverage as kcov
🟢 124 from quadrants.lang._kernel_coverage import _CoverageASTRewriter
125
🟢 126 src = textwrap.dedent(
127 """\
128 def f():
129 a = 1
130 b = 2
131 c = 3
132 """
133 )
🟢 134 tree = ast.parse(src)
🟢 135 old_warning_state = kcov._capacity_warning_emitted
🟢 136 kcov._capacity_warning_emitted = False
🟢 137 try:
🟢 138 with warnings.catch_warnings(record=True) as w:
🟢 139 warnings.simplefilter("always")
🟢 140 rewriter = _CoverageASTRewriter(
141 field_name="_qd_cov", filepath="test.py", start_lineno=1, probe_id_start=kcov._MAX_PROBES - 1
142 )
🟢 143 rewriter.visit(tree)
144
🟢 145 assert rewriter.next_probe_id == kcov._MAX_PROBES
🟢 146 assert len(rewriter.probe_map) == 1, f"Only 1 probe should fit, got {len(rewriter.probe_map)}"
🟢 147 assert len(w) == 1
🟢 148 assert "exceeded" in str(w[0].message).lower()
149 finally:
🟢 150 kcov._capacity_warning_emitted = old_warning_state
151
152
🟢 153 def test_ast_rewriter_deduplicates_same_line():
154 """Verify that two statements on the same source line get only one probe."""
🟢 155 from quadrants.lang._kernel_coverage import _CoverageASTRewriter
156
🟢 157 src = "def f():\n a = 1; b = 2\n"
🟢 158 tree = ast.parse(src)
🟢 159 rewriter = _CoverageASTRewriter(field_name="_qd_cov", filepath="test.py", start_lineno=1, probe_id_start=0)
🟢 160 rewriter.visit(tree)
161
🟢 162 abs_lines = [lineno for _, (_, lineno) in rewriter.probe_map.items()]
🟢 163 assert abs_lines.count(2) == 1, f"Line 2 should have exactly one probe, got {abs_lines.count(2)}"
164
165
🟢 166 def test_env_var_max_probes():
167 """Verify that QD_COVERAGE_MAX_PROBES env var is read at import time."""
🟢 168 import quadrants.lang._kernel_coverage as kcov
169
🟢 170 assert kcov._MAX_PROBES == int(os.environ.get("QD_COVERAGE_MAX_PROBES", "100000"))
171
172
🟢 173 def test_harvest_field_exception_path():
174 """Verify that _harvest_field handles to_numpy() failure gracefully."""
🟢 175 from unittest.mock import MagicMock
176
🟢 177 import quadrants.lang._kernel_coverage as kcov
178
🟢 179 old_field = kcov._cov_field
🟢 180 old_prog = kcov._cov_field_prog
🟢 181 old_map = kcov._probe_map.copy()
🟢 182 try:
🟢 183 mock_field = MagicMock()
🟢 184 mock_field.to_numpy.side_effect = RuntimeError("runtime destroyed")
🟢 185 kcov._cov_field = mock_field
🟢 186 kcov._cov_field_prog = object()
🟢 187 kcov._probe_map[999999] = ("fake.py", 1)
188
189 # Should not raise — the exception is caught and logged
🟢 190 kcov._harvest_field()
191
🟢 192 assert kcov._cov_field is None, "Field should be cleared after failure"
🟢 193 assert kcov._cov_field_prog is None, "Field prog should be cleared after failure"
194 finally:
🟢 195 kcov._cov_field = old_field
🟢 196 kcov._cov_field_prog = old_prog
🟢 197 kcov._probe_map = old_map
198
199
200 # ---------------------------------------------------------------------------
201 # End-to-end tests
202 # ---------------------------------------------------------------------------
203
204
🟢 205 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 206 def test_kernel_coverage_branches_e2e():
207 """Verify that only the taken branch has its probe fired."""
🟢 208 from quadrants.lang import _kernel_coverage
209
🟢 210 _kernel_coverage.ensure_field_allocated()
211
🟢 212 probe_count_before = _kernel_coverage._probe_counter
🟢 213 out = qd.field(dtype=qd.i32, shape=(1,))
214
🟢 215 @qd.kernel
🟢 216 def branching_kernel():
🟢 217 x = 10
🟢 218 if x > 5:
🟢 219 out[0] = 1
220 else:
🔴 221 out[0] = 2
222
🟢 223 branching_kernel()
224
🟢 225 assert out[0] == 1
226
🟢 227 cov_field = _kernel_coverage.get_field()
🟢 228 arr = cov_field.to_numpy()
229
🟢 230 probes_for_kernel = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
231
🟢 232 taken_probes = {pid for pid, loc in probes_for_kernel.items() if arr[pid] != 0}
🟢 233 not_taken_probes = {pid for pid, loc in probes_for_kernel.items() if arr[pid] == 0}
234
🟢 235 assert len(taken_probes) > 0, "At least some probes should have fired"
🟢 236 assert len(not_taken_probes) > 0, "The else branch should not have been reached"
237
238
🟢 239 @test_utils.test(arch=qd.gpu)
🟢 240 def test_kernel_coverage_simt_e2e():
241 """Verify coverage probes track branches with block.sync() and subgroup shuffle.
242
243 The if/else is based on a runtime value read from a field, so the compiler cannot constant-fold it away.
244 Only the taken branch's shuffle probe should fire.
245 """
🟢 246 from quadrants.lang import _kernel_coverage
🟢 247 from quadrants.lang.simt import subgroup
248
🟢 249 _kernel_coverage.ensure_field_allocated()
250
🟢 251 N = 64
🟢 252 probe_count_before = _kernel_coverage._probe_counter
🟢 253 flag = qd.field(dtype=qd.i32, shape=(1,))
🟢 254 a = qd.field(dtype=qd.i32, shape=(N,))
🟢 255 out = qd.field(dtype=qd.i32, shape=(N,))
256
🟢 257 flag[0] = 1 # runtime value: take the if-branch
258
🟢 259 @qd.kernel
🟢 260 def simt_kernel():
🟢 261 qd.loop_config(block_dim=N)
🟢 262 for i in range(N):
🟢 263 a[i] = i + 1
🟢 264 qd.simt.block.sync()
🟢 265 if flag[0] > 0:
🟢 266 val = subgroup.shuffle(a[i], qd.u32(0))
🟢 267 out[i] = val
268 else:
🔴 269 val = subgroup.shuffle(a[i], qd.u32(1))
🔴 270 out[i] = val + 100
271
🟢 272 simt_kernel()
273
🟢 274 for i in range(4):
🟢 275 assert out[i] == 1, f"Expected 1 at index {i}, got {out[i]}"
276
🟢 277 cov_field = _kernel_coverage.get_field()
🟢 278 arr = cov_field.to_numpy()
279
🟢 280 probes_for_kernel = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
281
🟢 282 fired = {pid for pid in probes_for_kernel if arr[pid] != 0}
🟢 283 not_fired = {pid for pid in probes_for_kernel if arr[pid] == 0}
🟢 284 assert len(fired) >= 4, f"Expected at least 4 probes to fire, got {len(fired)}"
🟢 285 assert len(not_fired) >= 2, "The else branch should not have been reached"
286
287
🟢 288 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 289 def test_kernel_coverage_survives_reinit():
290 """Verify that coverage data accumulated before qd.init() reset is preserved.
291
292 Runs a kernel, then resets via qd.reset()/qd.init() (which triggers the _hooked_clear harvest), runs another
293 kernel, harvests again, and checks that _accumulated_lines contains data from both sessions.
294 """
🟢 295 from quadrants.lang import _kernel_coverage, impl
296
🟢 297 current_arch = impl.get_runtime()._arch
🟢 298 _kernel_coverage.ensure_field_allocated()
299
🟢 300 probe_count_before = _kernel_coverage._probe_counter
🟢 301 out1 = qd.field(dtype=qd.i32, shape=(1,))
302
🟢 303 @qd.kernel
🟢 304 def kernel_before_reset():
🟢 305 out1[0] = 1
306
🟢 307 kernel_before_reset()
308
🟢 309 cov_field = _kernel_coverage.get_field()
🟢 310 assert cov_field is not None
🟢 311 arr = cov_field.to_numpy()
🟢 312 probes_first = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
🟢 313 fired_first = {pid for pid in probes_first if arr[pid] != 0}
🟢 314 assert len(fired_first) > 0, "Probes from first kernel should have fired"
315
316 # Don't call _harvest_field() manually — let qd.reset() trigger it via the _hooked_clear hook
🟢 317 qd.reset()
318
319 # Verify the hook harvested data from the first session
🟢 320 files_before = set(_kernel_coverage._accumulated_lines.keys())
🟢 321 assert len(files_before) > 0, "Hook should have harvested data during reset"
🟢 322 lines_before = {}
🟢 323 for f, lines in _kernel_coverage._accumulated_lines.items():
🟢 324 lines_before[f] = set(lines)
325
🟢 326 qd.init(arch=current_arch)
327
🟢 328 _kernel_coverage.ensure_field_allocated()
329
🟢 330 probe_count_mid = _kernel_coverage._probe_counter
🟢 331 out2 = qd.field(dtype=qd.i32, shape=(1,))
332
🟢 333 @qd.kernel
🟢 334 def kernel_after_reset():
🟢 335 out2[0] = 2
336
🟢 337 kernel_after_reset()
338
🟢 339 _kernel_coverage._harvest_field()
340
🟢 341 for f in files_before:
🟢 342 assert (
343 f in _kernel_coverage._accumulated_lines
344 ), f"File {f} from before reset should still be in _accumulated_lines"
🟢 345 assert lines_before[f].issubset(
346 _kernel_coverage._accumulated_lines[f]
347 ), "Lines from before reset should be preserved"
348
🟢 349 probes_second = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_mid}
🟢 350 second_files = {loc[0] for loc in probes_second.values()}
🟢 351 for f in second_files:
🟢 352 assert f in _kernel_coverage._accumulated_lines, f"File {f} from second kernel should be in _accumulated_lines"
353
354
🟢 355 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 356 def test_kernel_coverage_autodiff():
357 """Verify that autodiff forward pass produces probes but backward does not.
358
359 The forward compilation (AutodiffMode.NONE) should insert probes that fire. The backward compilation
360 (AutodiffMode.REVERSE) should not add any probes.
361 """
🟢 362 from quadrants.lang import _kernel_coverage
363
🟢 364 _kernel_coverage.ensure_field_allocated()
365
🟢 366 x = qd.field(dtype=qd.f32, shape=(), needs_grad=True)
🟢 367 loss = qd.field(dtype=qd.f32, shape=(), needs_grad=True)
368
🟢 369 @qd.kernel
🟢 370 def compute():
🟢 371 loss[None] = x[None] * x[None]
372
🟢 373 x[None] = 5.0
374
🟢 375 probe_count_before = _kernel_coverage._probe_counter
376
🟢 377 with qd.ad.Tape(loss):
🟢 378 compute()
379
🟢 380 probe_count_after_tape = _kernel_coverage._probe_counter
🟢 381 forward_probes = probe_count_after_tape - probe_count_before
🟢 382 assert forward_probes > 0, "Forward compilation should have inserted probes"
383
384 # Verify forward probes actually fired
🟢 385 cov_field = _kernel_coverage.get_field()
🟢 386 assert cov_field is not None
🟢 387 arr = cov_field.to_numpy()
🟢 388 probes = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
🟢 389 fired = {pid for pid in probes if arr[pid] != 0}
🟢 390 assert len(fired) > 0, "Forward pass inside Tape should produce fired coverage probes"
391
392 # Verify backward pass computes correct gradients
🟢 393 assert loss[None] == pytest.approx(25.0)
🟢 394 assert x.grad[None] == pytest.approx(10.0)
395
396
🟢 397 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 398 def test_kernel_coverage_qd_func():
399 """Verify that probes fire inside a @qd.func called from a kernel."""
🟢 400 from quadrants.lang import _kernel_coverage
401
🟢 402 _kernel_coverage.ensure_field_allocated()
403
🟢 404 probe_count_before = _kernel_coverage._probe_counter
🟢 405 out = qd.field(dtype=qd.i32, shape=(1,))
406
🟢 407 @qd.func
🟢 408 def helper():
🟢 409 out[0] = 99
410
🟢 411 @qd.kernel
🟢 412 def caller():
🟢 413 helper()
414
🟢 415 caller()
416
🟢 417 assert out[0] == 99
418
🟢 419 cov_field = _kernel_coverage.get_field()
🟢 420 assert cov_field is not None
🟢 421 arr = cov_field.to_numpy()
422
🟢 423 probes = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
🟢 424 fired = {pid for pid in probes if arr[pid] != 0}
425 # The kernel body has one statement (helper()), and the func body has one (out[0] = 99).
426 # Both should produce probes that fire.
🟢 427 assert (
428 len(fired) >= 2
429 ), f"Expected probes from both kernel and func to fire, got {len(fired)} fired out of {len(probes)}"
430
431
🟢 432 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 433 def test_kernel_coverage_multiple_kernels_same_session():
434 """Verify that probes from two different kernels both fire in the same session."""
🟢 435 from quadrants.lang import _kernel_coverage
436
🟢 437 _kernel_coverage.ensure_field_allocated()
438
🟢 439 probe_count_before = _kernel_coverage._probe_counter
🟢 440 a = qd.field(dtype=qd.i32, shape=(1,))
🟢 441 b = qd.field(dtype=qd.i32, shape=(1,))
442
🟢 443 @qd.kernel
🟢 444 def kernel_a():
🟢 445 a[0] = 10
446
🟢 447 @qd.kernel
🟢 448 def kernel_b():
🟢 449 b[0] = 20
450
🟢 451 kernel_a()
🟢 452 probe_count_after_a = _kernel_coverage._probe_counter
🟢 453 kernel_b()
454
🟢 455 assert a[0] == 10
🟢 456 assert b[0] == 20
457
🟢 458 cov_field = _kernel_coverage.get_field()
🟢 459 arr = cov_field.to_numpy()
460
🟢 461 probes_a = {
462 pid: loc for pid, loc in _kernel_coverage._probe_map.items() if probe_count_before <= pid < probe_count_after_a
463 }
🟢 464 probes_b = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_after_a}
465
🟢 466 fired_a = {pid for pid in probes_a if arr[pid] != 0}
🟢 467 fired_b = {pid for pid in probes_b if arr[pid] != 0}
468
🟢 469 assert len(fired_a) > 0, "Probes from kernel_a should have fired"
🟢 470 assert len(fired_b) > 0, "Probes from kernel_b should have fired"
471
472
🟢 473 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 474 def test_qd_prefix_exemption_pure_kernel():
475 """Verify that _qd_-prefixed globals don't violate pure kernel checks.
476
477 With kernel coverage enabled, _qd_cov is injected as a global. This test verifies that a pure (fastcache)
478 kernel still compiles without error. The kernel uses ndarray arguments (not global fields) because pure
479 kernels prohibit non-_qd_ globals.
480 """
🟢 481 a = qd.ndarray(qd.i32, (1,))
482
🟢 483 @qd.kernel(fastcache=True)
🟢 484 def pure_kernel(arr: qd.types.NDArray) -> None:
🟢 485 arr[0] = 42
486
🟢 487 pure_kernel(a)
🟢 488 assert a[0] == 42
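The harvest-before-reset behaviour exercised by test_kernel_coverage_survives_reinit above can be sketched without a device at all. This is a minimal illustration, not the module's code: `HarvestSketch` and its methods are hypothetical names, and a plain Python list stands in for the coverage field.

```python
class HarvestSketch:
    """Hypothetical stand-in for the module-level coverage state."""

    def __init__(self):
        self.probe_map = {}    # probe_id -> (filepath, lineno)
        self.accumulated = {}  # filepath -> set of executed line numbers
        self.field = None      # plain list standing in for the device field

    def allocate(self, size):
        # A fresh field starts zeroed, like a newly allocated field after init.
        self.field = [0] * size

    def harvest(self):
        # Copy fired probes into the accumulated set, then drop the field.
        if self.field is None or not self.probe_map:
            return
        arr, self.field = self.field, None
        for probe_id, (path, line) in self.probe_map.items():
            if probe_id < len(arr) and arr[probe_id] != 0:
                self.accumulated.setdefault(path, set()).add(line)

    def reset(self):
        # Harvest first: after the reset the old field can no longer be read.
        self.harvest()


state = HarvestSketch()

# Session 1: probe 0 fires, then the runtime is reset.
state.allocate(4)
state.probe_map[0] = ("k.py", 10)
state.field[0] = 1
state.reset()

# Session 2: a new field is allocated and probe 1 fires.
state.allocate(4)
state.probe_map[1] = ("k.py", 20)
state.field[1] = 1
state.harvest()

accumulated = state.accumulated
print(accumulated)
```

Line 10 survives the reset because it was copied out before the first field was dropped; the second field never sees probe 0 fire, yet the accumulated set still holds it.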
🟢 tests/python/test_offline_cache.py (100%)
16 # Coverage field allocation creates internal fill kernels that change cache file counts.
17 # CI runs these tests in a separate phase without QD_KERNEL_COVERAGE (see 4_test.sh).
🟢 18 pytestmark = pytest.mark.skipif(
19 os.environ.get("QD_KERNEL_COVERAGE") == "1",
20 reason="Kernel coverage adds internal kernels that invalidate cache file count assertions",
21 )
22
🟢 tests/python/test_snode_layout_inspection.py (100%)
🟢 1 import os
2
🟢 3 import pytest
4
🟢 10 @pytest.mark.skipif(
11 os.environ.get("QD_KERNEL_COVERAGE") == "1",
12 reason="Kernel coverage field on root shifts offset assertions",
13 )
Update from Opus: One new comment (59 → 60). It's a repeat of earlier comment #38: three e2e tests dereference get_field() without checking for None first, while the other three tests in the … This is a minor code quality issue (yellow severity). The tests would give an opaque AttributeError instead of a clear AssertionError if get_field() returned None. Want me to …

So basically, no need to do anything, and good to merge now?

Yes, that's my read. The remaining comment is a minor style consistency issue (adding assert cov_field is not None to 3 tests that already work fine). Not a bug, not a … Good to merge.
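The difference between the two failure modes mentioned in this comment can be shown with a small, self-contained sketch. `get_field_stub` is a hypothetical stand-in that always returns None; it is not the real `get_field()`.

```python
def get_field_stub():
    """Hypothetical stand-in for a get_field() call that returns None."""
    return None


def unguarded():
    field = get_field_stub()
    return field.to_numpy()  # fails with AttributeError on None


def guarded():
    field = get_field_stub()
    assert field is not None, "coverage field is not allocated"
    return field.to_numpy()


def failure_type(fn):
    """Return the name of the exception type raised by fn, or None."""
    try:
        fn()
    except Exception as exc:
        return type(exc).__name__
    return None


unguarded_error = failure_type(unguarded)
guarded_error = failure_type(guarded)
print(unguarded_error, guarded_error)
```

Both variants fail when the field is missing; the guarded one fails with a message that names the actual problem.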
…4_test.sh) Made-with: Cursor
Coverage Report (
| Metric | Value |
|---|---|
| Diff coverage (changed lines only) | 90% |
| Overall project coverage | 72% |
Total: 451 lines, 47 missing, 90% covered
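As a quick check of the totals above, the diff coverage figure follows from the line counts. The sketch below assumes the percentage is rounded to the nearest integer, and takes the 80% gate from the PR description; the variable names are illustrative only.

```python
total_lines = 451    # changed/added lines reported above
missing_lines = 47   # changed/added lines not covered by tests
threshold = 80       # minimum diff coverage, per the PR description

covered_lines = total_lines - missing_lines
diff_coverage = 100 * covered_lines / total_lines
passes_gate = diff_coverage >= threshold

print(f"{covered_lines}/{total_lines} = {diff_coverage:.2f}%, passes: {passes_gate}")
```

With 404 of 451 changed lines covered, the result rounds to the 90% shown in the table and clears the 80% gate.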
🔴 python/quadrants/lang/_fast_caching/src_hasher.py (0%)
🔴 2 import os
53 "kcov" if os.environ.get("QD_KERNEL_COVERAGE") == "1" else "",
🟢 python/quadrants/lang/_func_base.py (82%)
🔴 7 import os
25
🔴 26 def _kernel_coverage_enabled() -> bool:
🟢 27 return os.environ.get("QD_KERNEL_COVERAGE") == "1"
28
29
🟢 255 _kcov = None
🟢 256 if _kernel_coverage_enabled() and autodiff_mode == _qd_core.AutodiffMode.NONE:
🟢 257 from . import ( # pylint: disable=import-outside-toplevel
258 _kernel_coverage as _kcov,
259 )
260
🟢 261 tree = _kcov.rewrite_ast(tree, function_source_info.filepath, function_source_info.start_lineno)
262
🟢 266 if _kcov is not None:
🟢 267 cov_field = _kcov.get_field()
🟢 268 if cov_field is not None:
🟢 269 global_vars[_kcov.FIELD_VAR_NAME] = cov_field
🔴 python/quadrants/lang/_kernel_coverage.py (79%)
1 """Kernel code coverage via Python AST rewriting.
2
3 When enabled (QD_KERNEL_COVERAGE=1), this module rewrites kernel and func ASTs to insert coverage probes — field
4 stores that record which source lines actually execute on the GPU. At process exit, the collected data is written
5 to a .coverage file compatible with coverage.py / pytest-cov / diff-cover.
6
7 The probes are compiled as ordinary field stores by the existing pipeline, so no C++ changes are needed. When
8 disabled, this module is never imported and has zero impact on the normal runtime path.
9 """
10
🟢 11 import ast
🟢 12 import atexit
🟢 13 import logging
🟢 14 import os
🟢 15 import threading
🟢 16 import warnings
🟢 17 from typing import TYPE_CHECKING
18
🟢 19 from coverage import CoverageData # type: ignore[import-not-found]
20
🟢 21 import quadrants as qd
🟢 22 from quadrants.lang import impl
23
24 if TYPE_CHECKING:
25 from quadrants.lang.field import ScalarField
26
🟢 27 FIELD_VAR_NAME = "_qd_cov"
🟢 28 _MAX_PROBES = int(os.environ.get("QD_COVERAGE_MAX_PROBES", "100000"))
29
🟢 30 _lock = threading.Lock()
🟢 31 _cov_field: "ScalarField | None" = None
🟢 32 _cov_field_prog: object | None = None # tracks which Program instance owns _cov_field
🟢 33 _probe_counter: int = 0
34 # {probe_id: (filepath, absolute_lineno)}
🟢 35 _probe_map: dict[int, tuple[str, int]] = {}
36 # Accumulated coverage lines surviving across qd.init() resets
🟢 37 _accumulated_lines: dict[str, set[int]] = {}
🟢 38 _reset_hook_installed: bool = False
39 # Directory for .coverage and _qd_kcov.* files, captured when coverage is first enabled
🟢 40 _coverage_dir: str | None = None
41
42
🟢 43 def _harvest_field() -> None:
44 """Read probe data from the current field into _accumulated_lines.
45
46 Must be called while the runtime is still alive (before clear()).
47 """
48 global _cov_field, _cov_field_prog
🟢 49 with _lock:
🟢 50 if _cov_field is None or not _probe_map:
🟢 51 return
🟢 52 field_ref = _cov_field
🟢 53 probe_snapshot = dict(_probe_map)
🟢 54 _cov_field = None
🟢 55 _cov_field_prog = None
🟢 56 try:
🟢 57 arr = field_ref.to_numpy()
🟢 58 except Exception:
🟢 59 logging.warning("Failed to read coverage field, coverage data for this session will be lost", exc_info=True)
🟢 60 return
🟢 61 with _lock:
🟢 62 for probe_id, (filepath, lineno) in probe_snapshot.items():
🟢 63 if probe_id < len(arr) and arr[probe_id] != 0:
🟢 64 _accumulated_lines.setdefault(filepath, set()).add(lineno)
65
66
🟢 67 def _install_reset_hook() -> None:
68 """Monkey-patch PyQuadrants.clear() to harvest probes before destruction."""
69 global _reset_hook_installed
🟢 70 if _reset_hook_installed:
🟢 71 return
🟢 72 _original_clear = impl.PyQuadrants.clear
73
🟢 74 def _hooked_clear(self) -> None:
🟢 75 _harvest_field()
🟢 76 _original_clear(self)
77
🟢 78 impl.PyQuadrants.clear = _hooked_clear # type: ignore[assignment]
🟢 79 _reset_hook_installed = True
80
81
🟢 82 def ensure_field_allocated() -> None:
83 """Allocate (or re-allocate after qd.init()) the global coverage field."""
84 global _cov_field, _cov_field_prog, _coverage_dir
🟢 85 _install_reset_hook()
🟢 86 if _coverage_dir is None:
🟢 87 _coverage_dir = os.getcwd()
🟢 88 current_prog = impl.get_runtime()._prog
🟢 89 if _cov_field is not None and _cov_field_prog is current_prog:
🟢 90 return
🟢 91 with _lock:
🟢 92 current_prog = impl.get_runtime()._prog
🟢 93 if _cov_field is not None and _cov_field_prog is current_prog:
🔴 94 return
🟢 95 _cov_field = qd.field(dtype=qd.i32, shape=(_MAX_PROBES,)) # type: ignore[assignment]
🟢 96 _cov_field_prog = current_prog
97
98
🟢 99 def get_field() -> "ScalarField | None":
🟢 100 with _lock:
🟢 101 if _cov_field_prog is not impl.get_runtime()._prog:
🔴 102 return None
🟢 103 return _cov_field
104
105
🟢 106 def rewrite_ast(tree: ast.Module, filepath: str, start_lineno: int) -> ast.Module:
107 """Rewrite a kernel/func AST to insert coverage probes.
108
109 Each executable statement at a new source line gets a probe: ``_qd_cov[<probe_id>] = 1``.
110 Probes inside if/else bodies only fire when that branch is taken, giving true runtime branch coverage.
111 """
112 global _probe_counter
🟢 113 with _lock:
🟢 114 rewriter = _CoverageASTRewriter(
115 field_name=FIELD_VAR_NAME,
116 filepath=filepath,
117 start_lineno=start_lineno,
118 probe_id_start=_probe_counter,
119 )
🟢 120 tree = rewriter.visit(tree)
🟢 121 ast.fix_missing_locations(tree)
🟢 122 _probe_counter = rewriter.next_probe_id
🟢 123 _probe_map.update(rewriter.probe_map)
🟢 124 return tree
125
126
🟢 127 def _detect_arc_mode() -> bool:
128 """Detect whether pytest-cov is running in branch (arc) mode.
129
130 Checks _QD_KCOV_ARC env var first (set by the pytest plugin), then falls back to reading .coverage.
131 Defaults to False (line mode) when nothing is known, since ``pytest --cov`` without ``--cov-branch``
132 is the more common invocation.
133 """
🔴 134 arc_env = os.environ.get("_QD_KCOV_ARC")
🔴 135 if arc_env is not None:
🔴 136 return arc_env == "1"
🔴 137 try:
🔴 138 cov_path = os.path.join(_coverage_dir, ".coverage") if _coverage_dir else ".coverage"
🔴 139 cd = CoverageData(basename=cov_path)
🔴 140 cd.read()
🔴 141 if not cd.measured_files():
🔴 142 return False
🔴 143 return cd.has_arcs()
🔴 144 except Exception:
🔴 145 logging.debug("Failed to detect arc mode from .coverage file, defaulting to line mode", exc_info=True)
🔴 146 return False
147
148
🟢 149 def flush() -> None:
150 """Harvest any remaining field data and write all results to a coverage data file.
151
152 The data goes to a per-process file, _qd_kcov.<pid>, in the directory captured when coverage was first enabled
153 (or the current working directory if none was captured), in arc or line format to match pytest-cov.
154 """
🔴 155 _harvest_field()
156
🔴 157 with _lock:
🔴 158 if not _accumulated_lines:
🔴 159 return
🔴 160 snapshot = {f: set(lines) for f, lines in _accumulated_lines.items()}
161
🔴 162 base_dir = _coverage_dir or os.getcwd()
🔴 163 kernel_path = os.path.join(base_dir, f"_qd_kcov.{os.getpid()}")
🔴 164 use_arcs = _detect_arc_mode()
165
🔴 166 cov = CoverageData(basename=kernel_path)
🔴 167 if use_arcs:
🔴 168 arcs_by_file: dict[str, list[tuple[int, int]]] = {}
🔴 169 for filepath, lines in snapshot.items():
170 # Emit only entry/exit arcs per line — we know which lines ran but not the actual transitions
171 # between them, so we avoid fabricating inter-line arcs that would misrepresent branch coverage.
🔴 172 arcs = []
🔴 173 for line in sorted(lines):
🔴 174 arcs.append((-1, line))
🔴 175 arcs.append((line, -1))
🔴 176 arcs_by_file[filepath] = arcs
🔴 177 cov.add_arcs(arcs_by_file)
178 else:
🔴 179 cov.add_lines({f: sorted(lines) for f, lines in snapshot.items()})
🔴 180 cov.write()
181
182
🟢 183 _capacity_warning_emitted = False
184
185
🟢 186 class _CoverageASTRewriter(ast.NodeTransformer):
187 """Insert coverage probes before each statement at a new source line."""
188
🟢 189 def __init__(self, field_name: str, filepath: str, start_lineno: int, probe_id_start: int) -> None:
🟢 190 self._field_name = field_name
🟢 191 self._filepath = filepath
🟢 192 self._start_lineno = start_lineno
🟢 193 self.next_probe_id = probe_id_start
🟢 194 self._seen_lines: set[int] = set()
🟢 195 self.probe_map: dict[int, tuple[str, int]] = {}
196
🟢 197 def _make_probe(self, abs_lineno: int, rel_lineno: int, col_offset: int) -> ast.Assign | None:
198 global _capacity_warning_emitted
🟢 199 probe_id = self.next_probe_id
🟢 200 if probe_id >= _MAX_PROBES:
🟢 201 if not _capacity_warning_emitted:
🟢 202 warnings.warn(
203 f"Kernel coverage probe capacity ({_MAX_PROBES}) exceeded. "
204 f"Additional kernel lines will not be tracked. "
205 f"Set QD_COVERAGE_MAX_PROBES to a higher value.",
206 stacklevel=2,
207 )
🟢 208 _capacity_warning_emitted = True
🟢 209 return None
🟢 210 self.probe_map[probe_id] = (self._filepath, abs_lineno)
🟢 211 self.next_probe_id += 1
🟢 212 node = ast.Assign(
213 targets=[
214 ast.Subscript(
215 value=ast.Name(id=self._field_name, ctx=ast.Load()),
216 slice=ast.Constant(value=probe_id),
217 ctx=ast.Store(),
218 )
219 ],
220 value=ast.Constant(value=1),
221 lineno=rel_lineno,
222 col_offset=col_offset,
223 end_lineno=rel_lineno,
224 end_col_offset=col_offset,
225 )
🟢 226 return node
227
🟢 228 def _instrument_body(self, stmts: list[ast.stmt]) -> list[ast.stmt]:
🟢 229 result: list[ast.stmt] = []
🟢 230 for stmt in stmts:
🟢 231 rel_lineno = getattr(stmt, "lineno", None)
🟢 232 if rel_lineno is not None:
🟢 233 abs_lineno = rel_lineno + self._start_lineno - 1
🟢 234 if abs_lineno not in self._seen_lines:
🟢 235 self._seen_lines.add(abs_lineno)
🟢 236 col = getattr(stmt, "col_offset", 0)
🟢 237 probe = self._make_probe(abs_lineno, rel_lineno, col)
🟢 238 if probe is not None:
🟢 239 result.append(probe)
🟢 240 result.append(self.visit(stmt))
🟢 241 return result
242
🟢 243 def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
🟢 244 node.body = self._instrument_body(node.body)
🟢 245 return node
246
🟢 247 def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> ast.AsyncFunctionDef:
🔴 248 node.body = self._instrument_body(node.body)
🔴 249 return node
250
🟢 251 def visit_If(self, node: ast.If) -> ast.If:
🟢 252 node.body = self._instrument_body(node.body)
🟢 253 if node.orelse:
🟢 254 node.orelse = self._instrument_body(node.orelse)
🟢 255 return node
256
🟢 257 def visit_For(self, node: ast.For) -> ast.For:
🟢 258 node.body = self._instrument_body(node.body)
🟢 259 if node.orelse:
🟢 260 node.orelse = self._instrument_body(node.orelse)
🟢 261 return node
262
🟢 263 def visit_While(self, node: ast.While) -> ast.While:
🟢 264 node.body = self._instrument_body(node.body)
🟢 265 if node.orelse:
🟢 266 node.orelse = self._instrument_body(node.orelse)
🟢 267 return node
268
🟢 269 def visit_With(self, node: ast.With) -> ast.With:
🟢 270 node.body = self._instrument_body(node.body)
🟢 271 return node
272
🟢 273 def visit_Try(self, node: ast.Try) -> ast.Try:
🟢 274 node.body = self._instrument_body(node.body)
🟢 275 for handler in node.handlers:
🟢 276 handler.body = self._instrument_body(handler.body)
🟢 277 if node.orelse:
🟢 278 node.orelse = self._instrument_body(node.orelse)
🟢 279 if node.finalbody:
🟢 280 node.finalbody = self._instrument_body(node.finalbody)
🟢 281 return node
282
283
🟢 284 atexit.register(flush)
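The probe-insertion technique in the listing above can be tried in plain Python, without the Quadrants runtime. The sketch below is a simplified illustration, not the module's code: `ProbeInserter` is a hypothetical class, a list named `_cov` stands in for the coverage field, only function bodies and if/else are handled, and there is no same-line deduplication or capacity limit. It assumes Python 3.9 or later, where a subscript slice can be a bare `ast.Constant`.

```python
import ast
import textwrap


class ProbeInserter(ast.NodeTransformer):
    """Simplified, hypothetical rewriter: one probe before every statement."""

    def __init__(self, field_name):
        self.field_name = field_name
        self.probe_lines = {}  # probe_id -> source line number

    def _make_probe(self, stmt):
        probe_id = len(self.probe_lines)
        self.probe_lines[probe_id] = stmt.lineno
        # Build the statement `<field_name>[probe_id] = 1`.
        probe = ast.Assign(
            targets=[
                ast.Subscript(
                    value=ast.Name(id=self.field_name, ctx=ast.Load()),
                    slice=ast.Constant(value=probe_id),
                    ctx=ast.Store(),
                )
            ],
            value=ast.Constant(value=1),
        )
        return ast.copy_location(probe, stmt)

    def _instrument(self, stmts):
        out = []
        for stmt in stmts:
            out.append(self._make_probe(stmt))
            out.append(self.visit(stmt))  # recurse into nested blocks
        return out

    def visit_FunctionDef(self, node):
        node.body = self._instrument(node.body)
        return node

    def visit_If(self, node):
        node.body = self._instrument(node.body)
        if node.orelse:
            node.orelse = self._instrument(node.orelse)
        return node


source = textwrap.dedent(
    """\
    def f(x):
        y = 0
        if x > 5:
            y = 1
        else:
            y = 2
        return y
    """
)

inserter = ProbeInserter("_cov")
tree = ast.fix_missing_locations(inserter.visit(ast.parse(source)))

cov = [0] * len(inserter.probe_lines)  # a plain list stands in for the field
namespace = {"_cov": cov}
exec(compile(tree, "<sketch>", "exec"), namespace)
result = namespace["f"](10)

hit_lines = sorted(line for pid, line in inserter.probe_lines.items() if cov[pid])
missed_lines = sorted(line for pid, line in inserter.probe_lines.items() if not cov[pid])
print(result, hit_lines, missed_lines)
```

Calling `f(10)` takes the if-branch, so the probe on the else-branch line never fires. This is the same signal the end-to-end tests look for in the real coverage field.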
🟢 python/quadrants/lang/ast/ast_transformer_utils.py (100%)
🟢 335 if not name.startswith("_qd_"):
🟢 336 reason = f"{name} is in global vars, therefore violates pure"
🟢 337 violates_pure = True
🟢 python/quadrants/lang/kernel.py (80%)
19
🔴 20 def _kernel_coverage_enabled() -> bool:
🟢 21 return os.environ.get("QD_KERNEL_COVERAGE") == "1"
22
23
🟢 382 if _kernel_coverage_enabled():
🟢 383 from . import _kernel_coverage # pylint: disable=import-outside-toplevel
384
🟢 385 _kernel_coverage.ensure_field_allocated()
386
🟢 python/quadrants/lang/misc.py (100%)
496
🟢 497 if os.environ.get("QD_KERNEL_COVERAGE") == "1":
🟢 498 from . import _kernel_coverage # pylint: disable=import-outside-toplevel
499
🟢 500 _kernel_coverage.ensure_field_allocated()
501
🔴 python/quadrants/pytest_plugin.py (67%)
1 """Pytest plugin that auto-enables kernel coverage when pytest-cov is active.
2
3 Registered via the ``pytest11`` entry point so it loads automatically when quadrants is installed.
4 Opt out by setting ``QD_KERNEL_COVERAGE=0`` explicitly.
5 """
6
🔴 7 import os
8
9
🔴 10 def pytest_configure(config):
🟢 11 if not config.pluginmanager.hasplugin("_cov"):
🔴 12 return
🟢 13 os.environ.setdefault("QD_KERNEL_COVERAGE", "1")
🟢 14 if os.environ.get("QD_KERNEL_COVERAGE") != "1":
🟢 15 return
16 # Tell the kernel coverage module whether pytest-cov is running in branch (arc) mode,
17 # so it writes the matching format and avoids "Can not mix line and arc data" at combine time.
18 # We read config.option.cov_branch which pytest-cov has already populated by this point.
🟢 19 cov_branch = getattr(config.option, "cov_branch", False) or False
🟢 20 os.environ["_QD_KCOV_ARC"] = "1" if cov_branch else "0"
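The environment-variable handshake between the plugin and the coverage writer can be sketched on a plain dict instead of `os.environ`. `configure_env` and `records_for` are hypothetical helper names. The sketch also simplifies one thing: when `_QD_KCOV_ARC` is unset it defaults to line mode, whereas the module shown earlier first tries to read an existing .coverage file.

```python
def configure_env(env, cov_plugin_active, cov_branch):
    """Mirror the plugin's decisions on a plain dict (illustrative only)."""
    if not cov_plugin_active:
        return env
    # setdefault keeps an explicit opt-out such as QD_KERNEL_COVERAGE=0.
    env.setdefault("QD_KERNEL_COVERAGE", "1")
    if env.get("QD_KERNEL_COVERAGE") != "1":
        return env
    env["_QD_KCOV_ARC"] = "1" if cov_branch else "0"
    return env


def records_for(env, lines):
    """Return ("arcs", ...) or ("lines", ...) according to _QD_KCOV_ARC."""
    if env.get("_QD_KCOV_ARC") == "1":
        arcs = []
        for line in sorted(lines):
            arcs.append((-1, line))  # entry arc into the line
            arcs.append((line, -1))  # exit arc out of the line
        return "arcs", arcs
    return "lines", sorted(lines)


default_env = configure_env({}, cov_plugin_active=True, cov_branch=False)
opted_out_env = configure_env({"QD_KERNEL_COVERAGE": "0"}, True, True)
no_cov_env = configure_env({}, cov_plugin_active=False, cov_branch=True)

arc_records = records_for({"_QD_KCOV_ARC": "1"}, {3, 2})
line_records = records_for(default_env, {3, 2})
print(default_env, opted_out_env, no_cov_env, arc_records, line_records)
```

Because the plugin uses `setdefault`, a value the user exported beforehand is never overwritten, which is what makes the `QD_KERNEL_COVERAGE=0` opt-out work.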
🟢 tests/python/quadrants/lang/fast_caching/test_src_ll_cache.py (100%)
11
🟢 12 _KERNEL_COVERAGE = os.environ.get("QD_KERNEL_COVERAGE") == "1"
🟢 67 @pytest.mark.skipif(
68 _KERNEL_COVERAGE,
69 reason="Coverage probes change LLVM IR addresses after reinit, breaking recompile comparison",
70 )
🟢 tests/python/quadrants/lang/test_kernel_impl.py (100%)
🟢 1 import os
🟢 11 _KERNEL_COVERAGE = os.environ.get("QD_KERNEL_COVERAGE") == "1"
13
🟢 14 @pytest.mark.skipif(
15 _KERNEL_COVERAGE,
16 reason="Coverage probes change the kernel AST, preventing FE-LL cache hits after reinit",
17 )
🟢 tests/python/test_api.py (100%)
🟢 438 actual = sorted([s for s in dir(src) if not s.startswith(("_", "@")) and s != "pytest_plugin"])
🟢 tests/python/test_intrinsics.py (100%)
55 (thread i does (i+1)*200000). Asserts strict monotonicity across threads and that
🟢 91 assert a[i - 1] < a[i] < a[i + 1]
🟢 tests/python/test_kernel_coverage.py (99%)
1 """Tests for kernel code coverage instrumentation.
2
3 These tests verify that the AST rewriter correctly inserts coverage probes and that the probes fire when kernel
4 code executes on the device.
5 """
6
🟢 7 import ast
🟢 8 import os
🟢 9 import textwrap
10
🟢 11 import pytest
12
🟢 13 import quadrants as qd
14
🟢 15 from tests import test_utils
16
17 # These tests only run when QD_KERNEL_COVERAGE=1
🟢 18 pytestmark = pytest.mark.skipif(
19 os.environ.get("QD_KERNEL_COVERAGE", "") != "1",
20 reason="QD_KERNEL_COVERAGE=1 not set",
21 )
22
23
24 # ---------------------------------------------------------------------------
25 # AST rewriter unit tests
26 # ---------------------------------------------------------------------------
27
🟢 28 _AST_REWRITER_CASES = [
29 pytest.param(
30 """\
31 def f():
32 x = 1
33 y = 2
34 return x + y
35 """,
36 {11, 12, 13},
37 10,
38 id="straight_line",
39 ),
40 pytest.param(
41 """\
42 def f():
43 if x > 0:
44 a = 1
45 else:
46 b = 2
47 """,
48 {2, 3, 5},
49 1,
50 id="if_else",
51 ),
52 pytest.param(
53 """\
54 def f():
55 for i in range(10):
56 x = i
57 """,
58 {2, 3},
59 1,
60 id="for_loop",
61 ),
62 pytest.param(
63 """\
64 def f():
65 while x > 0:
66 x = x - 1
67 else:
68 y = 0
69 """,
70 {2, 3, 5},
71 1,
72 id="while_loop_else",
73 ),
74 pytest.param(
75 """\
76 def f():
77 with ctx:
78 a = 1
79 b = 2
80 """,
81 {2, 3, 4},
82 1,
83 id="with_statement",
84 ),
85 pytest.param(
86 """\
87 def f():
88 try:
89 a = 1
90 except:
91 b = 2
92 else:
93 c = 3
94 finally:
95 d = 4
96 """,
97 {3, 5, 7, 9},
98 1,
99 id="try_except_finally",
100 ),
101 ]
102
103
🟢 104 @pytest.mark.parametrize("src,expected_lines,start_lineno", _AST_REWRITER_CASES)
🟢 105 def test_ast_rewriter(src, expected_lines, start_lineno):
106 """Verify the AST rewriter inserts probes at the expected source lines."""
🟢 107 from quadrants.lang._kernel_coverage import _CoverageASTRewriter
108
🟢 109 tree = ast.parse(textwrap.dedent(src))
🟢 110 rewriter = _CoverageASTRewriter(
111 field_name="_qd_cov", filepath="test.py", start_lineno=start_lineno, probe_id_start=0
112 )
🟢 113 rewriter.visit(tree)
114
🟢 115 covered_lines = {lineno for _, (_, lineno) in rewriter.probe_map.items()}
🟢 116 assert expected_lines.issubset(covered_lines), f"Expected lines {expected_lines} to be probed, got {covered_lines}"
117
118
🟢 119 def test_ast_rewriter_capacity_limit():
120 """Verify that probes stop being inserted when the capacity limit is hit."""
🟢 121 import warnings
122
🟢 123 import quadrants.lang._kernel_coverage as kcov
🟢 124 from quadrants.lang._kernel_coverage import _CoverageASTRewriter
125
🟢 126 src = textwrap.dedent(
127 """\
128 def f():
129 a = 1
130 b = 2
131 c = 3
132 """
133 )
🟢 134 tree = ast.parse(src)
🟢 135 old_warning_state = kcov._capacity_warning_emitted
🟢 136 kcov._capacity_warning_emitted = False
🟢 137 try:
🟢 138 with warnings.catch_warnings(record=True) as w:
🟢 139 warnings.simplefilter("always")
🟢 140 rewriter = _CoverageASTRewriter(
141 field_name="_qd_cov", filepath="test.py", start_lineno=1, probe_id_start=kcov._MAX_PROBES - 1
142 )
🟢 143 rewriter.visit(tree)
144
🟢 145 assert rewriter.next_probe_id == kcov._MAX_PROBES
🟢 146 assert len(rewriter.probe_map) == 1, f"Only 1 probe should fit, got {len(rewriter.probe_map)}"
🟢 147 assert len(w) == 1
🟢 148 assert "exceeded" in str(w[0].message).lower()
149 finally:
🟢 150 kcov._capacity_warning_emitted = old_warning_state
151
152
🟢 153 def test_ast_rewriter_deduplicates_same_line():
154 """Verify that two statements on the same source line get only one probe."""
🟢 155 from quadrants.lang._kernel_coverage import _CoverageASTRewriter
156
🟢 157 src = "def f():\n a = 1; b = 2\n"
🟢 158 tree = ast.parse(src)
🟢 159 rewriter = _CoverageASTRewriter(field_name="_qd_cov", filepath="test.py", start_lineno=1, probe_id_start=0)
🟢 160 rewriter.visit(tree)
161
🟢 162 abs_lines = [lineno for _, (_, lineno) in rewriter.probe_map.items()]
🟢 163 assert abs_lines.count(2) == 1, f"Line 2 should have exactly one probe, got {abs_lines.count(2)}"
164
165
🟢 166 def test_env_var_max_probes():
167 """Verify that QD_COVERAGE_MAX_PROBES env var is read at import time."""
🟢 168 import quadrants.lang._kernel_coverage as kcov
169
🟢 170 assert kcov._MAX_PROBES == int(os.environ.get("QD_COVERAGE_MAX_PROBES", "100000"))
171
172
🟢 173 def test_harvest_field_exception_path():
174 """Verify that _harvest_field handles to_numpy() failure gracefully."""
🟢 175 from unittest.mock import MagicMock
176
🟢 177 import quadrants.lang._kernel_coverage as kcov
178
🟢 179 old_field = kcov._cov_field
🟢 180 old_prog = kcov._cov_field_prog
🟢 181 old_map = kcov._probe_map.copy()
🟢 182 try:
🟢 183 mock_field = MagicMock()
🟢 184 mock_field.to_numpy.side_effect = RuntimeError("runtime destroyed")
🟢 185 kcov._cov_field = mock_field
🟢 186 kcov._cov_field_prog = object()
🟢 187 kcov._probe_map[999999] = ("fake.py", 1)
188
189 # Should not raise — the exception is caught and logged
🟢 190 kcov._harvest_field()
191
🟢 192 assert kcov._cov_field is None, "Field should be cleared after failure"
🟢 193 assert kcov._cov_field_prog is None, "Field prog should be cleared after failure"
194 finally:
🟢 195 kcov._cov_field = old_field
🟢 196 kcov._cov_field_prog = old_prog
🟢 197 kcov._probe_map = old_map
198
199
200 # ---------------------------------------------------------------------------
201 # End-to-end tests
202 # ---------------------------------------------------------------------------
203
204
🟢 205 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 206 def test_kernel_coverage_branches_e2e():
207 """Verify that only the taken branch has its probe fired."""
🟢 208 from quadrants.lang import _kernel_coverage
209
🟢 210 _kernel_coverage.ensure_field_allocated()
211
🟢 212 probe_count_before = _kernel_coverage._probe_counter
🟢 213 out = qd.field(dtype=qd.i32, shape=(1,))
214
🟢 215 @qd.kernel
🟢 216 def branching_kernel():
🟢 217 x = 10
🟢 218 if x > 5:
🟢 219 out[0] = 1
220 else:
🔴 221 out[0] = 2
222
🟢 223 branching_kernel()
224
🟢 225 assert out[0] == 1
226
🟢 227 cov_field = _kernel_coverage.get_field()
🟢 228 arr = cov_field.to_numpy()
229
🟢 230 probes_for_kernel = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
231
🟢 232 taken_probes = {pid for pid, loc in probes_for_kernel.items() if arr[pid] != 0}
🟢 233 not_taken_probes = {pid for pid, loc in probes_for_kernel.items() if arr[pid] == 0}
234
🟢 235 assert len(taken_probes) > 0, "At least some probes should have fired"
🟢 236 assert len(not_taken_probes) > 0, "The else branch should not have been reached"
237
238
🟢 239 @test_utils.test(arch=qd.gpu)
🟢 240 def test_kernel_coverage_simt_e2e():
241 """Verify coverage probes track branches with block.sync() and subgroup shuffle.
242
243 The if/else is based on a runtime value read from a field, so the compiler cannot constant-fold it away.
244 Only the taken branch's shuffle probe should fire.
245 """
🟢 246 from quadrants.lang import _kernel_coverage
🟢 247 from quadrants.lang.simt import subgroup
248
🟢 249 _kernel_coverage.ensure_field_allocated()
250
🟢 251 N = 64
🟢 252 probe_count_before = _kernel_coverage._probe_counter
🟢 253 flag = qd.field(dtype=qd.i32, shape=(1,))
🟢 254 a = qd.field(dtype=qd.i32, shape=(N,))
🟢 255 out = qd.field(dtype=qd.i32, shape=(N,))
256
🟢 257 flag[0] = 1 # runtime value: take the if-branch
258
🟢 259 @qd.kernel
🟢 260 def simt_kernel():
🟢 261 qd.loop_config(block_dim=N)
🟢 262 for i in range(N):
🟢 263 a[i] = i + 1
🟢 264 qd.simt.block.sync()
🟢 265 if flag[0] > 0:
🟢 266 val = subgroup.shuffle(a[i], qd.u32(0))
🟢 267 out[i] = val
268 else:
🔴 269 val = subgroup.shuffle(a[i], qd.u32(1))
🔴 270 out[i] = val + 100
271
🟢 272 simt_kernel()
273
🟢 274 for i in range(4):
🟢 275 assert out[i] == 1, f"Expected 1 at index {i}, got {out[i]}"
276
🟢 277 cov_field = _kernel_coverage.get_field()
🟢 278 arr = cov_field.to_numpy()
279
🟢 280 probes_for_kernel = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
281
🟢 282 fired = {pid for pid in probes_for_kernel if arr[pid] != 0}
🟢 283 not_fired = {pid for pid in probes_for_kernel if arr[pid] == 0}
🟢 284 assert len(fired) >= 4, f"Expected at least 4 probes to fire, got {len(fired)}"
🟢 285 assert len(not_fired) >= 2, "The else branch should not have been reached"
286
287
🟢 288 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 289 def test_kernel_coverage_survives_reinit():
290 """Verify that coverage data accumulated before qd.init() reset is preserved.
291
292 Runs a kernel, then resets via qd.reset()/qd.init() (which triggers the _hooked_clear harvest), runs another
293 kernel, harvests again, and checks that _accumulated_lines contains data from both sessions.
294 """
🟢 295 from quadrants.lang import _kernel_coverage, impl
296
🟢 297 current_arch = impl.get_runtime()._arch
🟢 298 _kernel_coverage.ensure_field_allocated()
299
🟢 300 probe_count_before = _kernel_coverage._probe_counter
🟢 301 out1 = qd.field(dtype=qd.i32, shape=(1,))
302
🟢 303 @qd.kernel
🟢 304 def kernel_before_reset():
🟢 305 out1[0] = 1
306
🟢 307 kernel_before_reset()
308
🟢 309 cov_field = _kernel_coverage.get_field()
🟢 310 assert cov_field is not None
🟢 311 arr = cov_field.to_numpy()
🟢 312 probes_first = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
🟢 313 fired_first = {pid for pid in probes_first if arr[pid] != 0}
🟢 314 assert len(fired_first) > 0, "Probes from first kernel should have fired"
315
316 # Don't call _harvest_field() manually — let qd.reset() trigger it via the _hooked_clear hook
🟢 317 qd.reset()
318
319 # Verify the hook harvested data from the first session
🟢 320 files_before = set(_kernel_coverage._accumulated_lines.keys())
🟢 321 assert len(files_before) > 0, "Hook should have harvested data during reset"
🟢 322 lines_before = {}
🟢 323 for f, lines in _kernel_coverage._accumulated_lines.items():
🟢 324 lines_before[f] = set(lines)
325
🟢 326 qd.init(arch=current_arch)
327
🟢 328 _kernel_coverage.ensure_field_allocated()
329
🟢 330 probe_count_mid = _kernel_coverage._probe_counter
🟢 331 out2 = qd.field(dtype=qd.i32, shape=(1,))
332
🟢 333 @qd.kernel
🟢 334 def kernel_after_reset():
🟢 335 out2[0] = 2
336
🟢 337 kernel_after_reset()
338
🟢 339 _kernel_coverage._harvest_field()
340
🟢 341 for f in files_before:
🟢 342 assert (
343 f in _kernel_coverage._accumulated_lines
344 ), f"File {f} from before reset should still be in _accumulated_lines"
🟢 345 assert lines_before[f].issubset(
346 _kernel_coverage._accumulated_lines[f]
347 ), "Lines from before reset should be preserved"
348
🟢 349 probes_second = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_mid}
🟢 350 second_files = {loc[0] for loc in probes_second.values()}
🟢 351 for f in second_files:
🟢 352 assert f in _kernel_coverage._accumulated_lines, f"File {f} from second kernel should be in _accumulated_lines"
353
354
🟢 355 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 356 def test_kernel_coverage_autodiff():
357 """Verify that autodiff forward pass produces probes but backward does not.
358
359 The forward compilation (AutodiffMode.NONE) should insert probes that fire. The backward compilation
360 (AutodiffMode.REVERSE) should not add any probes.
361 """
🟢 362 from quadrants.lang import _kernel_coverage
363
🟢 364 _kernel_coverage.ensure_field_allocated()
365
🟢 366 x = qd.field(dtype=qd.f32, shape=(), needs_grad=True)
🟢 367 loss = qd.field(dtype=qd.f32, shape=(), needs_grad=True)
368
🟢 369 @qd.kernel
🟢 370 def compute():
🟢 371 loss[None] = x[None] * x[None]
372
🟢 373 x[None] = 5.0
374
🟢 375 probe_count_before = _kernel_coverage._probe_counter
376
🟢 377 with qd.ad.Tape(loss):
🟢 378 compute()
379
🟢 380 probe_count_after_tape = _kernel_coverage._probe_counter
🟢 381 forward_probes = probe_count_after_tape - probe_count_before
🟢 382 assert forward_probes > 0, "Forward compilation should have inserted probes"
383
384 # Verify forward probes actually fired
🟢 385 cov_field = _kernel_coverage.get_field()
🟢 386 assert cov_field is not None
🟢 387 arr = cov_field.to_numpy()
🟢 388 probes = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
🟢 389 fired = {pid for pid in probes if arr[pid] != 0}
🟢 390 assert len(fired) > 0, "Forward pass inside Tape should produce fired coverage probes"
391
392 # Verify backward pass computes correct gradients
🟢 393 assert loss[None] == pytest.approx(25.0)
🟢 394 assert x.grad[None] == pytest.approx(10.0)
395
396
🟢 397 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 398 def test_kernel_coverage_qd_func():
399 """Verify that probes fire inside a @qd.func called from a kernel."""
🟢 400 from quadrants.lang import _kernel_coverage
401
🟢 402 _kernel_coverage.ensure_field_allocated()
403
🟢 404 probe_count_before = _kernel_coverage._probe_counter
🟢 405 out = qd.field(dtype=qd.i32, shape=(1,))
406
🟢 407 @qd.func
🟢 408 def helper():
🟢 409 out[0] = 99
410
🟢 411 @qd.kernel
🟢 412 def caller():
🟢 413 helper()
414
🟢 415 caller()
416
🟢 417 assert out[0] == 99
418
🟢 419 cov_field = _kernel_coverage.get_field()
🟢 420 assert cov_field is not None
🟢 421 arr = cov_field.to_numpy()
422
🟢 423 probes = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
🟢 424 fired = {pid for pid in probes if arr[pid] != 0}
425 # The kernel body has one statement (helper()), and the func body has one (out[0] = 99).
426 # Both should produce probes that fire.
🟢 427 assert (
428 len(fired) >= 2
429 ), f"Expected probes from both kernel and func to fire, got {len(fired)} fired out of {len(probes)}"
430
431
🟢 432 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 433 def test_kernel_coverage_multiple_kernels_same_session():
434 """Verify that probes from two different kernels both fire in the same session."""
🟢 435 from quadrants.lang import _kernel_coverage
436
🟢 437 _kernel_coverage.ensure_field_allocated()
438
🟢 439 probe_count_before = _kernel_coverage._probe_counter
🟢 440 a = qd.field(dtype=qd.i32, shape=(1,))
🟢 441 b = qd.field(dtype=qd.i32, shape=(1,))
442
🟢 443 @qd.kernel
🟢 444 def kernel_a():
🟢 445 a[0] = 10
446
🟢 447 @qd.kernel
🟢 448 def kernel_b():
🟢 449 b[0] = 20
450
🟢 451 kernel_a()
🟢 452 probe_count_after_a = _kernel_coverage._probe_counter
🟢 453 kernel_b()
454
🟢 455 assert a[0] == 10
🟢 456 assert b[0] == 20
457
🟢 458 cov_field = _kernel_coverage.get_field()
🟢 459 arr = cov_field.to_numpy()
460
🟢 461 probes_a = {
462 pid: loc for pid, loc in _kernel_coverage._probe_map.items() if probe_count_before <= pid < probe_count_after_a
463 }
🟢 464 probes_b = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_after_a}
465
🟢 466 fired_a = {pid for pid in probes_a if arr[pid] != 0}
🟢 467 fired_b = {pid for pid in probes_b if arr[pid] != 0}
468
🟢 469 assert len(fired_a) > 0, "Probes from kernel_a should have fired"
🟢 470 assert len(fired_b) > 0, "Probes from kernel_b should have fired"
471
472
🟢 473 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 474 def test_qd_prefix_exemption_pure_kernel():
475 """Verify that _qd_-prefixed globals don't violate pure kernel checks.
476
477 With kernel coverage enabled, _qd_cov is injected as a global. This test verifies that a pure (fastcache)
478 kernel still compiles without error. The kernel uses ndarray arguments (not global fields) because pure
479 kernels prohibit non-_qd_ globals.
480 """
🟢 481 a = qd.ndarray(qd.i32, (1,))
482
🟢 483 @qd.kernel(fastcache=True)
🟢 484 def pure_kernel(arr: qd.types.NDArray) -> None:
🟢 485 arr[0] = 42
486
🟢 487 pure_kernel(a)
🟢 488 assert a[0] == 42
🟢 tests/python/test_offline_cache.py (100%)
16 # Coverage field allocation creates internal fill kernels that change cache file counts.
17 # CI runs these tests in a separate phase without QD_KERNEL_COVERAGE (see 4_test.sh).
🟢 18 pytestmark = pytest.mark.skipif(
19 os.environ.get("QD_KERNEL_COVERAGE") == "1",
20 reason="Kernel coverage adds internal kernels that invalidate cache file count assertions",
21 )
22
🟢 tests/python/test_snode_layout_inspection.py (100%)
🟢 1 import os
2
🟢 3 import pytest
4
🟢 10 @pytest.mark.skipif(
11 os.environ.get("QD_KERNEL_COVERAGE") == "1",
12 reason="Kernel coverage field on root shifts offset assertions",
13 )
cov_branch = getattr(config.option, "cov_branch", False) or False
os.environ["_QD_KCOV_ARC"] = "1" if cov_branch else "0"
🔴 The pytest plugin at pytest_plugin.py:19 reads only config.option.cov_branch to decide between arc and line mode, but pytest-cov defines --cov-branch with default=None. Users who enable branch coverage via branch = True in .coveragerc or [tool.coverage.run] in pyproject.toml, without passing the CLI flag, therefore get cov_branch = (None or False) = False, which sets _QD_KCOV_ARC=0. pytest-cov actually runs in arc mode (coverage.py applies the config-file setting), so _qd_kcov.* is written in line format while .coverage is in arc format. coverage combine then fails with "Can not mix line and arc data", and the fallback in tests/coverage_report.py:combine_coverage() silently drops all kernel coverage. Fix: read the authoritative resolved value via config.pluginmanager.get_plugin("_cov").cov_controller.cov.config.branch.
Extended reasoning...
What the bug is and how it manifests
The pytest plugin in python/quadrants/pytest_plugin.py (lines 10-20) tries to inform _kernel_coverage whether pytest-cov is running in branch (arc) mode by setting the _QD_KCOV_ARC env var. The check at line 19 is:
cov_branch = getattr(config.option, "cov_branch", False) or False
os.environ["_QD_KCOV_ARC"] = "1" if cov_branch else "0"
This reflects only the pytest CLI flag --cov-branch, not the branch setting from coverage.py configuration files. pytest-cov registers --cov-branch with default=None and action='store_true' (verified at /usr/local/lib/python3.11/dist-packages/pytest_cov/plugin.py:160-163), so config.option.cov_branch is None whenever the CLI flag is absent. None or False == False, so _QD_KCOV_ARC gets set to '0' (line mode).
But pytest-cov passes branch=options.cov_branch directly to coverage.Coverage(...) (engine.py:61, 239). When None is passed, coverage.config.from_args() skips it (if v is not None, config.py:288-294), so any config-file [run] branch = True setting survives. pytest-cov is therefore in arc mode while the plugin sets _QD_KCOV_ARC=0.
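The collapse from None to False can be reproduced without pytest installed. A minimal sketch, assuming a plain namespace stands in for config.option; plugin_view, flag_absent and flag_present are hypothetical names used only for illustration:

```python
from types import SimpleNamespace


def plugin_view(option):
    """Mirror the check at pytest_plugin.py:19 on a stand-in option object."""
    return getattr(option, "cov_branch", False) or False


# --cov-branch absent: pytest-cov leaves the option at its default of None.
flag_absent = SimpleNamespace(cov_branch=None)
# --cov-branch passed on the command line.
flag_present = SimpleNamespace(cov_branch=True)

print(plugin_view(flag_absent))
print(plugin_view(flag_present))
```

The check cannot tell "flag not given" apart from "branch coverage off", which is exactly the information a config-file setting needs to survive.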
The specific code path that triggers it
- User has [run] branch = true in .coveragerc (or [tool.coverage.run] branch = true in pyproject.toml) and runs pytest --cov=my_pkg without --cov-branch.
- Quadrants pytest plugin runs: config.option.cov_branch is None; None or False = False; sets _QD_KCOV_ARC=0.
- _kernel_coverage._detect_arc_mode() reads _QD_KCOV_ARC=0 and returns False (line mode).
- flush() writes _qd_kcov.<pid> in line format.
- Meanwhile pytest-cov writes .coverage in arc format (because coverage.Coverage(branch=None) reads .coveragerc and applies branch=True).
- tests/coverage_report.py:combine_coverage() runs coverage combine .coverage.pytest _qd_kcov.* → DataError: Can not mix line and arc data.
- The fallback at lines 50-51 (coverage combine .coverage.pytest alone) silently drops all kernel coverage.
Empirical verification
A verifier confirmed empirically with pytest-cov 7.1.0:
# With .coveragerc containing [run]\nbranch = True:
cov = coverage.Coverage(source=['.'], branch=None, config_file='.coveragerc')
print(cov.config.branch) # -> True
# config.option.cov_branch -> None
# cov_branch after `or False` -> False (incorrect)
# cov_controller.cov.config.branch -> True (authoritative)
This proves the plugin's check disagrees with pytest-cov's actual mode.
Why existing code does not prevent it
The plugin guard getattr(config.option, "cov_branch", False) or False is purely a CLI-flag check. Coverage.py's three-layer config resolution (defaults → config file → from_args) is opaque to the plugin code. There is no fallback path that reads coverage.config after pytest-cov has resolved it. The PR's own combine_coverage() fallback masks the resulting error rather than surfacing it.
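The three-layer resolution can be modelled in a few lines. This is a simplified sketch under stated assumptions, not coverage.py's actual code: SketchConfig, from_file, from_args and resolve are hypothetical names, and only the branch setting is modelled.

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the resolved coverage configuration."""

    branch: bool = False  # layer 1: built-in default

    def from_file(self, settings):
        # Layer 2: values found in .coveragerc or pyproject.toml.
        for key, value in settings.items():
            setattr(self, key, value)

    def from_args(self, **kwargs):
        # Layer 3: constructor arguments; None means "not specified" and is skipped.
        for key, value in kwargs.items():
            if value is not None:
                setattr(self, key, value)


def resolve(file_settings, cli_branch):
    """Apply the three layers in order and return the effective branch mode."""
    cfg = SketchConfig()
    cfg.from_file(file_settings)
    cfg.from_args(branch=cli_branch)
    return cfg.branch


print(resolve({"branch": True}, None))
print(resolve({}, None))
```

Because layer 3 skips None, the value the plugin sees on config.option says nothing about what layer 2 contributed.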
Impact
Any downstream user of quadrants who enables branch coverage via .coveragerc or pyproject.toml and runs pytest --cov=my_pkg without explicitly passing --cov-branch will silently lose all kernel coverage. This is a common, idiomatic configuration pattern — many Python projects keep coverage settings in config files rather than command lines. The failure is silent: the coverage comment on the PR shows pytest-cov coverage but no kernel data, with no warning or error visible to the user.
Step-by-step proof
- Project has pyproject.toml containing [tool.coverage.run] with branch = true.
- Developer runs pytest --cov=my_pkg tests/ (no --cov-branch).
- pytest-cov initializes: options.cov_branch = None. Coverage(branch=None, ...) reads pyproject.toml, sets config.branch = True. Tracer starts in arc mode.
- quadrants pytest plugin's pytest_configure fires: config.option.cov_branch = None; None or False = False; _QD_KCOV_ARC = "0".
- Tests run. Kernels execute. Probes accumulate.
- atexit triggers flush(): _detect_arc_mode() reads _QD_KCOV_ARC=0 → False → add_lines path → _qd_kcov.<pid> in line format.
- CI runs combine_coverage(): coverage combine .coverage.pytest _qd_kcov.* → DataError: Can not mix line and arc data → fallback drops kernel files → final .coverage contains only pytest-cov data.
- Coverage report shows 0% kernel coverage.
How to fix it
Query the authoritative source — pytest-cov's already-constructed Coverage instance — instead of the CLI option:
def pytest_configure(config):
    if not config.pluginmanager.hasplugin("_cov"):
        return
    os.environ.setdefault("QD_KERNEL_COVERAGE", "1")
    if os.environ.get("QD_KERNEL_COVERAGE") != "1":
        return
    branch = False
    cov_plugin = config.pluginmanager.get_plugin("_cov")
    if cov_plugin is not None:
        ctrl = getattr(cov_plugin, "cov_controller", None)
        if ctrl is not None and getattr(ctrl, "cov", None) is not None:
            branch = bool(ctrl.cov.config.branch)
    os.environ["_QD_KCOV_ARC"] = "1" if branch else "0"
pytest-cov constructs cov_controller in pytest_load_initial_conftests (with tryfirst=True), which runs strictly before pytest_configure, so cov_controller.cov.config.branch is already authoritative and queryable by the time this hook fires. This reflects the merged result of CLI flags, config files, and defaults, matching pytest-cov's actual runtime behavior.
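The lookup in the proposed fix can be exercised without pytest or pytest-cov by lifting it into a helper and feeding it stand-in objects. A sketch only: resolve_branch_mode, fake_config and fake_cov_plugin are hypothetical names, and the fakes expose just the attributes the fix touches.

```python
from types import SimpleNamespace


def resolve_branch_mode(config):
    """Branch lookup from the proposed fix, lifted into a helper for testing."""
    cov_plugin = config.pluginmanager.get_plugin("_cov")
    ctrl = getattr(cov_plugin, "cov_controller", None)
    cov = getattr(ctrl, "cov", None)
    if cov is None:
        return False
    return bool(cov.config.branch)


def fake_config(cov_plugin):
    """Stand-in config exposing only pluginmanager.get_plugin()."""
    manager = SimpleNamespace(get_plugin=lambda name: cov_plugin if name == "_cov" else None)
    return SimpleNamespace(pluginmanager=manager)


def fake_cov_plugin(branch):
    """Stand-in plugin whose controller reports the given resolved branch mode."""
    cov = SimpleNamespace(config=SimpleNamespace(branch=branch))
    return SimpleNamespace(cov_controller=SimpleNamespace(cov=cov))


print(resolve_branch_mode(fake_config(fake_cov_plugin(True))))
print(resolve_branch_mode(fake_config(None)))
```

Falling back to line mode when the controller is missing matches the default that _detect_arc_mode() already uses.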
Coverage Report
| Metric | Value |
|---|---|
| Diff coverage (changed lines only) | 90% |
| Overall project coverage | 72% |
Total: 450 lines, 47 missing, 90% covered
🔴 python/quadrants/lang/_fast_caching/src_hasher.py (0%)
🔴 2 import os
53 "kcov" if os.environ.get("QD_KERNEL_COVERAGE") == "1" else "",
🟢 python/quadrants/lang/_func_base.py (82%)
🔴 7 import os
25
🔴 26 def _kernel_coverage_enabled() -> bool:
🟢 27 return os.environ.get("QD_KERNEL_COVERAGE") == "1"
28
29
🟢 255 _kcov = None
🟢 256 if _kernel_coverage_enabled() and autodiff_mode == _qd_core.AutodiffMode.NONE:
🟢 257 from . import ( # pylint: disable=import-outside-toplevel
258 _kernel_coverage as _kcov,
259 )
260
🟢 261 tree = _kcov.rewrite_ast(tree, function_source_info.filepath, function_source_info.start_lineno)
262
🟢 266 if _kcov is not None:
🟢 267 cov_field = _kcov.get_field()
🟢 268 if cov_field is not None:
🟢 269 global_vars[_kcov.FIELD_VAR_NAME] = cov_field
🔴 python/quadrants/lang/_kernel_coverage.py (79%)
1 """Kernel code coverage via Python AST rewriting.
2
3 When enabled (QD_KERNEL_COVERAGE=1), this module rewrites kernel and func ASTs to insert coverage probes — field
4 stores that record which source lines actually execute on the GPU. At process exit, the collected data is written
5 to a .coverage file compatible with coverage.py / pytest-cov / diff-cover.
6
7 The probes are compiled as ordinary field stores by the existing pipeline, so no C++ changes are needed. When
8 disabled, this module is never imported and has zero impact on the normal runtime path.
9 """
10
🟢 11 import ast
🟢 12 import atexit
🟢 13 import logging
🟢 14 import os
🟢 15 import threading
🟢 16 import warnings
🟢 17 from typing import TYPE_CHECKING
18
🟢 19 from coverage import CoverageData # type: ignore[import-not-found]
20
🟢 21 import quadrants as qd
🟢 22 from quadrants.lang import impl
23
24 if TYPE_CHECKING:
25 from quadrants.lang.field import ScalarField
26
🟢 27 FIELD_VAR_NAME = "_qd_cov"
🟢 28 _MAX_PROBES = int(os.environ.get("QD_COVERAGE_MAX_PROBES", "100000"))
29
🟢 30 _lock = threading.Lock()
🟢 31 _cov_field: "ScalarField | None" = None
🟢 32 _cov_field_prog: object | None = None # tracks which Program instance owns _cov_field
🟢 33 _probe_counter: int = 0
34 # {probe_id: (filepath, absolute_lineno)}
🟢 35 _probe_map: dict[int, tuple[str, int]] = {}
36 # Accumulated coverage lines surviving across qd.init() resets
🟢 37 _accumulated_lines: dict[str, set[int]] = {}
🟢 38 _reset_hook_installed: bool = False
39 # Directory for .coverage and _qd_kcov.* files, captured when coverage is first enabled
🟢 40 _coverage_dir: str | None = None
41
42
🟢 43 def _harvest_field() -> None:
44 """Read probe data from the current field into _accumulated_lines.
45
46 Must be called while the runtime is still alive (before clear()).
47 """
48 global _cov_field, _cov_field_prog
🟢 49 with _lock:
🟢 50 if _cov_field is None or not _probe_map:
🟢 51 return
🟢 52 field_ref = _cov_field
🟢 53 probe_snapshot = dict(_probe_map)
🟢 54 _cov_field = None
🟢 55 _cov_field_prog = None
🟢 56 try:
🟢 57 arr = field_ref.to_numpy()
🟢 58 except Exception:
🟢 59 logging.warning("Failed to read coverage field, coverage data for this session will be lost", exc_info=True)
🟢 60 return
🟢 61 with _lock:
🟢 62 for probe_id, (filepath, lineno) in probe_snapshot.items():
🟢 63 if probe_id < len(arr) and arr[probe_id] != 0:
🟢 64 _accumulated_lines.setdefault(filepath, set()).add(lineno)
65
66
🟢 67 def _install_reset_hook() -> None:
68 """Monkey-patch PyQuadrants.clear() to harvest probes before destruction."""
69 global _reset_hook_installed
🟢 70 if _reset_hook_installed:
🟢 71 return
🟢 72 _original_clear = impl.PyQuadrants.clear
73
🟢 74 def _hooked_clear(self) -> None:
🟢 75 _harvest_field()
🟢 76 _original_clear(self)
77
🟢 78 impl.PyQuadrants.clear = _hooked_clear # type: ignore[assignment]
🟢 79 _reset_hook_installed = True
80
81
🟢 82 def ensure_field_allocated() -> None:
83 """Allocate (or re-allocate after qd.init()) the global coverage field."""
84 global _cov_field, _cov_field_prog, _coverage_dir
🟢 85 _install_reset_hook()
🟢 86 if _coverage_dir is None:
🟢 87 _coverage_dir = os.getcwd()
🟢 88 current_prog = impl.get_runtime()._prog
🟢 89 if _cov_field is not None and _cov_field_prog is current_prog:
🟢 90 return
🟢 91 with _lock:
🟢 92 current_prog = impl.get_runtime()._prog
🟢 93 if _cov_field is not None and _cov_field_prog is current_prog:
🔴 94 return
🟢 95 _cov_field = qd.field(dtype=qd.i32, shape=(_MAX_PROBES,)) # type: ignore[assignment]
🟢 96 _cov_field_prog = current_prog
97
98
🟢 99 def get_field() -> "ScalarField | None":
🟢 100 with _lock:
🟢 101 if _cov_field_prog is not impl.get_runtime()._prog:
🔴 102 return None
🟢 103 return _cov_field
104
105
🟢 106 def rewrite_ast(tree: ast.Module, filepath: str, start_lineno: int) -> ast.Module:
107 """Rewrite a kernel/func AST to insert coverage probes.
108
109 Each executable statement at a new source line gets a probe: ``_qd_cov[<probe_id>] = 1``.
110 Probes inside if/else bodies only fire when that branch is taken, giving true runtime branch coverage.
111 """
112 global _probe_counter
🟢 113 with _lock:
🟢 114 rewriter = _CoverageASTRewriter(
115 field_name=FIELD_VAR_NAME,
116 filepath=filepath,
117 start_lineno=start_lineno,
118 probe_id_start=_probe_counter,
119 )
🟢 120 tree = rewriter.visit(tree)
🟢 121 ast.fix_missing_locations(tree)
🟢 122 _probe_counter = rewriter.next_probe_id
🟢 123 _probe_map.update(rewriter.probe_map)
🟢 124 return tree
125
126
🟢 127 def _detect_arc_mode() -> bool:
128 """Detect whether pytest-cov is running in branch (arc) mode.
129
130 Checks _QD_KCOV_ARC env var first (set by the pytest plugin), then falls back to reading .coverage.
131 Defaults to False (line mode) when nothing is known, since ``pytest --cov`` without ``--cov-branch``
132 is the more common invocation.
133 """
🔴 134 arc_env = os.environ.get("_QD_KCOV_ARC")
🔴 135 if arc_env is not None:
🔴 136 return arc_env == "1"
🔴 137 try:
🔴 138 cov_path = os.path.join(_coverage_dir, ".coverage") if _coverage_dir else ".coverage"
🔴 139 cd = CoverageData(basename=cov_path)
🔴 140 cd.read()
🔴 141 if not cd.measured_files():
🔴 142 return False
🔴 143 return cd.has_arcs()
🔴 144 except Exception:
🔴 145 logging.debug("Failed to detect arc mode from .coverage file, defaulting to line mode", exc_info=True)
🔴 146 return False
147
148
🟢 149 def flush() -> None:
150 """Harvest any remaining field data and write all results to a .coverage file.
151
152 If .coverage.kernel already exists (e.g. from a prior test phase), the new data is merged into it so nothing
153 is lost across multiple invocations.
154 """
🔴 155 _harvest_field()
156
🔴 157 with _lock:
🔴 158 if not _accumulated_lines:
🔴 159 return
🔴 160 snapshot = {f: set(lines) for f, lines in _accumulated_lines.items()}
161
🔴 162 base_dir = _coverage_dir or os.getcwd()
🔴 163 kernel_path = os.path.join(base_dir, f"_qd_kcov.{os.getpid()}")
🔴 164 use_arcs = _detect_arc_mode()
165
🔴 166 cov = CoverageData(basename=kernel_path)
🔴 167 if use_arcs:
🔴 168 arcs_by_file: dict[str, list[tuple[int, int]]] = {}
🔴 169 for filepath, lines in snapshot.items():
170 # Emit only entry/exit arcs per line — we know which lines ran but not the actual transitions
171 # between them, so we avoid fabricating inter-line arcs that would misrepresent branch coverage.
🔴 172 arcs = []
🔴 173 for line in sorted(lines):
🔴 174 arcs.append((-1, line))
🔴 175 arcs.append((line, -1))
🔴 176 arcs_by_file[filepath] = arcs
🔴 177 cov.add_arcs(arcs_by_file)
178 else:
🔴 179 cov.add_lines({f: sorted(lines) for f, lines in snapshot.items()})
🔴 180 cov.write()
181
182
🟢 183 _capacity_warning_emitted = False
184
185
🟢 186 class _CoverageASTRewriter(ast.NodeTransformer):
187 """Insert coverage probes before each statement at a new source line."""
188
🟢 189 def __init__(self, field_name: str, filepath: str, start_lineno: int, probe_id_start: int) -> None:
🟢 190 self._field_name = field_name
🟢 191 self._filepath = filepath
🟢 192 self._start_lineno = start_lineno
🟢 193 self.next_probe_id = probe_id_start
🟢 194 self._seen_lines: set[int] = set()
🟢 195 self.probe_map: dict[int, tuple[str, int]] = {}
196
🟢 197 def _make_probe(self, abs_lineno: int, rel_lineno: int, col_offset: int) -> ast.Assign | None:
198 global _capacity_warning_emitted
🟢 199 probe_id = self.next_probe_id
🟢 200 if probe_id >= _MAX_PROBES:
🟢 201 if not _capacity_warning_emitted:
🟢 202 warnings.warn(
203 f"Kernel coverage probe capacity ({_MAX_PROBES}) exceeded. "
204 f"Additional kernel lines will not be tracked. "
205 f"Set QD_COVERAGE_MAX_PROBES to a higher value.",
206 stacklevel=2,
207 )
🟢 208 _capacity_warning_emitted = True
🟢 209 return None
🟢 210 self.probe_map[probe_id] = (self._filepath, abs_lineno)
🟢 211 self.next_probe_id += 1
🟢 212 node = ast.Assign(
213 targets=[
214 ast.Subscript(
215 value=ast.Name(id=self._field_name, ctx=ast.Load()),
216 slice=ast.Constant(value=probe_id),
217 ctx=ast.Store(),
218 )
219 ],
220 value=ast.Constant(value=1),
221 lineno=rel_lineno,
222 col_offset=col_offset,
223 end_lineno=rel_lineno,
224 end_col_offset=col_offset,
225 )
🟢 226 return node
227
🟢 228 def _instrument_body(self, stmts: list[ast.stmt]) -> list[ast.stmt]:
🟢 229 result: list[ast.stmt] = []
🟢 230 for stmt in stmts:
🟢 231 rel_lineno = getattr(stmt, "lineno", None)
🟢 232 if rel_lineno is not None:
🟢 233 abs_lineno = rel_lineno + self._start_lineno - 1
🟢 234 if abs_lineno not in self._seen_lines:
🟢 235 self._seen_lines.add(abs_lineno)
🟢 236 col = getattr(stmt, "col_offset", 0)
🟢 237 probe = self._make_probe(abs_lineno, rel_lineno, col)
🟢 238 if probe is not None:
🟢 239 result.append(probe)
🟢 240 result.append(self.visit(stmt))
🟢 241 return result
242
🟢 243 def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
🟢 244 node.body = self._instrument_body(node.body)
🟢 245 return node
246
🟢 247 def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> ast.AsyncFunctionDef:
🔴 248 node.body = self._instrument_body(node.body)
🔴 249 return node
250
🟢 251 def visit_If(self, node: ast.If) -> ast.If:
🟢 252 node.body = self._instrument_body(node.body)
🟢 253 if node.orelse:
🟢 254 node.orelse = self._instrument_body(node.orelse)
🟢 255 return node
256
🟢 257 def visit_For(self, node: ast.For) -> ast.For:
🟢 258 node.body = self._instrument_body(node.body)
🟢 259 if node.orelse:
🟢 260 node.orelse = self._instrument_body(node.orelse)
🟢 261 return node
262
🟢 263 def visit_While(self, node: ast.While) -> ast.While:
🟢 264 node.body = self._instrument_body(node.body)
🟢 265 if node.orelse:
🟢 266 node.orelse = self._instrument_body(node.orelse)
🟢 267 return node
268
🟢 269 def visit_With(self, node: ast.With) -> ast.With:
🟢 270 node.body = self._instrument_body(node.body)
🟢 271 return node
272
🟢 273 def visit_Try(self, node: ast.Try) -> ast.Try:
🟢 274 node.body = self._instrument_body(node.body)
🟢 275 for handler in node.handlers:
🟢 276 handler.body = self._instrument_body(handler.body)
🟢 277 if node.orelse:
🟢 278 node.orelse = self._instrument_body(node.orelse)
🟢 279 if node.finalbody:
🟢 280 node.finalbody = self._instrument_body(node.finalbody)
🟢 281 return node
282
283
🟢 284 atexit.register(flush)
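For readers unfamiliar with the probe technique in the file above, here is a stripped-down sketch of the same idea on ordinary Python. It assumes a plain list and exec() in place of a device field and the JIT pipeline, and it omits the same-line deduplication and capacity checks; ProbeInserter, SRC and run_instrumented are hypothetical names.

```python
import ast
import textwrap


class ProbeInserter(ast.NodeTransformer):
    """Insert `_hits[<id>] = 1` before each statement in function and if/else bodies."""

    def __init__(self):
        self.probe_lines = []  # index is the probe id, value is the source line

    def _probe(self, lineno):
        probe_id = len(self.probe_lines)
        self.probe_lines.append(lineno)
        return ast.parse(f"_hits[{probe_id}] = 1").body[0]

    def _instrument(self, stmts):
        out = []
        for stmt in stmts:
            out.append(self._probe(stmt.lineno))
            out.append(self.visit(stmt))
        return out

    def visit_FunctionDef(self, node):
        node.body = self._instrument(node.body)
        return node

    def visit_If(self, node):
        node.body = self._instrument(node.body)
        node.orelse = self._instrument(node.orelse)
        return node


SRC = textwrap.dedent("""\
    def f(x):
        if x > 5:
            y = 1
        else:
            y = 2
        return y
""")


def run_instrumented(x):
    """Instrument SRC, call f(x), and return the set of source lines that ran."""
    inserter = ProbeInserter()
    tree = inserter.visit(ast.parse(SRC))
    ast.fix_missing_locations(tree)
    hits = [0] * len(inserter.probe_lines)
    namespace = {"_hits": hits}
    exec(compile(tree, "<sketch>", "exec"), namespace)
    namespace["f"](x)
    return {line for line, hit in zip(inserter.probe_lines, hits) if hit}


print(sorted(run_instrumented(10)))
print(sorted(run_instrumented(1)))
```

Only the probes on the taken branch are set, which is the runtime branch behaviour the end-to-end tests assert on.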
🟢 python/quadrants/lang/ast/ast_transformer_utils.py (100%)
🟢 335 if not name.startswith("_qd_"):
🟢 336 reason = f"{name} is in global vars, therefore violates pure"
🟢 337 violates_pure = True
🟢 python/quadrants/lang/kernel.py (80%)
19
🔴 20 def _kernel_coverage_enabled() -> bool:
🟢 21 return os.environ.get("QD_KERNEL_COVERAGE") == "1"
22
23
🟢 382 if _kernel_coverage_enabled():
🟢 383 from . import _kernel_coverage # pylint: disable=import-outside-toplevel
384
🟢 385 _kernel_coverage.ensure_field_allocated()
386
🟢 python/quadrants/lang/misc.py (100%)
496
🟢 497 if os.environ.get("QD_KERNEL_COVERAGE") == "1":
🟢 498 from . import _kernel_coverage # pylint: disable=import-outside-toplevel
499
🟢 500 _kernel_coverage.ensure_field_allocated()
501
🔴 python/quadrants/pytest_plugin.py (67%)
1 """Pytest plugin that auto-enables kernel coverage when pytest-cov is active.
2
3 Registered via the ``pytest11`` entry point so it loads automatically when quadrants is installed.
4 Opt out by setting ``QD_KERNEL_COVERAGE=0`` explicitly.
5 """
6
🔴 7 import os
8
9
🔴 10 def pytest_configure(config):
🟢 11 if not config.pluginmanager.hasplugin("_cov"):
🔴 12 return
🟢 13 os.environ.setdefault("QD_KERNEL_COVERAGE", "1")
🟢 14 if os.environ.get("QD_KERNEL_COVERAGE") != "1":
🟢 15 return
16 # Tell the kernel coverage module whether pytest-cov is running in branch (arc) mode,
17 # so it writes the matching format and avoids "Can not mix line and arc data" at combine time.
18 # We read config.option.cov_branch which pytest-cov has already populated by this point.
🟢 19 cov_branch = getattr(config.option, "cov_branch", False) or False
🟢 20 os.environ["_QD_KCOV_ARC"] = "1" if cov_branch else "0"
🟢 tests/python/quadrants/lang/fast_caching/test_src_ll_cache.py (100%)
11
🟢 12 _KERNEL_COVERAGE = os.environ.get("QD_KERNEL_COVERAGE") == "1"
🟢 67 @pytest.mark.skipif(
68 _KERNEL_COVERAGE,
69 reason="Coverage probes change LLVM IR addresses after reinit, breaking recompile comparison",
70 )
🟢 tests/python/quadrants/lang/test_kernel_impl.py (100%)
🟢 1 import os
🟢 11 _KERNEL_COVERAGE = os.environ.get("QD_KERNEL_COVERAGE") == "1"
13
🟢 14 @pytest.mark.skipif(
15 _KERNEL_COVERAGE,
16 reason="Coverage probes change the kernel AST, preventing FE-LL cache hits after reinit",
17 )
🟢 tests/python/test_api.py (100%)
🟢 438 actual = sorted([s for s in dir(src) if not s.startswith(("_", "@")) and s != "pytest_plugin"])
🟢 tests/python/test_kernel_coverage.py (99%)
1 """Tests for kernel code coverage instrumentation.
2
3 These tests verify that the AST rewriter correctly inserts coverage probes and that the probes fire when kernel
4 code executes on the device.
5 """
6
🟢 7 import ast
🟢 8 import os
🟢 9 import textwrap
10
🟢 11 import pytest
12
🟢 13 import quadrants as qd
14
🟢 15 from tests import test_utils
16
17 # These tests only run when QD_KERNEL_COVERAGE=1
🟢 18 pytestmark = pytest.mark.skipif(
19 os.environ.get("QD_KERNEL_COVERAGE", "") != "1",
20 reason="QD_KERNEL_COVERAGE=1 not set",
21 )
22
23
24 # ---------------------------------------------------------------------------
25 # AST rewriter unit tests
26 # ---------------------------------------------------------------------------
27
🟢 28 _AST_REWRITER_CASES = [
29 pytest.param(
30 """\
31 def f():
32 x = 1
33 y = 2
34 return x + y
35 """,
36 {11, 12, 13},
37 10,
38 id="straight_line",
39 ),
40 pytest.param(
41 """\
42 def f():
43 if x > 0:
44 a = 1
45 else:
46 b = 2
47 """,
48 {2, 3, 5},
49 1,
50 id="if_else",
51 ),
52 pytest.param(
53 """\
54 def f():
55 for i in range(10):
56 x = i
57 """,
58 {2, 3},
59 1,
60 id="for_loop",
61 ),
62 pytest.param(
63 """\
64 def f():
65 while x > 0:
66 x = x - 1
67 else:
68 y = 0
69 """,
70 {2, 3, 5},
71 1,
72 id="while_loop_else",
73 ),
74 pytest.param(
75 """\
76 def f():
77 with ctx:
78 a = 1
79 b = 2
80 """,
81 {2, 3, 4},
82 1,
83 id="with_statement",
84 ),
85 pytest.param(
86 """\
87 def f():
88 try:
89 a = 1
90 except:
91 b = 2
92 else:
93 c = 3
94 finally:
95 d = 4
96 """,
97 {3, 5, 7, 9},
98 1,
99 id="try_except_finally",
100 ),
101 ]
102
103
🟢 104 @pytest.mark.parametrize("src,expected_lines,start_lineno", _AST_REWRITER_CASES)
🟢 105 def test_ast_rewriter(src, expected_lines, start_lineno):
106 """Verify the AST rewriter inserts probes at the expected source lines."""
🟢 107 from quadrants.lang._kernel_coverage import _CoverageASTRewriter
108
🟢 109 tree = ast.parse(textwrap.dedent(src))
🟢 110 rewriter = _CoverageASTRewriter(
111 field_name="_qd_cov", filepath="test.py", start_lineno=start_lineno, probe_id_start=0
112 )
🟢 113 rewriter.visit(tree)
114
🟢 115 covered_lines = {lineno for _, (_, lineno) in rewriter.probe_map.items()}
🟢 116 assert expected_lines.issubset(covered_lines), f"Expected lines {expected_lines} to be probed, got {covered_lines}"
117
118
🟢 119 def test_ast_rewriter_capacity_limit():
120 """Verify that probes stop being inserted when the capacity limit is hit."""
🟢 121 import warnings
122
🟢 123 import quadrants.lang._kernel_coverage as kcov
🟢 124 from quadrants.lang._kernel_coverage import _CoverageASTRewriter
125
🟢 126 src = textwrap.dedent(
127 """\
128 def f():
129 a = 1
130 b = 2
131 c = 3
132 """
133 )
🟢 134 tree = ast.parse(src)
🟢 135 old_warning_state = kcov._capacity_warning_emitted
🟢 136 kcov._capacity_warning_emitted = False
🟢 137 try:
🟢 138 with warnings.catch_warnings(record=True) as w:
🟢 139 warnings.simplefilter("always")
🟢 140 rewriter = _CoverageASTRewriter(
141 field_name="_qd_cov", filepath="test.py", start_lineno=1, probe_id_start=kcov._MAX_PROBES - 1
142 )
🟢 143 rewriter.visit(tree)
144
🟢 145 assert rewriter.next_probe_id == kcov._MAX_PROBES
🟢 146 assert len(rewriter.probe_map) == 1, f"Only 1 probe should fit, got {len(rewriter.probe_map)}"
🟢 147 assert len(w) == 1
🟢 148 assert "exceeded" in str(w[0].message).lower()
149 finally:
🟢 150 kcov._capacity_warning_emitted = old_warning_state
151
152
🟢 153 def test_ast_rewriter_deduplicates_same_line():
154 """Verify that two statements on the same source line get only one probe."""
🟢 155 from quadrants.lang._kernel_coverage import _CoverageASTRewriter
156
🟢 157 src = "def f():\n a = 1; b = 2\n"
🟢 158 tree = ast.parse(src)
🟢 159 rewriter = _CoverageASTRewriter(field_name="_qd_cov", filepath="test.py", start_lineno=1, probe_id_start=0)
🟢 160 rewriter.visit(tree)
161
🟢 162 abs_lines = [lineno for _, (_, lineno) in rewriter.probe_map.items()]
🟢 163 assert abs_lines.count(2) == 1, f"Line 2 should have exactly one probe, got {abs_lines.count(2)}"
164
165
🟢 166 def test_env_var_max_probes():
167 """Verify that QD_COVERAGE_MAX_PROBES env var is read at import time."""
🟢 168 import quadrants.lang._kernel_coverage as kcov
169
🟢 170 assert kcov._MAX_PROBES == int(os.environ.get("QD_COVERAGE_MAX_PROBES", "100000"))
171
172
🟢 173 def test_harvest_field_exception_path():
174 """Verify that _harvest_field handles to_numpy() failure gracefully."""
🟢 175 from unittest.mock import MagicMock
176
🟢 177 import quadrants.lang._kernel_coverage as kcov
178
🟢 179 old_field = kcov._cov_field
🟢 180 old_prog = kcov._cov_field_prog
🟢 181 old_map = kcov._probe_map.copy()
🟢 182 try:
🟢 183 mock_field = MagicMock()
🟢 184 mock_field.to_numpy.side_effect = RuntimeError("runtime destroyed")
🟢 185 kcov._cov_field = mock_field
🟢 186 kcov._cov_field_prog = object()
🟢 187 kcov._probe_map[999999] = ("fake.py", 1)
188
189 # Should not raise — the exception is caught and logged
🟢 190 kcov._harvest_field()
191
🟢 192 assert kcov._cov_field is None, "Field should be cleared after failure"
🟢 193 assert kcov._cov_field_prog is None, "Field prog should be cleared after failure"
194 finally:
🟢 195 kcov._cov_field = old_field
🟢 196 kcov._cov_field_prog = old_prog
🟢 197 kcov._probe_map = old_map
198
199
200 # ---------------------------------------------------------------------------
201 # End-to-end tests
202 # ---------------------------------------------------------------------------
203
204
🟢 205 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 206 def test_kernel_coverage_branches_e2e():
207 """Verify that only the taken branch has its probe fired."""
🟢 208 from quadrants.lang import _kernel_coverage
209
🟢 210 _kernel_coverage.ensure_field_allocated()
211
🟢 212 probe_count_before = _kernel_coverage._probe_counter
🟢 213 out = qd.field(dtype=qd.i32, shape=(1,))
214
🟢 215 @qd.kernel
🟢 216 def branching_kernel():
🟢 217 x = 10
🟢 218 if x > 5:
🟢 219 out[0] = 1
220 else:
🔴 221 out[0] = 2
222
🟢 223 branching_kernel()
224
🟢 225 assert out[0] == 1
226
🟢 227 cov_field = _kernel_coverage.get_field()
🟢 228 arr = cov_field.to_numpy()
229
🟢 230 probes_for_kernel = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
231
🟢 232 taken_probes = {pid for pid, loc in probes_for_kernel.items() if arr[pid] != 0}
🟢 233 not_taken_probes = {pid for pid, loc in probes_for_kernel.items() if arr[pid] == 0}
234
🟢 235 assert len(taken_probes) > 0, "At least some probes should have fired"
🟢 236 assert len(not_taken_probes) > 0, "The else branch should not have been reached"
237
238
🟢 239 @test_utils.test(arch=qd.gpu)
🟢 240 def test_kernel_coverage_simt_e2e():
241 """Verify coverage probes track branches with block.sync() and subgroup shuffle.
242
243 The if/else is based on a runtime value read from a field, so the compiler cannot constant-fold it away.
244 Only the taken branch's shuffle probe should fire.
245 """
🟢 246 from quadrants.lang import _kernel_coverage
🟢 247 from quadrants.lang.simt import subgroup
248
🟢 249 _kernel_coverage.ensure_field_allocated()
250
🟢 251 N = 64
🟢 252 probe_count_before = _kernel_coverage._probe_counter
🟢 253 flag = qd.field(dtype=qd.i32, shape=(1,))
🟢 254 a = qd.field(dtype=qd.i32, shape=(N,))
🟢 255 out = qd.field(dtype=qd.i32, shape=(N,))
256
🟢 257 flag[0] = 1 # runtime value: take the if-branch
258
🟢 259 @qd.kernel
🟢 260 def simt_kernel():
🟢 261 qd.loop_config(block_dim=N)
🟢 262 for i in range(N):
🟢 263 a[i] = i + 1
🟢 264 qd.simt.block.sync()
🟢 265 if flag[0] > 0:
🟢 266 val = subgroup.shuffle(a[i], qd.u32(0))
🟢 267 out[i] = val
268 else:
🔴 269 val = subgroup.shuffle(a[i], qd.u32(1))
🔴 270 out[i] = val + 100
271
🟢 272 simt_kernel()
273
🟢 274 for i in range(4):
🟢 275 assert out[i] == 1, f"Expected 1 at index {i}, got {out[i]}"
276
🟢 277 cov_field = _kernel_coverage.get_field()
🟢 278 arr = cov_field.to_numpy()
279
🟢 280 probes_for_kernel = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
281
🟢 282 fired = {pid for pid in probes_for_kernel if arr[pid] != 0}
🟢 283 not_fired = {pid for pid in probes_for_kernel if arr[pid] == 0}
🟢 284 assert len(fired) >= 4, f"Expected at least 4 probes to fire, got {len(fired)}"
🟢 285 assert len(not_fired) >= 2, "The else branch should not have been reached"
286
287
🟢 288 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 289 def test_kernel_coverage_survives_reinit():
290 """Verify that coverage data accumulated before qd.init() reset is preserved.
291
292 Runs a kernel, then resets via qd.reset()/qd.init() (which triggers the _hooked_clear harvest), runs another
293 kernel, harvests again, and checks that _accumulated_lines contains data from both sessions.
294 """
🟢 295 from quadrants.lang import _kernel_coverage, impl
296
🟢 297 current_arch = impl.get_runtime()._arch
🟢 298 _kernel_coverage.ensure_field_allocated()
299
🟢 300 probe_count_before = _kernel_coverage._probe_counter
🟢 301 out1 = qd.field(dtype=qd.i32, shape=(1,))
302
🟢 303 @qd.kernel
🟢 304 def kernel_before_reset():
🟢 305 out1[0] = 1
306
🟢 307 kernel_before_reset()
308
🟢 309 cov_field = _kernel_coverage.get_field()
🟢 310 assert cov_field is not None
🟢 311 arr = cov_field.to_numpy()
🟢 312 probes_first = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
🟢 313 fired_first = {pid for pid in probes_first if arr[pid] != 0}
🟢 314 assert len(fired_first) > 0, "Probes from first kernel should have fired"
315
316 # Don't call _harvest_field() manually — let qd.reset() trigger it via the _hooked_clear hook
🟢 317 qd.reset()
318
319 # Verify the hook harvested data from the first session
🟢 320 files_before = set(_kernel_coverage._accumulated_lines.keys())
🟢 321 assert len(files_before) > 0, "Hook should have harvested data during reset"
🟢 322 lines_before = {}
🟢 323 for f, lines in _kernel_coverage._accumulated_lines.items():
🟢 324 lines_before[f] = set(lines)
325
🟢 326 qd.init(arch=current_arch)
327
🟢 328 _kernel_coverage.ensure_field_allocated()
329
🟢 330 probe_count_mid = _kernel_coverage._probe_counter
🟢 331 out2 = qd.field(dtype=qd.i32, shape=(1,))
332
🟢 333 @qd.kernel
🟢 334 def kernel_after_reset():
🟢 335 out2[0] = 2
336
🟢 337 kernel_after_reset()
338
🟢 339 _kernel_coverage._harvest_field()
340
🟢 341 for f in files_before:
🟢 342 assert (
343 f in _kernel_coverage._accumulated_lines
344 ), f"File {f} from before reset should still be in _accumulated_lines"
🟢 345 assert lines_before[f].issubset(
346 _kernel_coverage._accumulated_lines[f]
347 ), "Lines from before reset should be preserved"
348
🟢 349 probes_second = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_mid}
🟢 350 second_files = {loc[0] for loc in probes_second.values()}
🟢 351 for f in second_files:
🟢 352 assert f in _kernel_coverage._accumulated_lines, f"File {f} from second kernel should be in _accumulated_lines"
353
354
🟢 355 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 356 def test_kernel_coverage_autodiff():
357 """Verify that the autodiff forward pass inserts probes that fire and that gradients stay correct.
358
359 The forward compilation (AutodiffMode.NONE) should insert probes that fire. The backward compilation
360 (AutodiffMode.REVERSE) should not add any probes, although this test does not assert that directly.
361 """
🟢 362 from quadrants.lang import _kernel_coverage
363
🟢 364 _kernel_coverage.ensure_field_allocated()
365
🟢 366 x = qd.field(dtype=qd.f32, shape=(), needs_grad=True)
🟢 367 loss = qd.field(dtype=qd.f32, shape=(), needs_grad=True)
368
🟢 369 @qd.kernel
🟢 370 def compute():
🟢 371 loss[None] = x[None] * x[None]
372
🟢 373 x[None] = 5.0
374
🟢 375 probe_count_before = _kernel_coverage._probe_counter
376
🟢 377 with qd.ad.Tape(loss):
🟢 378 compute()
379
🟢 380 probe_count_after_tape = _kernel_coverage._probe_counter
🟢 381 forward_probes = probe_count_after_tape - probe_count_before
🟢 382 assert forward_probes > 0, "Forward compilation should have inserted probes"
383
384 # Verify forward probes actually fired
🟢 385 cov_field = _kernel_coverage.get_field()
🟢 386 assert cov_field is not None
🟢 387 arr = cov_field.to_numpy()
🟢 388 probes = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
🟢 389 fired = {pid for pid in probes if arr[pid] != 0}
🟢 390 assert len(fired) > 0, "Forward pass inside Tape should produce fired coverage probes"
391
392 # Verify backward pass computes correct gradients
🟢 393 assert loss[None] == pytest.approx(25.0)
🟢 394 assert x.grad[None] == pytest.approx(10.0)
395
396
🟢 397 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 398 def test_kernel_coverage_qd_func():
399 """Verify that probes fire inside a @qd.func called from a kernel."""
🟢 400 from quadrants.lang import _kernel_coverage
401
🟢 402 _kernel_coverage.ensure_field_allocated()
403
🟢 404 probe_count_before = _kernel_coverage._probe_counter
🟢 405 out = qd.field(dtype=qd.i32, shape=(1,))
406
🟢 407 @qd.func
🟢 408 def helper():
🟢 409 out[0] = 99
410
🟢 411 @qd.kernel
🟢 412 def caller():
🟢 413 helper()
414
🟢 415 caller()
416
🟢 417 assert out[0] == 99
418
🟢 419 cov_field = _kernel_coverage.get_field()
🟢 420 assert cov_field is not None
🟢 421 arr = cov_field.to_numpy()
422
🟢 423 probes = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_before}
🟢 424 fired = {pid for pid in probes if arr[pid] != 0}
425 # The kernel body has one statement (helper()), and the func body has one (out[0] = 99).
426 # Both should produce probes that fire.
🟢 427 assert (
428 len(fired) >= 2
429 ), f"Expected probes from both kernel and func to fire, got {len(fired)} fired out of {len(probes)}"
430
431
🟢 432 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 433 def test_kernel_coverage_multiple_kernels_same_session():
434 """Verify that probes from two different kernels both fire in the same session."""
🟢 435 from quadrants.lang import _kernel_coverage
436
🟢 437 _kernel_coverage.ensure_field_allocated()
438
🟢 439 probe_count_before = _kernel_coverage._probe_counter
🟢 440 a = qd.field(dtype=qd.i32, shape=(1,))
🟢 441 b = qd.field(dtype=qd.i32, shape=(1,))
442
🟢 443 @qd.kernel
🟢 444 def kernel_a():
🟢 445 a[0] = 10
446
🟢 447 @qd.kernel
🟢 448 def kernel_b():
🟢 449 b[0] = 20
450
🟢 451 kernel_a()
🟢 452 probe_count_after_a = _kernel_coverage._probe_counter
🟢 453 kernel_b()
454
🟢 455 assert a[0] == 10
🟢 456 assert b[0] == 20
457
🟢 458 cov_field = _kernel_coverage.get_field()
🟢 459 arr = cov_field.to_numpy()
460
🟢 461 probes_a = {
462 pid: loc for pid, loc in _kernel_coverage._probe_map.items() if probe_count_before <= pid < probe_count_after_a
463 }
🟢 464 probes_b = {pid: loc for pid, loc in _kernel_coverage._probe_map.items() if pid >= probe_count_after_a}
465
🟢 466 fired_a = {pid for pid in probes_a if arr[pid] != 0}
🟢 467 fired_b = {pid for pid in probes_b if arr[pid] != 0}
468
🟢 469 assert len(fired_a) > 0, "Probes from kernel_a should have fired"
🟢 470 assert len(fired_b) > 0, "Probes from kernel_b should have fired"
471
472
🟢 473 @test_utils.test(arch=[qd.cpu, qd.cuda])
🟢 474 def test_qd_prefix_exemption_pure_kernel():
475 """Verify that _qd_-prefixed globals don't violate pure kernel checks.
476
477 With kernel coverage enabled, _qd_cov is injected as a global. This test verifies that a pure (fastcache)
478 kernel still compiles without error. The kernel uses ndarray arguments (not global fields) because pure
479 kernels prohibit non-_qd_ globals.
480 """
🟢 481 a = qd.ndarray(qd.i32, (1,))
482
🟢 483 @qd.kernel(fastcache=True)
🟢 484 def pure_kernel(arr: qd.types.NDArray) -> None:
🟢 485 arr[0] = 42
486
🟢 487 pure_kernel(a)
🟢 488 assert a[0] == 42
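
The rewriter unit tests above rely on three behaviours: a probe is inserted before each statement (including statements in `else`, `except` and `finally` blocks), the recorded line is `start_lineno + relative_line - 1`, and two statements on the same source line share one probe. The following is a minimal sketch of that idea in plain Python, not the actual `_CoverageASTRewriter`: the class `ProbeInserter`, the helper `run_instrumented` and the list-backed `_cov` store are hypothetical names used only for illustration, and the real probes write to a device field rather than a Python list.

```python
import ast


class ProbeInserter(ast.NodeTransformer):
    """Hypothetical stand-in for a coverage rewriter (not the quadrants implementation)."""

    def __init__(self, field_name, filepath, start_lineno, probe_id_start=0):
        self.field_name = field_name
        self.filepath = filepath
        self.start_lineno = start_lineno
        self.next_probe_id = probe_id_start
        self.probe_map = {}  # probe id -> (filepath, absolute line number)
        self._seen_lines = set()

    def _probe(self, stmt):
        # Relative line 1 is the `def` line, so absolute = start + relative - 1.
        abs_line = self.start_lineno + stmt.lineno - 1
        if abs_line in self._seen_lines:
            return None  # a second statement on the same line reuses the first probe
        self._seen_lines.add(abs_line)
        pid = self.next_probe_id
        self.next_probe_id += 1
        self.probe_map[pid] = (self.filepath, abs_line)
        store = ast.parse(f"{self.field_name}[{pid}] = 1").body[0]
        return ast.copy_location(store, stmt)

    def _instrument(self, stmts):
        out = []
        for stmt in stmts:
            probe = self._probe(stmt)
            if probe is not None:
                out.append(probe)
            out.append(self.visit(stmt))  # recurse into nested blocks
        return out

    def visit_Module(self, node):
        # Do not probe module-level statements; only descend into them.
        node.body = [self.visit(stmt) for stmt in node.body]
        return node

    def generic_visit(self, node):
        # Instrument every statement list hanging off this node.
        for name in ("body", "orelse", "finalbody"):
            block = getattr(node, name, None)
            if isinstance(block, list) and block and isinstance(block[0], ast.stmt):
                setattr(node, name, self._instrument(block))
        for handler in getattr(node, "handlers", []):
            handler.body = self._instrument(handler.body)
        return node


def run_instrumented(src, start_lineno, *args):
    """Instrument `src`, call its function `f`, and return the absolute lines that were hit."""
    rewriter = ProbeInserter("_cov", "example.py", start_lineno)
    tree = rewriter.visit(ast.parse(src))
    ast.fix_missing_locations(tree)
    namespace = {"_cov": [0] * rewriter.next_probe_id}
    exec(compile(tree, "example.py", "exec"), namespace)
    namespace["f"](*args)
    hits = namespace["_cov"]
    return {line for pid, (_, line) in rewriter.probe_map.items() if hits[pid]}


SRC = "def f(x):\n    if x > 0:\n        a = 1\n    else:\n        b = 2\n    return x\n"
print(sorted(run_instrumented(SRC, 10, 1)))
print(sorted(run_instrumented(SRC, 10, -1)))
```

Because the probe is an ordinary store executed at run time, only the branch that is actually taken is recorded, which is the property the end-to-end tests check on the device.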
🟢 tests/python/test_offline_cache.py (100%)
16 # Coverage field allocation creates internal fill kernels that change cache file counts.
17 # CI runs these tests in a separate phase without QD_KERNEL_COVERAGE (see 4_test.sh).
🟢 18 pytestmark = pytest.mark.skipif(
19 os.environ.get("QD_KERNEL_COVERAGE") == "1",
20 reason="Kernel coverage adds internal kernels that invalidate cache file count assertions",
21 )
22
🟢 tests/python/test_snode_layout_inspection.py (100%)
🟢 1 import os
2
🟢 3 import pytest
4
🟢 10 @pytest.mark.skipif(
11 os.environ.get("QD_KERNEL_COVERAGE") == "1",
12 reason="Kernel coverage field on root shifts offset assertions",
13 )
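
The skip markers in these files gate on `QD_KERNEL_COVERAGE` in two opposite directions: the kernel coverage tests run only when it is `"1"`, while the offline cache and SNode layout tests are skipped in that case. A small sketch of that decision follows, assuming a hypothetical helper `should_skip` (the test files themselves inline the `os.environ.get(...)` comparison in `pytest.mark.skipif`):

```python
import os


def kernel_coverage_enabled(environ=None):
    """Return True only when QD_KERNEL_COVERAGE is exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get("QD_KERNEL_COVERAGE", "") == "1"


def should_skip(requires_coverage, environ=None):
    """Hypothetical helper mirroring the two skipif conditions.

    requires_coverage=True  -> skip unless kernel coverage is enabled.
    requires_coverage=False -> skip when kernel coverage is enabled, because the
    extra coverage field and internal kernels would perturb the assertions.
    """
    enabled = kernel_coverage_enabled(environ)
    return (not enabled) if requires_coverage else enabled


# A coverage test without the variable set is skipped.
print(should_skip(True, {}))
# A cache-count test with coverage enabled is skipped as well.
print(should_skip(False, {"QD_KERNEL_COVERAGE": "1"}))
```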
Coverage Report

| Metric | Value |
|---|---|
| Diff coverage (changed lines only) | 0% |
| Overall project coverage | 73% |
Total: 0 lines, 0 missing, 0% covered
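
To make the two metrics concrete, here is a rough sketch of how a diff coverage figure can be derived from a set of changed lines and a set of executed lines. It is not the code used by this CI job: the function `diff_coverage` is hypothetical, and reporting 0% when no changed lines are measured is an assumption chosen only to match the summary line above.

```python
def diff_coverage(changed_lines, executed_lines):
    """Return (total, missing, percent) for the changed lines.

    Both arguments map a file path to a set of line numbers. Only lines that
    appear in `changed_lines` count toward the diff coverage figure.
    """
    total = sum(len(lines) for lines in changed_lines.values())
    missing = sum(
        len(lines - executed_lines.get(path, set()))
        for path, lines in changed_lines.items()
    )
    # Assumption: with no measurable changed lines the percentage is reported as 0.
    percent = 0 if total == 0 else round(100 * (total - missing) / total)
    return total, missing, percent


def summary(changed_lines, executed_lines):
    total, missing, percent = diff_coverage(changed_lines, executed_lines)
    return f"Total: {total} lines, {missing} missing, {percent}% covered"


print(summary({}, {}))
print(summary({"a.py": {1, 2, 3, 4}}, {"a.py": {1, 2, 3}}))
```

Under that assumption, a diff coverage of 0% with a total of 0 lines means that no changed lines were measured, not that changed lines went untested.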
* [Misc] Warn user to disable caching when print_ir/QD_DUMP_IR enabled (Genesis-Embodied-AI#425) Co-authored-by: v01dxyz <v01dxyz@v01d.xyz> * [Build] Pin torch version to CUDA 12.8 for CUDA tests (Genesis-Embodied-AI#428) * [Misc] Fixing up taichi-dev urls (Genesis-Embodied-AI#429) * [Perf] Rename cuda_graph to gpu_graph across the codebase (Genesis-Embodied-AI#430) * Misc: fix typo integeral -> integral (Genesis-Embodied-AI#434) Co-authored-by: v01dxyz <v01dxyz@v01d.xyz> * [Perf] CUDA graph 4: call from multiple locations (Genesis-Embodied-AI#420) * [Bug] Fix fastcache not restoring graph_do_while_arg (Genesis-Embodied-AI#435) * [Perf] Cache last-call result in perf_dispatch for single-compatible case (Genesis-Embodied-AI#438) * Fix gpu_graph fallback on old Nvidia GPU. (Genesis-Embodied-AI#443) * Fix shared memory offset not reset between CUDA kernels. (Genesis-Embodied-AI#442) * [Misc] Allow disabling GPU graph via QD_GPU_GRAPH=0 env var (Genesis-Embodied-AI#439) * [Misc] Add named top-level loops (Genesis-Embodied-AI#440) * [Misc] Rename gpu_graph to graph (Genesis-Embodied-AI#446) * [Misc] Add cross-platform shuffle (Genesis-Embodied-AI#447) * [Bug] Fix graph_do_while on Windows: search for cudadevrt.lib (Genesis-Embodied-AI#456) * [Bug] Also search default CUDA toolkit install location on Windows (Genesis-Embodied-AI#461) * [SPIRV] Feature Parity Atomics & Shared Array (Genesis-Embodied-AI#432) * [Misc] Change clang format to 120 characters (Genesis-Embodied-AI#463) * [Misc] CUDA graph 5 Add fatbin (Genesis-Embodied-AI#464) * [Bug] Reuse VkInstance across init/reset cycles (Genesis-Embodied-AI#465) * [Perf] Tiles 1: _load, _store, _eye_ (Genesis-Embodied-AI#466) * [Misc] Remove dead InternalFuncStmt type_check override (Genesis-Embodied-AI#471) * [Perf] Tiles 2: add cholesky and ger (Genesis-Embodied-AI#472) * [Perf] Tiles 2b: add triangular solve (Genesis-Embodied-AI#474) * [Misc] Refactor: use _get_col/_set_col in tiles load/store/init 
(Genesis-Embodied-AI#475) * [Build] Fix flaky test_clock_accuracy (Genesis-Embodied-AI#436) * Fix AARCH64 emitting invalid asm in CUDA kernels. (Genesis-Embodied-AI#473) Co-authored-by: Hugh Perkins <hughperkins@gmail.com> * [AMDGPU] Enable HIP memory pool and surface pool-exhaustion errors. (Genesis-Embodied-AI#485) * [AMDGPU] Scope hsaco tmp dir per-user to avoid collisions. (Genesis-Embodied-AI#484) * [Perf] Tiles 3: Add slice syntax, qd.outer() and initial doc (Genesis-Embodied-AI#477) * [AMDGPU] Fix gradient computation. (Genesis-Embodied-AI#486) * Enable all backends that are supported in unit tests. (Genesis-Embodied-AI#488) * Fix SPIRV ID overflow for large kernels due to autodiff. (Genesis-Embodied-AI#489) * [Misc] Fix purity checker to allow accessing constants from quadrants modules (Genesis-Embodied-AI#487) * [Misc] Increase tolerance for clock monotonic test (Genesis-Embodied-AI#492) * [CI] Serialize api doc workflow (Genesis-Embodied-AI#494) * [CI] Increase tolerance for clock test (Genesis-Embodied-AI#506) * [CI] Increase clock test tolerance to 20% (Genesis-Embodied-AI#509) * [Perf] Add tensor_type parametrization to tile16 tests (Genesis-Embodied-AI#504) * [Perf] Tiles 4b: Migrate tiles16 tests to enable fastcache (Genesis-Embodied-AI#505) * [Perf] Tiles 4c: add Tiles16x16 proxy (Genesis-Embodied-AI#507) * [Perf] Tiles 4d: Consolidate slice error tests using parametrize (Genesis-Embodied-AI#508) * [Perf] Tiles 4: add SharedArray slice support (Genesis-Embodied-AI#482) * [Perf] Tiles 5: add Cholesky benchmark demo (Genesis-Embodied-AI#483) * [Doc] Add user guide page for subgroup shuffle (Genesis-Embodied-AI#512) * [Perf] Implement cross-platform shuffle_down (Genesis-Embodied-AI#510) * [Perf] Add portable subgroup reduce_add and reduce_all_add (Genesis-Embodied-AI#511) * [Perf] Add first warmup config to perf dispatch (Genesis-Embodied-AI#422) * [AutoDiff] Autodiff 1: Add baseline adstack regression test for unary_collections 
(Genesis-Embodied-AI#500) * [AutoDiff] Autodiff 2: Implement derivative for tan (Genesis-Embodied-AI#501) * [AutoDiff] Autodiff 3: Recompute tanh/exp on the operand in the reverse pass (Genesis-Embodied-AI#502) * [AutoDiff] Autodiff 4: Mark rsqrt as non-linear for adstack promotion (Genesis-Embodied-AI#503) * [AutoDiff] Autodiff 5: Fix adjoint-alloca placement for GlobalLoads outside the current range-for (Genesis-Embodied-AI#496) * [AutoDiff] Autodiff 6: Adstack regression tests (Genesis-Embodied-AI#491) * [AutoDiff] Autodiff 7: Fix header size in AdStackAllocaStmt to match u64 runtime layout (Genesis-Embodied-AI#534) * [AutoDiff] Autodiff 8: Surface LLVM adstack push/pop overflow as a Python exception (Genesis-Embodied-AI#535) * [AutoDiff] Autodiff 9: Guard against LLVM worker-thread stack overflow from large per-task adstack budget (Genesis-Embodied-AI#495) * [AutoDiff] Autodiff 10: Implement adstack for SPIR-V (Genesis-Embodied-AI#490) * [AutoDiff] Autodiff 11: Latent adstack-adjacent fixes (AMDGPU hipFree, flush() keeps ctx_buffers_, always-preallocate) (Genesis-Embodied-AI#536) * [Doc] Add AGENTS.md with instructions for AI agents (Genesis-Embodied-AI#541) * [Bug] Abort kernel execution on assertion failure instead of segfaulting (Genesis-Embodied-AI#419) * [Type] ndarray typing 1: Add eval_str=True to inspect.signature() calls (Genesis-Embodied-AI#411) * [CI] Suppress reportPrivateImportUsage in torch-using files (Genesis-Embodied-AI#552) * [Misc] QD_DUMP_IR dumps to files with the task_id added to the filename (Genesis-Embodied-AI#441) * [Type] ndarray typing 2: Fix NDArray single-arg subscript crash (Genesis-Embodied-AI#412) * [Test] Flush xdist channel before worker exit so test failure reports are visible (Genesis-Embodied-AI#555) * [CI] Reduce test retries on CI from 3 to 1. 
(Genesis-Embodied-AI#554) * [AutoDiff] Autodiff 12: Heap-backed adstack on LLVM backends (CPU/CUDA/AMDGPU) (Genesis-Embodied-AI#537) * [AutoDiff] Autodiff 13: Heap-backed adstack on SPIR-V backends (Metal, Vulkan) (Genesis-Embodied-AI#493) * [AutoDiff] Autodiff 14: Resolve bounded-inner-loop adstacks without default_ad_stack_size fallback (Genesis-Embodied-AI#539) * [SPIRV] Vulkan SPIR-V correctness: atomic-view aliasing, PSB stride, narrow storage caps, u1 cast, per-init layer recheck (Genesis-Embodied-AI#513) * [Build] Autodiff 15: Replace 2022 MoltenVK pin with LunarG Vulkan SDK fetch and sanitise MoltenVK cap advertisement (Genesis-Embodied-AI#551) * [Test] Suppress stock pytest-timeout to avoid conflict with pytest_hardtle (Genesis-Embodied-AI#557) * [Vulkan] Use SDK validation layer for debugPrintf instead of apt package (Genesis-Embodied-AI#562) * [Test] Fix flaky perf_dispatch tests by increasing work amounts (Genesis-Embodied-AI#559) * [Test] Add --maxfail CLI option to run_tests.py (default 20) (Genesis-Embodied-AI#558) * [CI] Vulkan debug printf fix to address flaky tests (Genesis-Embodied-AI#563) * [Docs] Add a new page to help for first time contributors (Genesis-Embodied-AI#426) Authored-by: v01dxyz <v01dxyz@v01d.xyz> * [AutoDiff] Autodiff 16: Resolve reverse-mode adstack depths per-launch via runtime-evaluated SizeExpr (Genesis-Embodied-AI#543) * Fix: raise error if device memory allocation fails (Genesis-Embodied-AI#451) (Genesis-Embodied-AI#453) Co-authored-by: v01dxyz <v01dxyz@v01d.xyz> Co-authored-by: Hugh Perkins <hughperkins@gmail.com> * [CI] Add CI job to check line wrapping of comments and docs (Genesis-Embodied-AI#564) * [Misc] Add coverage report to PRs, including kernels (Genesis-Embodied-AI#470) * [CI] CI wrap check feeds only diffs to agent (Genesis-Embodied-AI#567) * Skip 'flaky' test on MacOS CI. 
(Genesis-Embodied-AI#573) * [Test] Fix missing `import sys` in test_fail_device_memory_allocation (Genesis-Embodied-AI#574) * [CI] Fix Vulkan debugPrintf flake with session-scoped warmup (Genesis-Embodied-AI#571) * [AutoDiff] determine_ad_stack_size: replace whole-CFG Bellman-Ford with SCC + DAG DP (Genesis-Embodied-AI#575) * [Test] Fix macOS OOM skip reason to describe actual root cause (Genesis-Embodied-AI#576) * [Lang] whole_kernel_cse: 2.5x compile time speedup on large kernels (Genesis-Embodied-AI#577) * [CI] Add CI check for unnecessarily deleted comments (Genesis-Embodied-AI#570) * [CI] Migrate coverage report to github Check page (Genesis-Embodied-AI#566) * [Lang] Skip IR verifier between passes unless debug=true (Genesis-Embodied-AI#579) * [Lang] Inline AdStack ops on release LLVM codegen: dramatically reduces compile time for adstack-enabled reverse-mode kernels (Genesis-Embodied-AI#584) * [CUDA] Honor offline_cache=False end-to-end so QD_OFFLINE_CACHE=0 actually gives a cold compile (Genesis-Embodied-AI#580) * [Type] Tensor 24 (Genesis-Embodied-AI#561) Co-authored-by: hugh <hugh@slurm-login-0.slurm-login.tenant-slurm.svc.cluster.local> * [Lang] auto_diff host-walk reductions: dramatically faster front-end compile time on adstack-enabled reverse-mode kernels (Genesis-Embodied-AI#587) * [AutoDiff] Speed up reverse-mode kernel launches on GPU backends (Genesis-Embodied-AI#578) * [Vulkan] Move adstack-sizer scratch out of Function-scope memory to fix SPIR-V pipeline build failures (Genesis-Embodied-AI#588) * [AutoDiff] Improve diagnosis of unsupported reverse-mode AD patterns (Genesis-Embodied-AI#590) * [Bug] Fix: promote Ndarray to AnyArray in build_Name for flattened struct fields (Genesis-Embodied-AI#592) * [SPIR-V] Shrink reverse-grad kernel MSL by ~50% (Genesis-Embodied-AI#591) * [CI] Add CI check that PR changes have test coverage (Genesis-Embodied-AI#596) * [Perf] Enable zero-copy in to_torch() and to_numpy() (Genesis-Embodied-AI#450) * Add 
BufferView: safe sub-range ndarray access for kernels (Genesis-Embodied-AI#585) Co-authored-by: alanray-tech <alanray-tech@users.noreply.github.com> Co-authored-by: Hugh Perkins <hughperkins@gmail.com> * [Doc] Add user-facing fastcache documentation (Genesis-Embodied-AI#597) Co-authored-by: hugh <hugh@slurm-login-0.slurm-login.tenant-slurm.svc.cluster.local> * [Misc] Upgrade to enable v1 dlpack so to_numpy(copy=False) writable (Genesis-Embodied-AI#598) Co-authored-by: root <root@rtx-209-201.slurm-compute.tenant-slurm.svc.cluster.local> * [AutoDiff] Cut reverse-mode adstack memory usage 10x on all backends (Genesis-Embodied-AI#599) * [Misc] Add CI check for feature file factorization (Genesis-Embodied-AI#606) * [Perf] Skip _recursive_set_args for all-Field frozen dataclass structs (Genesis-Embodied-AI#607) Co-authored-by: Cursor <cursoragent@cursor.com> * [AutoDiff] SNode-arm bound-expr capture rejects fold-attack gate indices (Genesis-Embodied-AI#610) * [Misc] Suppress field fastcache warning for qd.Tensor (Genesis-Embodied-AI#615) Co-authored-by: Cursor <cursoragent@cursor.com> * [AutoDiff] Adstack heap: clip reducer count by per-task loop trip count (compile-time and SizeExpr-evaluated) (Genesis-Embodied-AI#611) * [Misc] Forward copy= through qd.Tensor, add copy=None option (Genesis-Embodied-AI#616) Co-authored-by: Cursor <cursoragent@cursor.com> * [Doc] Update README (Genesis-Embodied-AI#617) Co-authored-by: Cursor <cursoragent@cursor.com> * [CI] Fix coverage report showing def lines as uncovered (Genesis-Embodied-AI#623) Co-authored-by: Cursor <cursoragent@cursor.com> * [Perf] Generic launcher: persistent context, JIT-pointer reuse, Metal compute encoder, LLVM-GPU async memory ops (Part 1/2) (Genesis-Embodied-AI#619) * [CI] Encode Python-first testing policy in coverage-check prompt (Genesis-Embodied-AI#622) Co-authored-by: Cursor <cursoragent@cursor.com> * [CI] Add PR Line change report (Genesis-Embodied-AI#624) Co-authored-by: Cursor <cursoragent@cursor.com> 
* [CI] Disable quadrants pytest plugin during quadrants internal coverage runs (Genesis-Embodied-AI#629) Co-authored-by: Cursor <cursoragent@cursor.com> * [AutoDiff] Adstack load+store eliminations: EliminateRecomputableAdStackPushes pass + leaf extensions (Genesis-Embodied-AI#621) * [CI] Simplify coverage PR comment to a single linked line (Genesis-Embodied-AI#630) * [CUDA] Add AGX Thor, SM_110 (Genesis-Embodied-AI#631) Co-authored-by: Johnny Nunez and Hugh Perkins * [CI] Lines changed report: collapse PR comment to a single linked totals line (Genesis-Embodied-AI#632) * [FEATURE] Support external Metal command queue via qd.init (Genesis-Embodied-AI#618) Co-authored-by: Cursor <cursoragent@cursor.com> * [Perf] Cache adstack-sizer metadata per task across SPIR-V + LLVM-GPU; per-snode / DeviceAllocation invalidation (Part 2/2) (Genesis-Embodied-AI#620) * [AutoDiff] Disable EliminateRecomputableAdStackPushes pending mutated-SNode chain-leaf fix (Genesis-Embodied-AI#633) * [AutoDiff] Adstack chain-clone safety: mutated-SNode leaf reject + load_top consumer-aware guard (Genesis-Embodied-AI#634) * [Docs] Add user-guide page for qd.simt.block.* primitives (Genesis-Embodied-AI#638) * [Docs] Expand qd.simt.subgroup user-guide page to cover every op (Genesis-Embodied-AI#639) * [Perf] Streams 1-4 (Genesis-Embodied-AI#410) * [Docs] Add user-guide page for matrix decompositions and solvers (Genesis-Embodied-AI#643) * [Bug] Revert "[Perf] Streams 1-4 (Genesis-Embodied-AI#410)" (Genesis-Embodied-AI#650) * [Docs] Add user-guide page for atomics and bit operations (Genesis-Embodied-AI#640) * [Docs] Add user-guide page for qd.simt.grid.* primitives (Genesis-Embodied-AI#641) * [AutoDiff] Adstack max-reducer: parallel multi-axis MaxOverRange dispatch (Genesis-Embodied-AI#635) * [AMDGPU] Fix amdgpu parallel rand init (Genesis-Embodied-AI#658) * [Perf] Adstack: skip max-reducer recognizer on CPU + lift host-eval cap (Genesis-Embodied-AI#655) * [Perf] Re-land Streams 1-4 with bug fixes 
(Genesis-Embodied-AI#653) * [AMDGPU] Apply device_memory_GB=0.3 cap to AMDGPU tests (Genesis-Embodied-AI#659) * [Perf] Per-launch host sync: drop wait_idle on SPIR-V, pin stream and drop stream_synchronize on CUDA/AMDGPU (Genesis-Embodied-AI#654) * [AMDGPU] Unload hipModule_t in JITModuleAMDGPU destructor (Genesis-Embodied-AI#660) * [AMDGPU] Trim default mempool on qd.reset() (Genesis-Embodied-AI#669) * [AMDGPU] Hoist rand-state buffer to process lifetime (Genesis-Embodied-AI#668) * [Streams] Use events for streams serialization on AMDGPU and CUDA (Genesis-Embodied-AI#667) * [Perf] Adstack max-reducer: launch cache + zero-copy result map; content-stable registry_id (Genesis-Embodied-AI#671) * [SPIR-V] dispatch_max_reducers: register each task with the real kernel name (Genesis-Embodied-AI#675) * [AutoDiff] Debug-mode field/grad/dual: dtype, layout, and access-time invariants (Genesis-Embodied-AI#677) * [Docs] Add user-guide page for qd.algorithms.* device-wide algorithms (Genesis-Embodied-AI#642) Co-authored-by: alanray-tech <alan.ray@genesis-ai.company> * [Docs] Doc for existing atomics: switch support table to per-backend columns (Genesis-Embodied-AI#657) Co-authored-by: alanray-tech <alan.ray@genesis-ai.company> * [GPU] Cross gpu atomics (Genesis-Embodied-AI#666) Co-authored-by: alanray-tech <alan.ray@genesis-ai.company> * [GPU] Make block operations portable cross-gpu (Genesis-Embodied-AI#664) * [Perf] CPU LLVM adstack-cache: skip per-launch bump-writes + ndarray_shapes capture on forward-only handles (Genesis-Embodied-AI#685) * [GPU] Cross-GPU for grid ops (Genesis-Embodied-AI#670) * [Math] Make bitop operations portable cross-gpu (Genesis-Embodied-AI#662) * [AMDGPU] Always use wave64, on both RDNA and CDNA (Genesis-Embodied-AI#687) * [AMDGPU] Use syncscope("agent") for atomix xor to avoid CAS livelock (Genesis-Embodied-AI#672) * [GPU] New bit ops for QIPC (Genesis-Embodied-AI#679) * [GPU] Subgroup ops cross-gpu (Genesis-Embodied-AI#665) * [Graph] Rename CUDA 
Graph to Graph in docs (Genesis-Embodied-AI#691) * [SPIR-V] Fix FIFO-queue ordering when sharing command queue. (Genesis-Embodied-AI#694) * [Atomics] New QIPC ops for atomics (Genesis-Embodied-AI#690) * Pass dataclass sub-structs into qd.func (Genesis-Embodied-AI#698) * [AMDGPU] HIP graph runtime support for @qd.kernel(graph=True) (Genesis-Embodied-AI#692) * [CI] Add per-file timing report to Mac Metal test job (Genesis-Embodied-AI#695) Co-authored-by: Cursor <cursoragent@cursor.com> * [CI] Enable kernel disk cache during tests (Genesis-Embodied-AI#696) * [Math] New QIPC ops for single-threaded linalg (Genesis-Embodied-AI#683) * [BREAKING][GPU] New QIPC ops for subgroups (Genesis-Embodied-AI#676) * [GPU] New QIPC ops for block (Genesis-Embodied-AI#684) * [GPU] New device-level ops for QIPC (Genesis-Embodied-AI#693) * [algorithms] PrefixSumExecutor: drop unused GRID_SZ local (Genesis-Embodied-AI#701) * [block] sync(): fix unsupported-arch error message (Genesis-Embodied-AI#700) * [volatile_load] add qd.volatile_load primitive (closes Genesis-Embodied-AI#648) (Genesis-Embodied-AI#702) * [AutoDiff] Reject recycled identity_key in AdStackCache::register_adstack_sizing_info (Genesis-Embodied-AI#708) * [Vulkan] Declare GroupNonUniform SPIR-V caps and enable shaderSubgroupExtendedTypes (Genesis-Embodied-AI#707) * Fix duplicate HIP graph driver-function declarations after v1.0.0 merge The amd-integration fork had cherry-picked the HIP graph driver functions (graph_create / graph_destroy / graph_add_kernel_node / graph_instantiate / graph_exec_destroy / graph_launch), and upstream v1.0.0 added the same set. The per-file 3-way merge appended both copies into amdgpu_driver_functions.inc.h, producing redeclaration errors that broke the AMDGPU RHI/runtime compile. Drop the upstream duplicate block; the signatures are identical to the fork's existing declarations. 
Co-authored-by: Cursor <cursoragent@cursor.com> * Fix AMDGPU launcher coherence and num_instructions visibility after v1.0.0 merge - kernel_launcher.cpp: the 3-way merge spliced upstream v1.0.0's launch_llvm_kernel rewrite (ephemeral arg/context buffers, explicit-stream path, AmdgpuDefaultStream PinGuard) onto the AMD fork's kernarg-by-value + persistent-scratch design, leaving references to undefined `ephemeral_context_ptr`. Restore the fork's coherent launch_llvm_kernel verbatim; it calls the (already merged) enhanced launch_offloaded_tasks, which keeps the max-reducer dispatch and stream-parallel groups adapted onto the AMD launch path. - llvm_context.h: both the fork and upstream added `num_instructions`; the merge kept upstream's private placement, but the AMDGPU codegen force-inline heuristic calls it statically from outside the class. Move it back to the public section. Co-authored-by: Cursor <cursoragent@cursor.com> * Restore async result D2H and hoist kernarg vectors in AMDGPU launcher The v1.0.0 merge resolution regressed two amd-integration baseline optimizations in launch_llvm_kernel / launch_offloaded_tasks: - The per-launch result-buffer copy was a blocking memcpy_device_to_host, forcing a host stall on every value-returning launch and serializing the GPU pipeline. Restore the async D2H (the caller synchronizes lazily when it needs the value); external-array transfers still stream_synchronize once before reading back. - launch_task constructed the kernarg std::vectors from initializer lists ({kernarg_payload} / {kernarg_size}) on every dispatch (heap alloc + free per launch). Hoist arg_ptrs/arg_sizes out of the per-task launch and reuse. 
Co-authored-by: Cursor <cursoragent@cursor.com> * amdgpu: default to LDS permlane64 emulation; drop host-x86 barrier asm on retarget Two AMDGPU JIT-compile crashes surfaced after the v1.0.0 merge pulled in the QIPC subgroup ops (Genesis-Embodied-AI#676), which made the rigid constraint solver's wave-cooperative reductions route through `amdgpu_cross_half_shuffle_i32`. Both manifested as a SIGSEGV inside `llvm::SIInstrInfo::getInstSizeInBytes` during `JITSessionAMDGPU::compile_module_to_hsaco` (i.e. at first kernel launch), and reproduce on gfx942 / MI300X. Baseline 0.4.6 never emitted these constructs, which is why it was unaffected. 1. Native `llvm.amdgcn.permlane64` lowering crashes the bundled LLVM 22.1.0 AMDGPU backend. Default `amdgpu_permlane64` to the existing LDS-roundtrip software emulation on every target (it produces identical results). Add `QD_AMDGPU_USE_NATIVE_PERMLANE64=1` to opt back into the native instruction once the backend bug is fixed; the old `QD_AMDGPU_FORCE_PERMLANE64_FALLBACK` is now the default and still honored. This is the actual crash fix. 2. The runtime module is compiled by the host x86_64 clang and only retargeted to amdgcn here, so `amdgpu_cross_half_shuffle_i32`'s `__asm__ volatile("" : "+v"(byte))` optimization barrier carries x86 flag clobbers (`~{dirflag},~{fpsr},~{flags}`) that are meaningless on AMDGPU. The IR verifies but the empty-body INLINEASM is invalid on the amdgcn target. Neutralize empty-body barrier asm during retarget (forward the tied value, then erase) so no stale host asm reaches codegen. On the wave64 targets we ship `ds_bpermute` already addresses the full wave, so the hint is a no-op. Co-authored-by: Cursor <cursoragent@cursor.com> * style: apply clang-format (v19.1.7) to AMDGPU fn_attrs and launcher sources CI pre-commit's clang-format hook reformatted these files (long declarations/lambda signatures collapsed onto single lines per the repo's clang-format config). Apply the same formatting so the hook passes. 
No functional changes. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(amdgpu): use CreateNeg for branchless i32 sgn instead of CreateSub(0, input) clang-tidy (modernize-use-nullptr, -warnings-as-errors) flagged `builder->CreateSub(0, input)` in the i32 sgn path: the literal `0` binds to the `llvm::Value*` LHS parameter as a null pointer, not an integer zero. Replace with `builder->CreateNeg(input)`, which emits `0 - input` with a proper zero constant -- identical intended semantics, and clang-tidy clean. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Robert Dazi <14996868+v01dXYZ@users.noreply.github.com> Co-authored-by: v01dxyz <v01dxyz@v01d.xyz> Co-authored-by: Hugh Perkins <hughperkins@gmail.com> Co-authored-by: Alexis DUBURCQ <alexis.duburcq@gmail.com> Co-authored-by: hugh <hugh@slurm-login-0.slurm-login.tenant-slurm.svc.cluster.local> Co-authored-by: alanray-tech <alan.ray@genesis-ai.company> Co-authored-by: alanray-tech <alanray-tech@users.noreply.github.com> Co-authored-by: root <root@rtx-209-201.slurm-compute.tenant-slurm.svc.cluster.local> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Johnny <johnnynuca14@gmail.com>
Issue: #
Brief Summary
Summary
This PR adds kernel code coverage to quadrants — tracking which lines and branches actually
execute inside compiled kernels on GPU/CPU, not just host-side Python code. The coverage data is
written in standard coverage.py format, so it integrates with pytest-cov, diff-cover, and IDE coverage viewers out of the box.
What's included
Core feature (_kernel_coverage.py, 275 lines):
@qd.kernel and @qd.func at each source line. Probes inside if/else bodies give true runtime branch coverage.
QD_KERNEL_COVERAGE=1.
qd.init() resets by harvesting probe data before runtime destruction.
qd.init().
Pytest plugin (pytest_plugin.py):
pytest11 entry point. Automatically sets QD_KERNEL_COVERAGE=1 when pytest-cov is active. Users can opt out with QD_KERNEL_COVERAGE=0.
CI integration (linux.yml, 4_test.sh, 4_test_cuda.sh):
Coverage report tool (coverage_report.py, 458 lines):
with collapsible annotated code sections.
Tests (test_kernel_coverage.py, 488 lines):
if/else/while/with/try, @qd.func, multi-kernel, deduplication, reinit survival, autodiff exclusion, env var control, probe capacity overflow.
Docs (kernel_coverage.md):
interaction, and advanced configuration.
Strengths
.coverage files, so the entire Python coverage ecosystem (pytest-cov, diff-cover, coverage.py, IDE integrations) works without any adapter.
without learning about QD_KERNEL_COVERAGE.
if/else actually ran on the device, not just which lines were compiled.
xdist, concurrent kernels).
Weaknesses / known limitations
validation=True: kernel calls inside qd.ad.Tape(validation=True) are not covered, because the AD compilation mode skips probe insertion.
misses on first run with coverage enabled. Expected but worth knowing.
many kernels may need to increase QD_COVERAGE_MAX_PROBES.
transitions between them, so arc data uses entry/exit arcs per line rather than true control-flow arcs. This is sufficient for coverage report and diff-cover but won't give precise branch-pair information.
squash or interactive rebase before merge.
=====================
Here's the PR story:
────────────────────────────────────────
The problem
When you write a quadrants kernel and run your tests with pytest --cov, coverage.py measures which Python lines executed — but it has no visibility into what happens inside
@qd.kernel or @qd.func bodies. Those are compiled to GPU/CPU code and run on the device. From coverage.py's perspective, the kernel call site is one line, and it either ran or
didn't. You can't tell which branches inside the kernel were taken, or whether a particular code path in your kernel is dead.
This PR adds kernel code coverage: tracking which source lines actually execute inside compiled kernels, and feeding that data back into the standard coverage.py ecosystem so
it shows up in coverage report, coverage html, PR comments, etc.
The approach
The key insight is that quadrants already compiles kernels from Python AST → its own IR → device code. We can intercept at the AST stage and insert extra statements — probes —
before each source line. Each probe is just a field store: _qd_cov[probe_id] = 1. The existing compilation pipeline treats this as an ordinary integer write, so it compiles
and runs on the device alongside the user's code with no special C++ support needed.
After the kernel runs, we read the field back to the host. Any probe_id slot that contains 1 tells us that line executed.
The implementation
Enabling: environment variable gate
Everything is gated on QD_KERNEL_COVERAGE=1. Both _func_base.py and kernel.py read this at module load time into a _KERNEL_COVERAGE bool. When it's False, the coverage module
is never imported and there's zero overhead.
_KERNEL_COVERAGE = os.environ.get("QD_KERNEL_COVERAGE") == "1"
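The gate above is a strict string comparison, so only the exact value "1" turns coverage on. A minimal sketch of that behavior follows; the helper name kernel_coverage_enabled and its environ parameter are hypothetical and exist only to make the check testable without touching the real environment.

```python
import os


def kernel_coverage_enabled(environ=None):
    """Mirror the gate: coverage is on only when the variable equals "1".

    `environ` is a hypothetical injection point for illustration; the
    real modules read os.environ once at import time.
    """
    env = os.environ if environ is None else environ
    return env.get("QD_KERNEL_COVERAGE") == "1"
```

Under this check an unset variable, "0", and values such as "true" all leave coverage disabled.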
The AST rewriter (_kernel_coverage.py)
The core of the feature is _CoverageASTRewriter, an ast.NodeTransformer. It walks a kernel's AST and, for each statement at a new source line, prepends a probe node:
_qd_cov[<probe_id>] = 1
It has visit_* methods for every compound statement type (if/else, for, while, with, try/except/finally) so that probes land inside each branch body. This means if you have an
if/else, the if-body probe only fires when that branch is taken — giving true runtime branch coverage, not just line coverage.
Each probe gets a unique integer ID, and the rewriter records a mapping: {probe_id: (filepath, lineno)}. A _seen_lines set prevents duplicate probes when two statements share
a source line (e.g. a = 1; b = 2).
There's a configurable capacity limit (QD_COVERAGE_MAX_PROBES, default 100k). If it is exceeded, a warning is emitted and the remaining lines are left without probes.
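The rewriting idea can be illustrated with a small, host-only sketch. This is not the PR's _CoverageASTRewriter: the class name ProbeInserter, the helper run_instrumented, and the use of a plain Python list in place of the device field are all assumptions for illustration, and only if/for/while bodies are handled here (the real rewriter also covers with and try).

```python
import ast


class ProbeInserter(ast.NodeTransformer):
    """Hypothetical stand-in for the rewriter: prepends `_qd_cov[id] = 1`."""

    def __init__(self, filepath, max_probes=100_000):
        self.filepath = filepath
        self.max_probes = max_probes
        self.probe_map = {}       # probe_id -> (filepath, lineno)
        self._seen_lines = set()  # avoids duplicate probes on one line

    def _instrument(self, stmts):
        out = []
        for stmt in stmts:
            new_line = stmt.lineno not in self._seen_lines
            if new_line and len(self.probe_map) < self.max_probes:
                self._seen_lines.add(stmt.lineno)
                probe_id = len(self.probe_map)
                self.probe_map[probe_id] = (self.filepath, stmt.lineno)
                out.append(ast.parse(f"_qd_cov[{probe_id}] = 1").body[0])
            out.append(self.visit(stmt))  # recurse into nested bodies
        return out

    def visit_FunctionDef(self, node):
        node.body = self._instrument(node.body)
        return node

    def visit_If(self, node):
        node.body = self._instrument(node.body)
        node.orelse = self._instrument(node.orelse)
        return node

    visit_For = visit_While = visit_If  # same body/orelse layout


SOURCE = """
def k(x):
    if x > 0:
        y = 1
    else:
        y = 2
    return y
"""


def run_instrumented(source, arg):
    """Instrument `k`, run it on the host, and return the lines that fired."""
    rewriter = ProbeInserter("example.py")
    tree = ast.fix_missing_locations(rewriter.visit(ast.parse(source)))
    cov = [0] * len(rewriter.probe_map)  # a list stands in for the field
    namespace = {"_qd_cov": cov}
    exec(compile(tree, "example.py", "exec"), namespace)
    namespace["k"](arg)
    return {rewriter.probe_map[i][1] for i, hit in enumerate(cov) if hit}
```

Calling run_instrumented(SOURCE, 5) reports lines 3, 4 and 7, while a negative argument reports 3, 6 and 7, which is the branch selectivity described above.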
Hooking into compilation (_func_base.py)
When a kernel is about to be compiled, FuncBase._compile_function runs. This is where the AST gets transformed before being handed to the quadrants compiler. The new code
checks two conditions: kernel coverage is enabled, and the function is not being compiled in autodiff mode.
If both are true, it calls _kernel_coverage.rewrite_ast(tree, filepath, start_lineno) to inject probes into the AST. It then retrieves the coverage field via get_field() and
adds it to the kernel's global_vars dict under the name _qd_cov, so the compiled probe statements can resolve the field reference.
The AD exclusion is important: the backward pass is an auto-generated transformation of the kernel. Inserting field stores into it would break gradient computation (they'd be
treated as differentiable operations). Since the backward pass replays the same control flow as the forward pass, there are no user-written lines that appear only in the
backward pass, so nothing is lost.
This also applies to @qd.func bodies — they go through the same _compile_function path and receive probes when called from a non-AD kernel.
Allocating the probe field (kernel.py)
Before compilation, Kernel.materialize_layout calls _kernel_coverage.ensure_field_allocated(). This creates a single global qd.field(dtype=qd.i32, shape=(_MAX_PROBES,)), a flat
integer array on the device. All probes across all kernels in the process write into this one field, indexed by their probe ID.
The function uses double-checked locking: it checks whether the field already exists (and belongs to the current Program instance), and only allocates under a lock if needed.
This handles the case where qd.init() is called again mid-process, creating a new Program — the field must be re-allocated against the new runtime.
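A rough sketch of the double-checked locking pattern described here, under stated assumptions: the real ensure_field_allocated() takes no arguments, whereas this version receives the current program and an allocate callback as hypothetical parameters so it can run without a quadrants runtime.

```python
import threading

_lock = threading.Lock()
_field = None
_field_owner = None  # the Program instance the field was allocated against


def ensure_field_allocated(current_program, allocate):
    """Allocate the probe field once per program (hypothetical signature)."""
    global _field, _field_owner
    # First check, without the lock: the common case after the first call.
    if _field is not None and _field_owner is current_program:
        return _field
    with _lock:
        # Second check, under the lock: another thread may have allocated
        # the field while this one was waiting.
        if _field is None or _field_owner is not current_program:
            _field = allocate()
            _field_owner = current_program
    return _field
```

Comparing the owner by identity is what makes a second qd.init() trigger a fresh allocation: the new Program is a different object, so the fast path is not taken.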
Surviving runtime resets
Users can call qd.reset() / qd.init() to tear down and recreate the runtime. This destroys all fields, including our coverage field. If we don't harvest the probe data before
that happens, it's lost.
The fix is _install_reset_hook(), which monkey-patches PyQuadrants.clear() (the method that destroys the runtime) to call _harvest_field() first. _harvest_field() reads the field
back to numpy, iterates the probe map, and records which lines fired into _accumulated_lines: dict[str, set[int]] — a durable Python dict that survives across runtime resets.
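The harvest-before-reset idea can be sketched as follows. The names harvest, install_reset_hook, read_field and the fake runtime class in the usage note are hypothetical; swallowing exceptions during harvest is an assumption modeled on the "graceful failure" test listed later, not a confirmed detail of the implementation.

```python
import functools
from collections import defaultdict

# filepath -> set of line numbers; plain Python data that outlives the runtime
accumulated_lines = defaultdict(set)


def harvest(probe_values, probe_map):
    """Record every line whose probe slot is non-zero."""
    for probe_id, (filepath, lineno) in probe_map.items():
        if probe_values[probe_id]:
            accumulated_lines[filepath].add(lineno)


def install_reset_hook(runtime_cls, read_field, probe_map):
    """Wrap runtime_cls.clear so probe data is harvested before teardown."""
    original_clear = runtime_cls.clear

    @functools.wraps(original_clear)
    def clear_with_harvest(self, *args, **kwargs):
        try:
            harvest(read_field(), probe_map)
        except Exception:
            pass  # assumption: coverage must never break the reset itself
        return original_clear(self, *args, **kwargs)

    runtime_cls.clear = clear_with_harvest
```

With a fake runtime whose clear() drops its field, calling clear() after installing the hook leaves the fired lines in accumulated_lines even though the field itself is gone.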
Writing output (flush)
flush() is registered with atexit. It calls _harvest_field() one last time (for the current session), then writes _accumulated_lines to a _qd_kcov. file using coverage.py's
CoverageData API.
It detects whether pytest-cov is using branch (arc) mode by reading the .coverage file. If so, it emits entry/exit arcs (-1, line) and (line, -1) for each covered line. These are
minimal arcs that correctly represent "this line ran" without fabricating transitions between lines that may not reflect actual control flow. If pytest-cov is in line mode, it
writes line data directly.
The PID-based filename ensures parallel pytest-xdist workers don't collide.
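As a sketch of the arc scheme just described, the helpers below turn accumulated lines into the per-file mapping shape that coverage.py's CoverageData.add_arcs or CoverageData.add_lines accept. The function names are hypothetical, and the actual file writing is left out so the example runs without coverage.py installed.

```python
def entry_exit_arcs(lines):
    """Build minimal arcs for each covered line: (-1, line) and (line, -1)."""
    arcs = []
    for line in sorted(lines):
        arcs.append((-1, line))  # entry arc into the line
        arcs.append((line, -1))  # exit arc out of the line
    return arcs


def to_coverage_payload(accumulated_lines, branch_mode):
    """Return {filepath: arcs} in branch mode, else {filepath: lines}."""
    if branch_mode:
        return {p: entry_exit_arcs(ls) for p, ls in accumulated_lines.items()}
    return {p: sorted(ls) for p, ls in accumulated_lines.items()}
```

No arc connects one source line to another, which is why this data supports line-level reports but not precise branch-pair analysis.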
Pure kernel exemption (ast_transformer_utils.py)
Quadrants has a "pure kernel" concept (@qd.kernel(fastcache=True)) where the compiler verifies the kernel doesn't access mutable global state. Our _qd_cov field is a global, so
without an exemption, every pure kernel would fail to compile with coverage enabled.
The fix generalizes an existing check: any global variable whose name starts with _qd_ is exempted from the purity violation. This is a one-line change from if name != "_qd_cov"
to if not name.startswith("_qd_"), keeping it open for future internal globals.
CI integration (linux.yml, 4_test.sh, 4_test_cuda.sh)
The CI workflow runs tests in three stages:
against origin/main, and posts it as a PR comment via gh pr comment.
A few existing tests are incompatible with coverage probes (they assert on compiled kernel identity, cache file counts, or thread-safety of add_struct_module). These get
pytest.mark.skipif(QD_KERNEL_COVERAGE) markers and still run in the non-coverage test phases.
Coverage report tool (coverage_report.py)
A standalone script used by both CI and local development. It has two responsibilities:
Four output formats are supported via a renderer class hierarchy (_TerminalRenderer, _AnnotatedRenderer, _MarkdownRenderer, _HtmlRenderer), all driven by a single loop in
generate_report that calls begin / begin_file / write_line / end_file / finish.
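The renderer arrangement might look roughly like the sketch below. Only the five hook names (begin, begin_file, write_line, end_file, finish) come from the description; the class names without the leading underscore, the method arguments, the "hit"/"miss" status values and the output layouts are assumptions for illustration.

```python
import io


class Renderer:
    """Base class: every hook is a no-op so subclasses override what they need."""

    def __init__(self, out):
        self.out = out

    def begin(self): pass
    def begin_file(self, path, covered, total): pass
    def write_line(self, lineno, text, status): pass
    def end_file(self): pass
    def finish(self): pass


class TerminalRenderer(Renderer):
    def begin_file(self, path, covered, total):
        self.out.write(f"{path}: {covered}/{total} lines covered\n")

    def write_line(self, lineno, text, status):
        mark = {"hit": "+", "miss": "-"}.get(status, " ")
        self.out.write(f"{mark} {lineno:4d} {text}\n")


class MarkdownRenderer(Renderer):
    def begin(self):
        self.out.write("| file | covered | total |\n|---|---|---|\n")

    def begin_file(self, path, covered, total):
        self.out.write(f"| {path} | {covered} | {total} |\n")


def generate_report(files, renderer):
    """Single driver loop; `files` maps path -> [(lineno, text, status)]."""
    renderer.begin()
    for path, lines in files.items():
        measurable = [s for _, _, s in lines if s is not None]
        renderer.begin_file(path, measurable.count("hit"), len(measurable))
        for lineno, text, status in lines:
            renderer.write_line(lineno, text, status)
        renderer.end_file()
    renderer.finish()
```

Because the loop only talks to the five hooks, adding an output format means adding one subclass and leaving generate_report unchanged.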
Documentation (kernel_coverage.md)
A user guide aimed at people using the quadrants library (e.g., Genesis developers), not quadrants framework developers. It covers: enabling coverage, viewing results with
coverage.py and pytest-cov, an example with branch coverage, key properties, probe capacity configuration, autodiff interaction (what is and isn't covered), and a brief "under
the hood" section.
Tests (test_kernel_coverage.py)
The test suite has two layers:
AST rewriter unit tests — parametrized over 6 control-flow constructs (straight-line, if/else, for, while, with, try/except/finally), plus dedicated tests for capacity limiting
and same-line deduplication. These are pure Python and don't need a runtime.
End-to-end tests — run actual kernels on CPU and CUDA and verify probes fired:
• Branch selectivity (taken branch fires, untaken doesn't)
• SIMT with block.sync() and subgroup shuffle (GPU only)
• qd.reset() / qd.init() survival
• Autodiff: forward pass produces probes, backward doesn't add any
• @qd.func called from a kernel
• Multiple kernels accumulating into the same field
• Pure (fastcache) kernel compiles without errors
• _harvest_field graceful failure when to_numpy() throws