feat: extract rationale comments + ADR/RFC doc references from JS/TS #1599

Open
niltonmourafilho-arch wants to merge 1 commit into Graphify-Labs:v8 from niltonmourafilho-arch:feat/js-ts-rationale-docref-extraction

@niltonmourafilho-arch

Summary

Parity with _extract_python_rationale: Python files get rationale nodes from docstrings and # NOTE:-style comments, but JS/TS comments were discarded entirely. This PR adds a post-pass to extract_js that:

  1. Rationale comments: // NOTE:, // WHY:, // HACK: etc. (plus block-comment * NOTE: variants) become rationale nodes with rationale_for edges, matching the existing Python behavior.
  2. Doc references: ADR-NNNN / RFC NNNN citations found in comments become doc_ref nodes with cites edges from the file node.
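For reviewers who want a concrete picture of the first pass, here is a minimal sketch of a line scan for rationale comments. It is not the code in this PR: the function name `find_rationale_comments`, the returned dict shape, and the exact tag set are assumptions made for illustration (the description names NOTE, WHY and HACK and then says "etc."). It also assumes that only lines that begin with a comment marker count, so trailing comments after code are ignored.

```python
import re

# Hypothetical tag set: the PR names NOTE, WHY and HACK and leaves the rest open.
RATIONALE_TAGS = ("NOTE", "WHY", "HACK")

# A "comment-shaped" line starts (after indentation) with //, /* or *,
# followed by TAG: and the rationale text. A trailing */ is dropped.
_RATIONALE_RE = re.compile(
    r"^\s*(?://|/\*+|\*)\s*(?P<tag>"
    + "|".join(RATIONALE_TAGS)
    + r"):\s*(?P<text>.*?)\s*(?:\*/)?\s*$"
)


def find_rationale_comments(source: str) -> list[dict]:
    """Return one record per tagged comment line (illustrative shape only)."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = _RATIONALE_RE.match(line)
        if match and match.group("text"):
            found.append(
                {
                    "line": lineno,
                    "tag": match.group("tag"),
                    "text": match.group("text"),
                }
            )
    return found
```

In a real extractor each record would presumably become a rationale node with a rationale_for edge; how those nodes and edges are represented is left out here because the PR description does not show it.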

Why

The doc_ref pass is the natural join point between code and design docs in mixed corpora. Teams conventionally cite ADR ids in TS file headers, but today those citations produce zero edges, so code↔ADR connections never form in the graph even when the citation discipline exists in the codebase.

Tested on a real mixed corpus (Flutter/Supabase monorepo, ~163 TS files + 40 ADRs): a single router.ts yields 10 ADR citations that previously produced no edges. With this patch they become direct cites edges, closing the code↔ADR gap without any LLM cost (pure line scan, same cost profile as the Python rationale pass).

Design notes

  • Spellings normalized (ADR-11 / ADR 0011 → ADR-0011) so references to the same document collapse to one node; deduped per file.
  • String literals excluded — only comment-shaped lines (//, /*, *) are scanned, so const s = "ADR-0099" produces nothing.
  • Conservative token set (ADR + RFC only) to avoid noise; easy to extend.
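The three design notes above can be sketched as a single function. This is an illustration under stated assumptions, not the patch itself: `find_doc_refs` and `normalize_ref` are hypothetical names, matching is assumed to be case-sensitive, and RFC ids are assumed to normalize to the same `KIND-NNNN` form as ADR ids (the description only shows the ADR case).

```python
import re

# Only lines that begin with a comment marker are scanned, so string
# literals on ordinary code lines never match.
_COMMENT_LINE_RE = re.compile(r"^\s*(?://|/\*|\*)")

# Conservative token set: ADR and RFC only, with "-" or " " (or nothing)
# between the token and the number.
_DOC_REF_RE = re.compile(r"\b(ADR|RFC)[- ]?(\d+)\b")


def normalize_ref(kind: str, number: str) -> str:
    """Collapse spelling variants: ('ADR', '11') and ('ADR', '0011') agree."""
    return f"{kind}-{int(number):04d}"


def find_doc_refs(source: str) -> list[str]:
    """Return normalized doc refs for one file, deduped, in first-seen order."""
    seen: dict[str, None] = {}  # a dict keeps insertion order
    for line in source.splitlines():
        if not _COMMENT_LINE_RE.match(line):
            continue
        for kind, number in _DOC_REF_RE.findall(line):
            seen.setdefault(normalize_ref(kind, number), None)
    return list(seen)
```

Extending the token set would mean widening the `(ADR|RFC)` alternation, which is why keeping it in one pattern makes the "easy to extend" note cheap to act on.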

Test plan

  • 5 new tests in tests/test_rationale.py (line comment, block comment, multi-ref, normalization/dedup, string-literal exclusion)
  • tests/test_rationale.py 18/18 pass
  • tests/test_extract.py + tests/test_build.py + tests/test_languages.py: failure list identical before/after patch (5 pre-existing Windows symlink/path failures, unrelated)

Parity with _extract_python_rationale: Python files get rationale nodes
from docstrings and '# NOTE:'-style comments, but JS/TS comments were
discarded entirely. This adds a post-pass to extract_js that:

1. extracts rationale comments ('// NOTE:', '// WHY:', block-comment
   '* NOTE:' variants) as rationale nodes with rationale_for edges,
   matching the Python behavior;
2. first-classes architecture-decision references (ADR-NNNN, RFC NNNN)
   found in comments as doc_ref nodes with 'cites' edges from the file.

The doc_ref pass is the natural join point between code and design docs
in mixed corpora: teams conventionally cite ADR ids in file headers, but
today those citations produce no edges, so code<->ADR connections never
form even when the discipline exists. Spellings are normalized
(ADR-11 / ADR 0011 -> ADR-0011) so references to the same document
collapse to one node, and string literals are excluded (comment-shaped
lines only).

Tested on a real mixed corpus (Flutter/Supabase monorepo): router.ts
alone yields 10 ADR citations that previously produced zero edges.