feat: cross-file reference validator for BMAD source files #1494

Merged
bmadcode merged 13 commits into bmad-code-org:main from arcaven:feat/validate-file-refs
Feb 3, 2026
Contributor

@arcaven arcaven commented Jan 31, 2026

Why

Broken file references and path handling account for 59 closed bugs (25% of all bugs) and 129+ comments. These categories have zero automated prevention today. When files are renamed or have extensions changed, stale references are only found by users post-release.

We replayed the validator against all 26 v6 release tags. The data tells a clear story:

| Metric | Value |
| --- | --- |
| Broken refs tracked across v6 history | 289 |
| Fixed manually without automation | 285 |
| Peak breakage in a single release | 208 |
| False positives on current HEAD | 0 |

The alpha.17 src/modules/ migration alone introduced 137 broken refs that were fixed piecemeal across 5 subsequent releases. With this validator, all 137 would have appeared in the build log on that single PR.

Fixes #1493
Related to bmad-code-org/bmad-builder#7 and #1529

What

Add tools/validate-file-refs.js — a cross-file reference validator that checks whether file paths referenced across BMAD source files actually point to existing files. It also detects absolute path leaks (/Users/, /home/, C:\).

  • Scans src/ YAML, markdown, and XML files (~217 files, ~483 references on current main)
  • Validates {project-root}/_bmad/ paths, {_bmad}/ shorthand, relative paths, exec attributes, <invoke-task> targets, step metadata, and Load directives
  • Skips unresolvable runtime variables ({{mustache}}, {installed_path}, {output_folder}, etc.) and install-generated paths (_config/, config.yaml)
  • Self-contained: only node:fs, node:path, and yaml (existing dep)
  • Adds validate:refs npm script and a step in the existing validate CI job
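
The skip rule in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in tools/validate-file-refs.js: the function name `isResolvable` and both regular expressions are assumptions made for this example, covering only the placeholders and install-generated paths named in the list.

```javascript
// Hypothetical sketch: decide whether a reference can be checked statically.
// Names and patterns are illustrative assumptions, not the tool's actual code.

// {{mustache}} variables, or any {placeholder} other than the two known prefixes.
const RUNTIME_VARS = /\{\{[^}]*\}\}|\{(?!project-root\}|_bmad\})[\w-]+\}/;

// Paths that only exist after installation (_config/ directories, config.yaml).
const INSTALL_GENERATED = /(^|\/)_config\/|(^|\/)config\.yaml$/;

function isResolvable(ref) {
  if (RUNTIME_VARS.test(ref)) return false; // value is only known at runtime
  if (INSTALL_GENERATED.test(ref)) return false; // file is generated by the installer
  return true;
}
```

With these assumptions, `{project-root}/_bmad/core/tasks/workflow.xml` and `./step-02.md` are checked, while `{installed_path}/template.md` and `{project-root}/_bmad/bmm/config.yaml` are skipped.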

Why this is safe to adopt

Non-blocking by design

The validator runs in warning mode by default (exit 0). Broken references appear in the build log for visibility, but build results are unaffected. No existing CI checks, pre-commit hooks, or npm scripts are modified in behavior. The validator is purely additive.

Every existing CI check continues to enforce exactly as before:

| CI check | Before this PR | After this PR |
| --- | --- | --- |
| Prettier, ESLint, markdownlint | Enforced (exit 1) | Enforced (exit 1) |
| Doc links, docs build | Enforced (exit 1) | Enforced (exit 1) |
| Schema validation, agent tests, install tests | Enforced (exit 1) | Enforced (exit 1) |
| File ref validation (new) | Does not exist | Warning only (exit 0) |

When ready to enforce, one change in package.json:

- "validate:refs": "node tools/validate-file-refs.js"
+ "validate:refs": "node tools/validate-file-refs.js --strict"
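
The difference between the two modes comes down to how the exit code is chosen. A minimal sketch of that decision, assuming a hypothetical helper `exitCodeFor` (the real tool may structure this differently):

```javascript
// Hypothetical sketch of the warning-vs-strict decision.
// Only `--strict` combined with at least one issue produces a failing exit code.
function exitCodeFor(issueCount, argv) {
  const strict = argv.includes('--strict');
  return strict && issueCount > 0 ? 1 : 0;
}

// Typical use at the end of a CLI run (left commented so the sketch has no side effects):
// process.exit(exitCodeFor(issues.length, process.argv.slice(2)));
```

A clean run exits 0 in both modes, so turning on `--strict` only changes behavior when there is something to report.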

Try it

npm run validate:refs              # Warning mode (exit 0)
npm run validate:refs -- --verbose # Show all refs checked
npm run validate:refs -- --strict  # Enforcement mode (exit 1)

Or grab just the file without checking out the branch:

curl -o tools/validate-file-refs.js https://raw.githubusercontent.com/arcaven/BMAD-METHOD/feat/validate-file-refs/tools/validate-file-refs.js
node tools/validate-file-refs.js

The validator only reads files. It makes no changes to disk.

Historic impact

Specific issues this would have preempted:

Broken references on current main

| File | Reference | Issue |
| --- | --- | --- |
| steps-c/step-11-polish.md:8 | ./data/prd-purpose.md → should be ../data/ | #1495 🔍 found by this tool |
| steps-e/step-e-04-complete.md:7 | ./steps-v/step-v-01-discovery.md → should be ../steps-v/ | #1496 🔍 found by this tool |
| create-story/checklist.md:36,67 | core/tasks/validate-workflow.xml → doesn't exist | #1455/#1324 |
| core/tasks/workflow.xml:84 | core/workflows/party-mode/workflow.yaml → file is .md | #1212 |

How

flowchart TD
    A[Discover source files in src/] --> B[Read file content]
    B --> C{YAML or Markdown?}
    C -->|YAML| D[Parse YAML & walk values]
    C -->|Markdown/XML| E[Strip code blocks\n& JSON examples]
    E --> F[Extract refs via regex patterns]
    D --> G[Collect references]
    F --> G
    G --> H{Reference type?}
    H -->|project-root / _bmad| I[Map installed path → source path\ne.g. _bmad/core/ → src/core/]
    H -->|Relative path| J[Resolve from containing file dir]
    H -->|exec / invoke-task| K[Extract path & resolve]
    I --> L{fs.existsSync?}
    J --> L
    K --> L
    L -->|Missing| M["Report [BROKEN] with file:line"]
    L -->|Found| N["Report [OK] in verbose mode"]
    B --> O{Absolute path leak?}
    O -->|/Users/ /home/ C:\| P["Report [ABS-PATH]"]
    M --> Q[Summary & exit code]
    N --> Q
    P --> Q
    Q --> R{--strict flag?}
    R -->|Yes + issues| S[Exit 1]
    R -->|No or clean| T[Exit 0]

Reference types validated

| Pattern | Example | Source |
| --- | --- | --- |
| {project-root}/_bmad/ paths | `{project-root}/_bmad/core/tasks/workflow.xml` | Agent YAML, workflow MD |
| {_bmad}/ shorthand | `{_bmad}/bmm/workflows/dev-story/workflow.yaml` | Agent YAML |
| Relative paths | `./step-02-analysis.md`, `../data/template.md` | Step MD, workflow MD |
| exec attributes | `exec="core/tasks/validate-workflow.xml"` | Workflow XML |
| `<invoke-task>` targets | `<invoke-task>_bmad/core/tasks/workflow.xml</invoke-task>` | Task XML |
| Step metadata | `nextStepFile: './step-03.md'` | Step MD |
| Load directives | ``Load: `./checklist.md` `` | Workflow MD |
| Absolute path leaks | `/Users/dev/project/...` | Any file |
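
To make the table concrete, here is a minimal sketch of regex-based extraction for four of the reference types. The pattern list, the `extractRefs` name, and the shape of the returned objects are assumptions for illustration. The actual patterns in tools/validate-file-refs.js are not shown in this PR description and are likely more thorough.

```javascript
// Hypothetical sketch: pull candidate references out of markdown/XML text.
// Each pattern is an illustrative approximation of one row in the table above.
const PATTERNS = [
  { type: 'project-root', regex: /\{project-root\}\/_bmad\/[\w./-]+/g },
  { type: 'exec-attr', regex: /exec="([^"]+)"/g },
  { type: 'invoke-task', regex: /<invoke-task>([^<]+)<\/invoke-task>/g },
  // ./ or ../ followed by a path that ends in a file extension
  { type: 'relative', regex: /(?<![\w}/.])\.{1,2}\/[\w./-]+\.\w+/g },
];

function extractRefs(content) {
  const refs = [];
  for (const { type, regex } of PATTERNS) {
    for (const match of content.matchAll(regex)) {
      // Use the capture group when the pattern has one, else the whole match.
      refs.push({ type, raw: match[1] ?? match[0], offset: match.index });
    }
  }
  return refs;
}
```

Keeping the match offset alongside each reference is what later allows a `file:line` location to be reported.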

Testing

Near-term improvements

If this validator is useful, a few low-effort follow-ups could improve visibility of findings in CI:

PR comments and Check Run annotations were considered but require pull-requests: write or checks: write permissions, which introduce security considerations for fork-based PRs.


coderabbitai bot commented Jan 31, 2026

📝 Walkthrough

Adds a new CI validation step that checks for broken cross-file references in BMAD source content (agents, workflows, tasks, steps). Introduces a standalone CLI tool that scans source files, extracts references, resolves them to actual file paths, and reports unresolved or absolute path leaks.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI and build configuration: `.github/workflows/quality.yaml`, `package.json` | Adds validate:refs npm script pointing to the new validation tool and integrates it into the validate CI job as a post-compilation step. |
| File reference validator: `tools/validate-file-refs.js` | New 426-line validation tool that scans .yaml, .yml, .md, and .xml files under src/, extracts references to other source files (project-root paths, relative paths, exec attributes, invoke-task tags, step metadata), resolves them to actual file paths, detects absolute path leaks, and reports broken references with line context. Supports warning-only mode (default, exit 0) and strict mode (--strict, exit 1). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | The PR implements all core objectives from issue #1493: a self-contained validator that scans src/ files, extracts cross-file references (project-root/_bmad/, relative paths, exec attributes, invoke-task tags, step metadata, Load directives), resolves them, detects absolute path leaks, reports broken references with file:line context, runs warning-only by default with --strict enforcement option, adds validate:refs npm script and CI step. |
| Out of Scope Changes check | ✅ Passed | All changes are strictly in scope: tools/validate-file-refs.js implements the validator, package.json adds the validate:refs npm script, and .github/workflows/quality.yaml integrates the validator into CI. No unrelated modifications or refactoring are present. |
| Title check | ✅ Passed | The pull request title accurately summarizes the main change: introducing a cross-file reference validator for BMAD source files. The title is concise, specific, and directly reflects the core purpose of the changeset. |
| Description check | ✅ Passed | The PR description clearly explains the motivation (59 closed bugs related to broken refs), the solution (a new validator tool), and how it works, with detailed examples and reasoning. |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@tools/validate-file-refs.js`:
- Around line 338-340: The loop iterating over files uses fs.readFileSync
without error handling, so wrap the read operation in a try-catch inside the for
(const filePath of files) loop (where relativePath and PROJECT_ROOT are
computed) to catch any errors from fs.readFileSync; on error, log a warning that
includes relativePath (or filePath) and the error, then continue to the next
file instead of letting the exception propagate. Ensure only the read and
subsequent processing for that file are skipped, preserving the rest of the
loop.
- Around line 101-122: The getSourceFiles function currently calls
fs.readdirSync inside walk without error handling; wrap the fs.readdirSync call
in a try-catch inside walk (or validate dir before calling walk) so if
readdirSync throws (e.g., SRC_DIR missing or inaccessible) you catch the error
and throw or log a clear, user-friendly message that includes the directory path
and the original error (preserve error.stack/message). Update references in
getSourceFiles/walk so the function either returns an empty array or rethrows a
wrapped Error with context instead of letting the raw Node error bubble up.
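
The two suggestions above could look roughly like this. It is a sketch under assumptions: the extension set comes from the walkthrough summary, `readOrWarn` is a hypothetical helper name, and the real `getSourceFiles` in the PR may be shaped differently.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// File types the validator scans, per the walkthrough summary.
const EXTENSIONS = new Set(['.yaml', '.yml', '.md', '.xml']);

// Recursively collect source files; fail with a message that names the directory.
function getSourceFiles(dir) {
  let entries;
  try {
    entries = fs.readdirSync(dir, { withFileTypes: true });
  } catch (error) {
    // Wrap the raw Node error so the user sees which directory was the problem.
    throw new Error(`Cannot read source directory ${dir}: ${error.message}`, { cause: error });
  }
  const files = [];
  for (const entry of entries) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) files.push(...getSourceFiles(fullPath));
    else if (EXTENSIONS.has(path.extname(entry.name))) files.push(fullPath);
  }
  return files;
}

// Hypothetical helper: read one file, or warn and return null so the caller can skip it.
function readOrWarn(filePath) {
  try {
    return fs.readFileSync(filePath, 'utf8');
  } catch (error) {
    console.warn(`[WARN] Skipping ${filePath}: ${error.message}`);
    return null;
  }
}
```

In the main loop, a `null` result from `readOrWarn` would translate to `continue`, so one unreadable file does not abort the whole run.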
🧹 Nitpick comments (2)
tools/validate-file-refs.js (2)

126-128: Potential regex catastrophic backtracking.

The [\s\S]*? pattern in stripCodeBlocks can cause performance issues on very large files with many code blocks. While unlikely to be exploitable here, consider a more efficient approach if performance becomes a concern.


362-372: Silent skip for extensionless paths may hide valid broken references.

When a resolved path has no extension, the code silently continues without reporting it. This could mask legitimate broken references to extensionless files. Consider logging in verbose mode or tracking these as a separate category.

Proposed improvement for verbose mode
       if (!hasExt) {
         // Could be a directory reference — skip if not clearly a file
+        if (VERBOSE) {
+          console.log(`  [SKIP] ${ref.raw} (no extension, may be directory)`);
+        }
         continue;
       }

@arcaven arcaven changed the title from "feat: add cross-file reference validator to CI pipeline" to "feat: cross-file reference validator for BMAD source files" Jan 31, 2026
Add tools/validate-file-refs.js that validates cross-file references
in BMAD source files (agents, workflows, tasks, steps). Catches broken
file paths, missing referenced files, wrong extensions, and absolute
path leaks before they reach users.

Addresses broken-file-ref and path-handling bug classes which account
for 25% of all historical bugs (59 closed issues, 129+ comments).

- Scans src/ for YAML, markdown, and XML files
- Validates {project-root}/_bmad/ references against source tree
- Checks relative path references, exec attributes, invoke-task tags
- Detects absolute path leaks (/Users/, /home/, C:\)
- Adds validate:refs npm script and CI step in quality.yaml
Add stripJsonExampleBlocks() to the markdown reference extractor so
bare JSON example/template blocks (braces on their own lines) are
removed before pattern matching. This prevents paths inside example
data from being flagged as broken references.
…tput

- Add utility/ to direct path mapping (was incorrectly falling through
  to src/modules/utility/)
- Show line numbers for broken references in markdown files
- Show YAML key path for broken references in YAML files
- Print file headers in verbose mode for all files with refs
Broken refs no longer print [OK] before [BROKEN] in --verbose mode.
Code block stripping now preserves newlines so offsetToLine() reports
accurate line numbers when code blocks precede broken references.
@arcaven arcaven force-pushed the feat/validate-file-refs branch from 0d0ffc1 to 773da28 Compare January 31, 2026 21:04
Contributor Author

arcaven commented Jan 31, 2026

Responding to the two nitpick observations from the review:

Re: regex catastrophic backtracking in stripCodeBlocks (line 127)

Tested [\s\S]*? on a 10,000-line code block — completes in <1ms. There's no nested quantifier here; it's a non-greedy match between two literal ``` delimiters with a clear terminator. No backtracking risk in this pattern.
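
For readers following along, the pattern under discussion, together with the newline-preserving behavior described in the earlier fix commit, might be sketched like this. The function bodies are assumptions (the PR only names `stripCodeBlocks` and `offsetToLine`), and the fence delimiter is built with `repeat` purely so this sample can itself sit inside a fenced block.

```javascript
// Three backticks, assembled at runtime so this sample embeds cleanly in markdown.
const FENCE = '`'.repeat(3);
const CODE_BLOCK = new RegExp(`${FENCE}[\\s\\S]*?${FENCE}`, 'g');

// Blank out fenced code blocks but keep every newline (and every offset) intact,
// so line numbers computed on the stripped text match the original file.
function stripCodeBlocks(content) {
  return content.replace(CODE_BLOCK, (block) => block.replace(/[^\n]/g, ' '));
}

// Convert a character offset into a 1-based line number.
function offsetToLine(content, offset) {
  let line = 1;
  for (let i = 0; i < offset && i < content.length; i++) {
    if (content[i] === '\n') line++;
  }
  return line;
}
```

Replacing characters with spaces rather than deleting the block is one simple way to avoid line-number drift; the PR may achieve the same effect differently.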

Re: silent skip for extensionless paths (lines 362–372)

The skip is intentional. These are directory references like core/workflows/brainstorming that point to real directories, not missing files. The verbose output already shows [OK] for resolved directory paths. Adding a [SKIP] tag is a reasonable future refinement for observability but not a correctness issue — the current behavior produces zero false positives and zero missed broken refs on HEAD.

Contributor

dracic commented Jan 31, 2026

Nice job. This will cut the problems in half.

Collaborator

alexeyv commented Feb 1, 2026

Thank you. I keep meaning to roll something like this out for validating workflows the same way we are validating agents. Reviewing...

@alexeyv alexeyv self-assigned this Feb 1, 2026
Collaborator

alexeyv commented Feb 1, 2026

PR Review: #1494

Title: feat: cross-file reference validator for BMAD source files
Author: @arcaven
Branch: feat/validate-file-refs → main


Findings

1. exec-attr resolver missing {_bmad}/ and bare _bmad/ prefix handling [likely]

Severity: 🟡 Medium

In resolveRef(), the invoke-task branch correctly handles three prefix variants: {project-root}/_bmad/, {_bmad}/, and bare _bmad/ (lines 292-303). However, the exec-attr branch (lines 283-289) only checks for {project-root} — it falls through to relative path resolution for any other prefix.

If an exec attribute contains exec="{_bmad}/core/tasks/foo.xml" or exec="_bmad/core/tasks/foo.xml", it would be resolved relative to the file's directory rather than mapped through mapInstalledToSource, producing a false negative or false positive.

No current source files use these patterns in exec attributes (only template examples in .txt files), so this is latent — but the asymmetry between exec-attr and invoke-task handling is a consistency gap.

Suggested fix:

if (ref.type === 'exec-attr') {
  let execPath = ref.raw;
  if (execPath.includes('{project-root}')) {
    return mapInstalledToSource(execPath);
  }
  if (execPath.includes('{_bmad}')) {
    return mapInstalledToSource(execPath);
  }
  if (execPath.startsWith('_bmad/')) {
    return mapInstalledToSource(execPath);
  }
  return path.resolve(path.dirname(ref.file), execPath);
}

2. mapInstalledToSource fallback path targets non-existent src/modules/ [likely]

Severity: 🟢 Low

The fallback at line 154 routes non-core/bmm/utility paths to path.join(SRC_DIR, 'modules', cleaned). However, src/modules/ does not exist — src/ contains only bmm/, core/, and utility/.

This isn't causing false positives today (non-existent paths correctly report as broken). But if modules like bmgd, cis, or bmb are added later, their references would be reported as broken unless this mapping is updated. The CLAUDE.md documents a src/modules/ hierarchy that doesn't match the current layout — outside this PR's scope but worth noting.

3. YAML extractor does not report line numbers [likely]

Severity: 🟢 Low

extractYamlRefs reports locations as key paths (key: agent.menu[0].workflow) while the markdown extractor provides line: N. Key paths are navigable but less ergonomic for quick file navigation. The yaml library's parseDocument() supports source position tracking via range properties if this becomes worth improving.

4. YAML walker does not extract exec/invoke-task patterns from multiline strings

Severity: 🟢 Low

extractYamlRefs only checks for path-shaped string values ({project-root}/_bmad/, {_bmad}/, relative paths). If a YAML multiline string contained markdown-style patterns like exec="..." or <invoke-task>, they would be silently skipped.

Verified that zero current .yaml files contain exec=" or <invoke-task> patterns — these only appear in .md and .xml files today. Theoretical gap only, flagged for awareness.


Summary

Critical: 0 | High: 0 | Medium: 1 | Low: 3

No binary files detected.

Overall this is a well-structured, well-motivated addition. The code is clean, self-contained, and immediately demonstrates value by catching known broken references (#1212, #1455/#1324, and two previously unreported issues). The medium finding is a straightforward consistency fix; the low findings are polish for future iterations.


Review generated by Raven's Verdict. LLM-produced analysis — findings may be incorrect or lack full context. Verify before acting.

Collaborator

alexeyv commented Feb 1, 2026

@arcaven are you on BMAD Discord by any chance?

Collaborator

@alexeyv alexeyv left a comment


what Raven said

bmadcode and others added 3 commits February 1, 2026 08:06
Address alexeyv's review findings on PR #1494:
- Fix exec-attr prefix handling for {_bmad}/ and bare _bmad/ paths
- Fix mapInstalledToSource fallback (remove phantom src/modules/ mapping)
- Switch extractYamlRefs to parseDocument() for YAML line numbers

Add CI integration (stories 2-1, 2-2):
- Emit ::warning annotations for broken refs and abs-path leaks
- Write markdown table to $GITHUB_STEP_SUMMARY
- Guard both behind environment variable checks

Harden CI output:
- escapeAnnotation() encodes %, \r, \n per GitHub Actions spec
- escapeTableCell() escapes pipe chars in step summary table
Contributor Author

arcaven commented Feb 1, 2026

@alexeyv Thank you for the detailed review and for taking the time — really appreciate the consideration and the opportunity to contribute to the project. All four findings addressed in 96595244.

Finding 1: exec-attr missing {_bmad}/ and bare _bmad/ handling — Fixed

Added your exact suggested code. resolveRef() exec-attr branch now matches invoke-task for all three prefix variants.

Finding 2: mapInstalledToSource fallback targets non-existent src/modules/ — Fixed

Changed the fallback from path.join(SRC_DIR, 'modules', cleaned) to path.join(SRC_DIR, cleaned). The core/, bmm/, utility/ if-check is technically redundant now (both branches produce the same result) but kept as documentation of known source directories. If the layout grows, only one line needs updating.
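
For context, the mapping after this change can be sketched as below. This is a simplified illustration: the prefix-stripping steps and the `SRC_DIR` value are assumptions, and the redundant if-check for known directories mentioned above is left out.

```javascript
const path = require('node:path');

// Assumed location of the source tree relative to the working directory.
const SRC_DIR = path.join('.', 'src');

// Map an installed-layout path (under _bmad/) to its location in the source tree.
// e.g. {project-root}/_bmad/core/tasks/workflow.xml -> src/core/tasks/workflow.xml
function mapInstalledToSource(refPath, srcDir = SRC_DIR) {
  const cleaned = refPath
    .replace(/^\{project-root\}\//, '') // drop the {project-root}/ prefix if present
    .replace(/^\{_bmad\}\//, '') // {_bmad}/ shorthand
    .replace(/^_bmad\//, ''); // bare or remaining _bmad/ segment
  return path.join(srcDir, cleaned);
}
```

Because every variant ends up as `path.join(srcDir, cleaned)`, a new top-level module directory under src/ would resolve without further changes in this sketch.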

Finding 3: YAML extractor does not report line numbers — Fixed

Switched from yaml.parse() to yaml.parseDocument() with AST node range tracking. Each YAML scalar reference now carries a line: field (via offsetToLine()), same as the markdown extractor. Spot-checked against actual file content — line numbers are accurate.

Finding 4: YAML multiline exec/invoke-task patterns — Deferred

Confirmed: zero .yaml files in src/ contain exec=" or <invoke-task> patterns today. Agree this is theoretical only. Will address if multiline YAML patterns appear in the codebase.

Bonus: CI integration

While addressing the review, added two CI features:

  • ::warning annotations — broken refs and abs-path leaks now surface as GitHub annotations in the PR Files tab (guarded by GITHUB_ACTIONS env var)
  • $GITHUB_STEP_SUMMARY — markdown table of all issues appended to the workflow step summary (silently skipped when env var unset)

Both output paths include defensive escaping (pipe chars in markdown tables, percent/newline encoding in annotation messages per GitHub Actions spec).
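
As a rough illustration of that escaping, the helpers named in the commit message could be written like this. The bodies are a sketch based on the documented GitHub Actions workflow-command encoding, not a copy of the PR's code, and `formatWarning` is a hypothetical name.

```javascript
// Encode characters that would otherwise break a workflow command message.
// '%' must be replaced first so the other escapes are not double-encoded.
function escapeAnnotation(message) {
  return String(message).replace(/%/g, '%25').replace(/\r/g, '%0D').replace(/\n/g, '%0A');
}

// Keep a literal pipe from being read as a column separator in a markdown table.
function escapeTableCell(text) {
  return String(text).replace(/\|/g, '\\|');
}

// Hypothetical formatter for a single warning annotation line.
// Note: file names containing ':' or ',' would need extra escaping, omitted here.
function formatWarning(file, line, message) {
  return `::warning file=${file},line=${line}::${escapeAnnotation(message)}`;
}
```

A caller would print `formatWarning(...)` to stdout only when `process.env.GITHUB_ACTIONS` is set, and append table rows to the file named by `process.env.GITHUB_STEP_SUMMARY` only when that variable is present.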


A note on placement

This validator is intentionally problem-centric rather than aligned with any single architectural element — it cuts across agents, workflows, tasks, and steps because broken file references cut across all of them. The approach is driven by reported bugs and real impact to the project, not by module boundaries.

Nothing here needs to stay where it is. If it proves valuable, it can be restructured to align with modules, extracted into shared utilities, or split per concern. Value first, elegance later.

Thanks again for the welcoming community and the thoughtful feedback — glad to be here.


Ref count unchanged at 484 references across 216 files. Same 2 known broken refs (core/tasks/validate-workflow.xml in checklist.md).

Contributor Author

arcaven commented Feb 1, 2026

@alexeyv as we discussed on Discord (thank you), I've proposed a more visible PR-comment approach to surfacing these issues. In the meantime, you can already see here what this does with the existing issues it has detected, in two places. They are easy to find in the logs for anyone who looks, and naturally you can make CI fail on them rather than pass.

The warnings show against submitted code in the pull request Files view, and a click away in the Checks / GitHub Actions view if you go looking. A further option is to post PR comments inline.

[Two screenshots, taken 2026-02-01 at 4:41:23 PM and 4:41:44 PM]

However, that is more difficult to maintain, and requires GHA permissions. I believe it's a good improvement, but I'm already at 484 lines in this feature. I want to see it benefit the project before I try to put chrome buff on anything.

@arcaven arcaven requested a review from alexeyv February 1, 2026 23:16
@bmadcode bmadcode merged commit ba89077 into bmad-code-org:main Feb 3, 2026
5 checks passed
@arcaven arcaven deleted the feat/validate-file-refs branch February 5, 2026 22:05
dickymoore pushed a commit to dickymoore/BMAD-METHOD that referenced this pull request Feb 6, 2026
…-org#1494)

* feat: add cross-file reference validator for CI

Add tools/validate-file-refs.js that validates cross-file references
in BMAD source files (agents, workflows, tasks, steps). Catches broken
file paths, missing referenced files, wrong extensions, and absolute
path leaks before they reach users.

Addresses broken-file-ref and path-handling bug classes which account
for 25% of all historical bugs (59 closed issues, 129+ comments).

- Scans src/ for YAML, markdown, and XML files
- Validates {project-root}/_bmad/ references against source tree
- Checks relative path references, exec attributes, invoke-task tags
- Detects absolute path leaks (/Users/, /home/, C:\)
- Adds validate:refs npm script and CI step in quality.yaml

* feat: strip JSON example blocks to reduce false-positive broken refs

Add stripJsonExampleBlocks() to the markdown reference extractor so
bare JSON example/template blocks (braces on their own lines) are
removed before pattern matching. This prevents paths inside example
data from being flagged as broken references.

* feat: add line numbers, fix utility/ path mapping, improve verbose output

- Add utility/ to direct path mapping (was incorrectly falling through
  to src/modules/utility/)
- Show line numbers for broken references in markdown files
- Show YAML key path for broken references in YAML files
- Print file headers in verbose mode for all files with refs

* fix: correct verbose [OK]/[BROKEN] overlap and line number drift

Broken refs no longer print [OK] before [BROKEN] in --verbose mode.
Code block stripping now preserves newlines so offsetToLine() reports
accurate line numbers when code blocks precede broken references.

* fix: address review feedback, add CI annotations and step summary

Address alexeyv's review findings on PR bmad-code-org#1494:
- Fix exec-attr prefix handling for {_bmad}/ and bare _bmad/ paths
- Fix mapInstalledToSource fallback (remove phantom src/modules/ mapping)
- Switch extractYamlRefs to parseDocument() for YAML line numbers

Add CI integration (stories 2-1, 2-2):
- Emit ::warning annotations for broken refs and abs-path leaks
- Write markdown table to $GITHUB_STEP_SUMMARY
- Guard both behind environment variable checks

Harden CI output:
- escapeAnnotation() encodes %, \r, \n per GitHub Actions spec
- escapeTableCell() escapes pipe chars in step summary table

---------

Co-authored-by: Alex Verkhovsky <alexey.verkhovsky@gmail.com>
Co-authored-by: Brian <bmadcode@gmail.com>