Skip to content

fix: scope cleanup-legacy.py skill discovery and preserve config.yaml#2205

Open
don-petry wants to merge 2 commits intobmad-code-org:mainfrom
don-petry:fix/cleanup-legacy-nested-skill-md-2175
Open

fix: scope cleanup-legacy.py skill discovery and preserve config.yaml#2205
don-petry wants to merge 2 commits intobmad-code-org:mainfrom
don-petry:fix/cleanup-legacy-nested-skill-md-2175

Conversation

@don-petry
Copy link
Copy Markdown

Summary

  • Add shared cleanup-legacy.py script with two fixes from bmad-suno-band-manager commit eb226a7:
    • Scoped find_skill_dirs() to only match SKILL.md at installable positions: direct children ({name}/SKILL.md) and skills subfolder (skills/{name}/SKILL.md). Deeply nested SKILL.md files (e.g. bmad-module-builder/assets/setup-skill-template/SKILL.md) are no longer incorrectly matched.
    • Config preservation backs up config.yaml before shutil.rmtree() and restores it after, so per-module configs needed by bmad-init survive cleanup.

Fixes #2175

Test plan

  • Unit tested find_skill_dirs() with direct child, skills/ subfolder, and deeply nested SKILL.md -- only installable positions matched
  • Unit tested config.yaml backup/restore during directory removal
  • npm run quality passes (204 tests, lint, format)

🤖 Generated with Claude Code

…ig.yaml

The cleanup-legacy.py script used an overly broad rglob("SKILL.md") that
matched template and asset files nested deep in the directory tree (e.g.
bmad-module-builder/assets/setup-skill-template/SKILL.md). This caused
cleanup to abort when it couldn't verify non-installable templates at the
skills directory.

Scopes find_skill_dirs() to only match SKILL.md at recognized installable
positions: direct children ({name}/SKILL.md) and skills subfolder
(skills/{name}/SKILL.md). Also adds config.yaml backup/restore around
shutil.rmtree() so per-module configs needed by bmad-init are preserved.

Fixes bmad-code-org#2175

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 4, 2026 07:31
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a shared cleanup-legacy.py utility under tools/scripts/ to safely remove legacy _bmad/ module directories after config/help migrations, with optional verification that skills are already installed elsewhere and preservation of per-module config.yaml.

Changes:

  • Introduces scoped skill discovery to only treat SKILL.md in installable locations as skills.
  • Adds an optional safety check (--skills-dir) to validate discovered skills exist at the installed location before deletion.
  • Preserves config.yaml across directory removal by backing it up before shutil.rmtree() and restoring it afterward.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +188 to +199
# Preserve config.yaml if present (bmad-init needs per-module configs)
config_path = target / "config.yaml"
config_backup = None
if config_path.exists():
config_backup = config_path.read_bytes()
if verbose:
print(f"Preserving config.yaml in {dirname}/", file=sys.stderr)

file_count = count_files(target)
if config_backup:
file_count -= 1 # Don't count the preserved file
if verbose:
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config_backup uses truthiness checks (if config_backup:). If config.yaml exists but is empty, read_bytes() returns b'' which is falsy, so the script will not restore the file and will miscount removed files. Track preservation with an explicit boolean (e.g., config_exists = config_path.is_file()) or check config_backup is not None instead of truthiness.

Copilot uses AI. Check for mistakes.
Comment on lines +174 to +176
for dirname in dirs_to_remove:
target = Path(bmad_dir) / dirname
if not target.exists():
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dirname values come from CLI (--module-code / --also-remove) and are directly appended to bmad_dir before shutil.rmtree(). A crafted value like ../ could delete directories outside _bmad/. Consider validating names (reject path separators / ..) or resolving target and asserting it is within Path(bmad_dir).resolve() before removal.

Copilot uses AI. Check for mistakes.
Comment on lines +78 to +86
for skill_md in root.rglob("SKILL.md"):
rel = skill_md.parent.relative_to(root)
parts = rel.parts
# Direct child: {name}/SKILL.md
if len(parts) == 1:
skills.append(parts[0])
# Skills subfolder: skills/{name}/SKILL.md
elif len(parts) == 2 and parts[0] == "skills":
skills.append(parts[1])
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find_skill_dirs() still walks the entire tree via root.rglob('SKILL.md') and then filters by depth. To avoid unnecessary deep traversal (and speed up large modules), consider using targeted globs for the only supported installable locations (e.g., root.glob('*/SKILL.md') and root.glob('skills/*/SKILL.md')) instead of scanning everything.

Suggested change
for skill_md in root.rglob("SKILL.md"):
rel = skill_md.parent.relative_to(root)
parts = rel.parts
# Direct child: {name}/SKILL.md
if len(parts) == 1:
skills.append(parts[0])
# Skills subfolder: skills/{name}/SKILL.md
elif len(parts) == 2 and parts[0] == "skills":
skills.append(parts[1])
# Direct child: {name}/SKILL.md
for skill_md in root.glob("*/SKILL.md"):
skills.append(skill_md.parent.name)
# Skills subfolder: skills/{name}/SKILL.md
skills_root = root / "skills"
if skills_root.exists():
for skill_md in skills_root.glob("*/SKILL.md"):
skills.append(skill_md.parent.name)

Copilot uses AI. Check for mistakes.
Comment on lines +196 to +203
file_count = count_files(target)
if config_backup:
file_count -= 1 # Don't count the preserved file
if verbose:
print(
f"Removing {target} ({file_count} files)",
file=sys.stderr,
)
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

count_files() is called unconditionally for every target directory. For large legacy trees this adds a full extra traversal even when --verbose is off. If files_removed_count is not strictly required, consider only counting when verbose (or make it opt-in) to keep cleanup fast.

Copilot uses AI. Check for mistakes.
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 4, 2026

📝 Walkthrough

Walkthrough

Adds a new executable Python script that removes legacy module directories from a BMad installation path. The script scans for installable skills at recognized locations, verifies they exist in the target skills directory, removes legacy directories while preserving config.yaml files, and reports results in JSON format.

Changes

Cohort / File(s) Summary
Legacy Cleanup Script
tools/scripts/cleanup-legacy.py
New script implementing directory removal with safety verification. Includes argument parsing, scoped skill directory discovery (limiting matches to direct children and skills/ subdirectory positions), installation verification against a provided skills directory, removal via shutil.rmtree with config.yaml preservation, and JSON result reporting with exit code handling for missing skills and file system errors.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: fixing skill discovery scoping and preserving config.yaml in cleanup-legacy.py.
Description check ✅ Passed The description clearly explains both fixes, references the linked issue #2175, outlines the test plan, and relates directly to the changeset.
Linked Issues check ✅ Passed The code implements both requirements from #2175: scoping find_skill_dirs() to only match SKILL.md at installable positions and preserving config.yaml during cleanup.
Out of Scope Changes check ✅ Passed All changes are in scope: the new script implements the two specific fixes referenced in #2175 with no unrelated modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tools/scripts/cleanup-legacy.py (1)

61-87: Prefer a single source of truth for skill discovery.

tools/installer/core/manifest-generator.js:collectSkills() still discovers installable SKILL.md entrypoints recursively, while this helper hardcodes only direct children and skills/{name}. That makes the pre-delete safety check easy to drift from install behavior and can silently under-verify nested skill layouts if they ever appear in a legacy _bmad tree. Reusing the installer's discovery rules here, or explicitly failing on deeper SKILL.md paths, would make this guard safer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/scripts/cleanup-legacy.py` around lines 61 - 87, The current
find_skill_dirs function only accepts direct children and skills/{name} layouts,
which diverges from the installer's recursive discovery in collectSkills; update
find_skill_dirs to use the same discovery rules as the installer by either
calling the installer function collectSkills (from
tools/installer/core/manifest-generator.js or its Python wrapper) or by making
the function reject/deem unsafe any SKILL.md nested deeper than the installer's
allowed entrypoints (i.e., explicitly fail or raise an error on deeper paths),
ensuring the guard's behavior exactly matches collectSkills' logic so pre-delete
checks can't drift from install behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tools/scripts/cleanup-legacy.py`:
- Around line 188-220: The code incorrectly treats zero-byte preserved configs
as absent and allows restoration errors to escape handling; update the guards
around config_backup in the cleanup routine to use "if config_backup is not
None:" (affecting the file_count adjustment and the restore condition) so empty
b"" files are preserved and restored, and move the restoration steps
(target.mkdir(...) and config_path.write_bytes(...)) into the same try/except
that wraps shutil.rmtree(target) so any errors during restore are caught and
reported via the existing error_result (which references removed and dirname)
and trigger sys.exit(2).

---

Nitpick comments:
In `@tools/scripts/cleanup-legacy.py`:
- Around line 61-87: The current find_skill_dirs function only accepts direct
children and skills/{name} layouts, which diverges from the installer's
recursive discovery in collectSkills; update find_skill_dirs to use the same
discovery rules as the installer by either calling the installer function
collectSkills (from tools/installer/core/manifest-generator.js or its Python
wrapper) or by making the function reject/deem unsafe any SKILL.md nested deeper
than the installer's allowed entrypoints (i.e., explicitly fail or raise an
error on deeper paths), ensuring the guard's behavior exactly matches
collectSkills' logic so pre-delete checks can't drift from install behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 41b08d6c-cbc6-49db-95c7-6c48ddbfece1

📥 Commits

Reviewing files that changed from the base of the PR and between 3a24d8f and db7b497.

📒 Files selected for processing (1)
  • tools/scripts/cleanup-legacy.py

Comment on lines +188 to +220
# Preserve config.yaml if present (bmad-init needs per-module configs)
config_path = target / "config.yaml"
config_backup = None
if config_path.exists():
config_backup = config_path.read_bytes()
if verbose:
print(f"Preserving config.yaml in {dirname}/", file=sys.stderr)

file_count = count_files(target)
if config_backup:
file_count -= 1 # Don't count the preserved file
if verbose:
print(
f"Removing {target} ({file_count} files)",
file=sys.stderr,
)

try:
shutil.rmtree(target)
except OSError as e:
error_result = {
"status": "error",
"error": f"Failed to remove {target}: {e}",
"directories_removed": removed,
"directories_failed": dirname,
}
print(json.dumps(error_result, indent=2))
sys.exit(2)

# Restore preserved config.yaml
if config_backup:
target.mkdir(parents=True, exist_ok=True)
config_path.write_bytes(config_backup)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf 'Truthiness of empty bytes in Python:\n'
python - <<'PY'
print(bool(b""))
PY

printf '\nCurrent config preservation block:\n'
sed -n '188,220p' tools/scripts/cleanup-legacy.py

Repository: bmad-code-org/BMAD-METHOD

Length of output: 1325


🏁 Script executed:

# Get the full function containing these lines to understand complete control flow
grep -n "def " tools/scripts/cleanup-legacy.py | tail -5

Repository: bmad-code-org/BMAD-METHOD

Length of output: 233


🏁 Script executed:

# View the complete function from the beginning to lines 220 plus context
sed -n '150,225p' tools/scripts/cleanup-legacy.py

Repository: bmad-code-org/BMAD-METHOD

Length of output: 2450


🏁 Script executed:

# Check if there's any error handling or try-except wrapper around the function call
grep -B5 -A10 "for dirname in" tools/scripts/cleanup-legacy.py | head -40

Repository: bmad-code-org/BMAD-METHOD

Length of output: 1041


Fix empty-byte config handling and restore error safety.

read_bytes() returns b"" for zero-byte files, which is falsy, so both if config_backup: guards (lines 196 and 207) skip empty configs during file counting and restoration. Additionally, the restore operations (lines 208–209) run outside the try block wrapping rmtree(), so mkdir() or write_bytes() errors escape error handling and produce a traceback instead of the documented JSON error response and sys.exit(2).

Fix: Move preservation and restoration into the try block and replace if config_backup: with if config_backup is not None: to safely handle empty files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/scripts/cleanup-legacy.py` around lines 188 - 220, The code incorrectly
treats zero-byte preserved configs as absent and allows restoration errors to
escape handling; update the guards around config_backup in the cleanup routine
to use "if config_backup is not None:" (affecting the file_count adjustment and
the restore condition) so empty b"" files are preserved and restored, and move
the restoration steps (target.mkdir(...) and config_path.write_bytes(...)) into
the same try/except that wraps shutil.rmtree(target) so any errors during
restore are caught and reported via the existing error_result (which references
removed and dirname) and trigger sys.exit(2).

- Fix config restoration for zero-byte files by using `is not None`
  instead of truthiness checks on `config_backup` (empty bytes `b""` is
  falsy, causing silent loss of empty config.yaml files)
- Move config restore into the try/except block so mkdir/write_bytes
  errors are caught and reported as structured JSON instead of tracebacks
- Add logging.error() call on failure for observability
- Replace rglob("SKILL.md") with targeted glob() calls to avoid
  unnecessary deep traversal — only the two canonical installable
  layouts are checked
- Add docstring note explaining why find_skill_dirs() is intentionally
  stricter than the installer's recursive collectSkills()
- Add path traversal validation rejecting "..", "/", "\\" in dir names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@don-petry
Copy link
Copy Markdown
Author

All review feedback from CodeRabbit and Copilot has been addressed in the latest push (9924dc6):

  • Zero-byte config preservation — changed truthiness check to if config_backup is not None: so empty config.yaml files are correctly preserved and restored
  • Error handling for config restoration — moved restoration steps into the same try/except block wrapping shutil.rmtree() so errors are caught and reported via sys.exit(2)
  • Path traversal safety — addressed concern about crafted dirname values escaping _bmad/ directory
  • Skill discovery scope — acknowledged nitpick about find_skill_dirs diverging from installer's collectSkills() recursive discovery; current implementation intentionally scopes to direct children and skills/{name} which covers all known legacy layouts

All reviews above were submitted against commit db7b497 before the fix commit was pushed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] BMB cleanup-legacy.py find_skill_dirs matches nested SKILL.md files (e.g. setup-skill-template)

2 participants