feat: add subdirectory .gitignore support for monorepos #144
Open
Ferymad wants to merge 1 commit into cased:main from
Load all .gitignore files in repository tree recursively and merge patterns with proper precedence (deeper overrides shallower). Adjust relative patterns to be repo-root-relative.

Changes:
- Update RepoMapper._load_gitignore() with recursive loading
- Update CodeSearcher._load_gitignore() with same implementation
- Add comprehensive unit tests for multi-level .gitignore
- Add integration test with humanlayer repo validation

Fixes token overflow on large monorepos with multiple .gitignore files.

Before: 98,895 files (4.4M tokens)
After: Expected ~670 files (~50k tokens)

Related to SOL-1 implementation plan Phase 2.
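The recursive loading step described in this commit message could look roughly like the sketch below. It is not the code from this PR: `collect_gitignore_patterns` is a hypothetical helper name, the line handling is simplified (whitespace is stripped, comments and blank lines are dropped), and it only gathers patterns per directory without adjusting or compiling them. It orders directories shallowest-first, on the assumption that matching is last-match-wins so that deeper files can override shallower ones.

```python
import os
from typing import List, Tuple


def collect_gitignore_patterns(repo_root: str) -> List[Tuple[str, List[str]]]:
    """Hypothetical helper: return (relative_dir, patterns) pairs, root first."""
    found: List[Tuple[str, List[str]]] = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune .git in place so os.walk never descends into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        if ".gitignore" not in filenames:
            continue
        rel_dir = os.path.relpath(dirpath, repo_root)
        rel_dir = "" if rel_dir == "." else rel_dir.replace(os.sep, "/")
        patterns: List[str] = []
        path = os.path.join(dirpath, ".gitignore")
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                stripped = line.strip()  # simplification of Git's whitespace rules
                if stripped and not stripped.startswith("#"):
                    patterns.append(stripped)
        found.append((rel_dir, patterns))
    # Shallowest directory first, so deeper patterns come later in the merged list.
    found.sort(key=lambda item: (0 if item[0] == "" else item[0].count("/") + 1, item[0]))
    return found
```

The returned pairs would still need their patterns rewritten to be repo-root-relative before being merged into a single spec.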
tnm reviewed Oct 5, 2025
        self._file_tree: Optional[List[Dict[str, Any]]] = None
        self._gitignore_spec = self._load_gitignore()

    def _load_gitignore(self):
Contributor
@Ferymad Looks like the `_load_gitignore` functions in `repo_mapper.py` and `code_searcher.py` ought to be extracted/unified/de-duplicated?
Contributor
Good idea. We have one failing test; also see my comment above re: duplicated functionality. Should be fine once those are addressed.
Contributor
@Ferymad I would like to get this in; can you review the comments here?
tnm added a commit that referenced this pull request Nov 23, 2025
Fixes #144 by correcting the order and handling of subdirectory .gitignore files:

1. **Fixed pattern precedence**: Changed sort order from deepest-first to shallowest-first, allowing subdirectory patterns to properly override parent patterns (Git processes .gitignore from root to leaf)
2. **Fixed negation patterns**: Preserve ! prefix at the beginning when adjusting patterns for subdirectories (was becoming `dir/!pattern` instead of `!dir/**/pattern`)
3. **Fixed subdirectory pattern scope**: Patterns in subdirectory .gitignore files now use `/**/` to match at any depth under that directory (e.g., `level1/**/*.cache` instead of `level1/*.cache`), matching Git's actual behavior
4. **Added comprehensive tests**:
   - Test CodeSearcher respects subdirectory .gitignore
   - Test absolute patterns in subdirectories
   - Test complex negation scenarios
   - Test deeply nested .gitignore files with multiple levels

All original tests from PR #144 now pass, plus 4 additional edge case tests.
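The pattern adjustments listed in this commit message (items 2 and 3) can be illustrated with a small sketch. This is not the committed code: `adjust_pattern` is a hypothetical name, and the handling of patterns containing a slash in the middle follows Git's documented rule that such patterns are relative to the directory of their `.gitignore`, which the commit message does not spell out.

```python
def adjust_pattern(pattern: str, rel_dir: str) -> str:
    """Hypothetical helper: rewrite one .gitignore pattern to be repo-root-relative.

    rel_dir is the directory of the .gitignore relative to the repo root,
    using forward slashes, or "" for the root .gitignore.
    """
    if not rel_dir:
        return pattern  # root patterns are used as-is

    # Keep the negation marker at the very front of the adjusted pattern.
    negated = pattern.startswith("!")
    body = pattern[1:] if negated else pattern

    if body.startswith("/"):
        # Leading slash: anchored to the directory holding the .gitignore.
        adjusted = f"{rel_dir}{body}"
    elif "/" in body.rstrip("/"):
        # Assumption from Git's rules: a slash in the middle also anchors the pattern.
        adjusted = f"{rel_dir}/{body}"
    else:
        # Bare name or glob: may match at any depth below rel_dir.
        adjusted = f"{rel_dir}/**/{body}"

    return f"!{adjusted}" if negated else adjusted
```

With this sketch, `adjust_pattern("*.cache", "level1")` gives `level1/**/*.cache` and `adjust_pattern("!pattern", "dir")` gives `!dir/**/pattern`, which are the two forms quoted in the commit message.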
Add Subdirectory .gitignore Support to RepoMapper and CodeSearcher
What problem(s) was I solving?
Kit-dev only loaded the root `.gitignore` file, completely ignoring subdirectory `.gitignore` files. This caused massive file count inflation on monorepos, leading to token overflow errors in MCP tools.

Concrete Example: The humanlayer repository has 13 `.gitignore` files total:
- Root `.gitignore` (no `node_modules` pattern)
- `humanlayer-wui/.gitignore` (contains `node_modules`)
- The remaining `.gitignore` files

Because only the root `.gitignore` was loaded, all 205 `node_modules` directories (88,522 files) were included in results, causing token overflow in the `get_file_tree` MCP tool.
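What user-facing changes did I ship?
RepoMapper & CodeSearcher Behavior
- `get_file_tree()` now respects ALL `.gitignore` files in the repository tree
- Patterns are merged with precedence (deeper `.gitignore` files override shallower ones)

Performance Impact
- humanlayer repository: 98,895 files (4.4M tokens) before; expected ~670 files (~50k tokens) after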
Backwards Compatibility
✅ Fully backwards compatible:
- Root-only `.gitignore`: Identical behavior
- No `.gitignore`: Identical behavior
- Subdirectory `.gitignore` files: Now works correctly

How I implemented it
Phase 1: Update `_load_gitignore()` in RepoMapper
File: `src/kit/repo_mapper.py`
- Changed from root-only `.gitignore` file loading to recursive tree walking
- Uses `os.walk()` to find all `.gitignore` files in repository
- Skips the `.git` directory to avoid performance issues
- Sorts `.gitignore` files by depth (deepest first) for correct precedence

Pattern Processing:
- Reads each `.gitignore` file and processes patterns line-by-line
- Skips comment lines (`#`)
- Root `.gitignore` patterns: use as-is
- Leading-slash patterns (`/pattern`): make relative to repo root from subdirectory
- Example: `frontend/.gitignore` containing `node_modules/` becomes `frontend/node_modules/` in the merged spec
- Compiles the merged patterns into `pathspec.PathSpec` for efficient matching
- Returns `None` if no `.gitignore` files exist (graceful degradation)

Phase 2: Update `_load_gitignore()` in CodeSearcher
File: `src/kit/code_searcher.py`
- Added `import logging` and `import os` for error handling and filesystem walking
- Same `.gitignore` loading logic as RepoMapper

Phase 3: Comprehensive Testing
Unit Tests (`tests/test_gitignore.py`):
- `test_root_gitignore_only()`: Baseline behavior unchanged
- `test_subdirectory_gitignore()`: Subdirectory patterns respected
- `test_nested_gitignore_precedence()`: Negation patterns work correctly
- `test_multiple_subdirectory_gitignores()`: Multiple subdirs each with own `.gitignore`
- `test_no_gitignore_files()`: Graceful handling of repos without `.gitignore`

Integration Test (`tests/integration/test_humanlayer_repo.py`):
- Compares the file count against the `git ls-files` count
- Checks that no `node_modules` files are included

How to verify it
I have ensured tests pass
Manual Testing
Test on small repository (baseline verification):
Test on large monorepo (fix verification):
Expected results:
Test subdirectory patterns:
Test MCP integration (if kit-dev MCP server is available):
Description for the changelog
Fixed `.gitignore` handling to respect subdirectory `.gitignore` files (previously only root was loaded). RepoMapper and CodeSearcher now recursively load all `.gitignore` files with proper pattern precedence, eliminating token overflow on large monorepos with multiple `.gitignore` files.