feat: 14-language AST support with heritage, call resolution, and dead code improvements #78

Merged
RaghavChamadiya merged 10 commits into main from feat/language-support on Apr 13, 2026

@RaghavChamadiya
Collaborator

Summary

Expands repowise from 7 to 14 languages with full AST support (Python, TypeScript, JavaScript, Java, Go, Rust, C++, C, Kotlin, Ruby, C#, Swift, Scala, PHP) through a 3-phase language pipeline overhaul, plus P0/P1 bug fixes.

Phase 1 — Centralize language config

  • Introduced LanguageRegistry with 42 LanguageSpec entries for identity data (extensions, entry points, manifests, builtins, heritage node types)
  • Decoupled language identity from tree-sitter parser configuration
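
A minimal sketch of what a centralized registry of this kind could look like. The field names, the `by_extension` helper, and the two sample entries are illustrative assumptions, not the actual repowise definitions; only the `LanguageRegistry` and `LanguageSpec` names and the Rust `struct_item`/`enum_item` heritage node types come from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageSpec:
    """Identity data for one language (field names are hypothetical)."""
    name: str
    extensions: tuple[str, ...]
    manifests: tuple[str, ...] = ()
    heritage_node_types: tuple[str, ...] = ()


class LanguageRegistry:
    """Single lookup point so consumers stop hard-coding language constants."""

    def __init__(self, specs):
        self._by_name = {spec.name: spec for spec in specs}
        self._by_ext = {ext: spec for spec in specs for ext in spec.extensions}

    def by_extension(self, ext):
        # Case-insensitive lookup; returns None for unknown extensions.
        return self._by_ext.get(ext.lower())

    def names(self):
        return sorted(self._by_name)


# Two sample entries only; the real registry is described as having 42.
REGISTRY = LanguageRegistry([
    LanguageSpec("python", (".py", ".pyi"), ("pyproject.toml",), ("class_definition",)),
    LanguageSpec("rust", (".rs",), ("Cargo.toml",), ("struct_item", "enum_item")),
])
```

Consumers would then derive their constants from the registry (for example `REGISTRY.by_extension(".rs").heritage_node_types`) rather than keeping their own copies.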

Phase 2 — Modularize extractors and resolvers

  • Extracted per-language logic into dedicated modules: extractors/ (bindings, heritage, visibility, docstrings, signatures) and resolvers/ (one per language)
  • Each new language gets its own resolver with language-specific import resolution

Phase 3 — Add 6 new languages + harden C/C++

  • Added .scm tree-sitter queries, LanguageConfig entries, extractors, resolvers, and test fixtures for Kotlin, Ruby, C#, Swift, Scala, PHP
  • Hardened C/C++ with compile_commands.json resolution

P0 Bug fixes

  • Fixed _detect_unused_exports, which read symbols from a non-existent 'symbols' attribute on file nodes; it now reaches symbol nodes by iterating the file node's graph successors over DEFINES edges
  • Fixed PHP methods without visibility modifier (defaults to public)
  • Fixed Kotlin interface/enum detection via refine_kotlin_class_kind()
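
The `_detect_unused_exports` fix can be illustrated with a small, dependency-free sketch. The dict-based graph layout, the `CALLS` edge label, and the `visibility` attribute are assumptions made for the example; only the idea of following `DEFINES` edges from file nodes to symbol nodes comes from this PR.

```python
def symbols_defined_by(edges, file_id):
    """Return symbol ids reached from a file node over DEFINES edges."""
    return [dst for dst, kind in edges.get(file_id, []) if kind == "DEFINES"]


def detect_unused_exports(nodes, edges):
    """Public symbols with no incoming edge other than DEFINES."""
    referenced = {
        dst
        for targets in edges.values()
        for dst, kind in targets
        if kind != "DEFINES"
    }
    unused = []
    for node_id, attrs in nodes.items():
        if attrs["type"] != "file":
            continue
        # Walk the graph instead of reading a 'symbols' attribute on the file node.
        for sym in symbols_defined_by(edges, node_id):
            if nodes[sym].get("visibility") == "public" and sym not in referenced:
                unused.append(sym)
    return sorted(unused)


# Hypothetical two-file graph: b.py::_main calls a.py::foo, nothing uses a.py::bar.
nodes = {
    "a.py": {"type": "file"},
    "a.py::foo": {"type": "symbol", "visibility": "public"},
    "a.py::bar": {"type": "symbol", "visibility": "public"},
    "b.py": {"type": "file"},
    "b.py::_main": {"type": "symbol", "visibility": "private"},
}
edges = {
    "a.py": [("a.py::foo", "DEFINES"), ("a.py::bar", "DEFINES")],
    "b.py": [("b.py::_main", "DEFINES")],
    "b.py::_main": [("a.py::foo", "CALLS")],
}
```

With this graph, `detect_unused_exports(nodes, edges)` reports only `a.py::bar`.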

P1 Improvements

  • Heritage extraction for Ruby mixins (include/extend/prepend), Rust #[derive()], Swift extension conformance, PHP use TraitName
  • PHP require/include/require_once/include_once import captures
  • Multi-language dynamic import detection (JS, Java, Kotlin, Ruby, PHP, Go)
  • _detect_unused_internals for private/internal symbols with zero incoming call edges
  • Synthetic __module__ symbol for module-level call resolution
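
One way to picture the synthetic `__module__` symbol: a call site is attributed to the innermost enclosing symbol, and calls that sit outside every symbol fall back to a per-file `__module__` node. This is a sketch under assumptions; the `resolve_caller` name, the line-range tuples, and the `file::symbol` id format are hypothetical and not taken from the repowise call resolver.

```python
def resolve_caller(file_path, symbols, call_line):
    """Attribute a call site to a symbol id.

    symbols: list of (name, start_line, end_line), line numbers inclusive.
    Calls outside every symbol are assigned to the synthetic __module__ symbol.
    """
    enclosing = [s for s in symbols if s[1] <= call_line <= s[2]]
    if not enclosing:
        return f"{file_path}::__module__"
    # Prefer the innermost (shortest) enclosing range, e.g. a method over its class.
    name, _, _ = min(enclosing, key=lambda s: s[2] - s[1])
    return f"{file_path}::{name}"


symbols = [("Foo", 1, 20), ("Foo.bar", 5, 10)]
```

Here a call on line 7 resolves to `app.py::Foo.bar`, while a top-level call on line 30 resolves to `app.py::__module__` instead of being dropped.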

Docs

  • Updated README, LANGUAGE_SUPPORT.md, ARCHITECTURE.md, and website docs for 14-language coverage
  • Removed obsolete planning/handoff docs

Test plan

  • All 1130+ existing tests pass with zero regressions
  • New test fixtures for all 6 added languages (Kotlin, Ruby, C#, Swift, Scala, PHP)
  • Heritage extraction tests cover new derive kind
  • Graph tests updated for synthetic __module__ nodes
  • Dead code tests updated for graph-based symbol lookup

Replace FTS-only file retrieval with a 3-signal ranking system:
- Symbol name match (weight 2.0) — most precise
- File path match (weight 1.5) — catches path-based searches
- FTS on wiki content (weight 1.0) — broadest, lowest priority
Files ranked by signal score then PageRank, top 3 returned.
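
A rough sketch of the ranking described above. The weights and the top-3 cutoff come from this commit message; summing weights when a file matches several signals, the `rank_files` name, and the input shapes are assumptions.

```python
WEIGHTS = {"symbol": 2.0, "path": 1.5, "fts": 1.0}


def rank_files(hits, pagerank, top_k=3):
    """Rank files by combined signal score, then PageRank.

    hits: dict mapping signal name -> iterable of matched file paths.
    pagerank: dict mapping file path -> PageRank score.
    Assumption: a file matched by several signals gets the sum of their weights.
    """
    scores = {}
    for signal, files in hits.items():
        for path in set(files):
            scores[path] = scores.get(path, 0.0) + WEIGHTS[signal]
    # Higher score first, then higher PageRank, then path for a stable order.
    ordered = sorted(scores, key=lambda p: (-scores[p], -pagerank.get(p, 0.0), p))
    return ordered[:top_k]


hits = {
    "symbol": ["a.py"],
    "path": ["b.py", "a.py"],
    "fts": ["c.py", "d.py", "b.py"],
}
pagerank = {"c.py": 0.1, "d.py": 0.4}
```

In this example `a.py` scores 3.5 and `b.py` 2.5; `c.py` and `d.py` tie at 1.0, so PageRank puts `d.py` ahead and `c.py` is cut.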

Remove git signals (HOTSPOT, bus-factor, owner) from enrichment —
that info belongs in get_risk, not every search. Remove Bash command
interception (fragile regex on grep/rg commands).

Keep: symbols (3), importers (3), dependencies (2) per file.

Create a single LanguageRegistry with 42 LanguageSpec entries as the
source of truth for all language identity data. Migrate 14 consumer
files to derive their constants from the registry, eliminating
widespread duplication. Delete stale packages/core/queries/ directory.
…py (Phase 2)

Extract per-language logic into dedicated packages:
- extractors/ — visibility, signatures, docstrings, bindings, heritage
- resolvers/ — Python, TS/JS, Go, Rust, C/C++, generic stem fallback
- framework_edges.py — Django, FastAPI, Flask, pytest conftest detection

parser.py drops from 1,806 to 796 lines (pure orchestration).
graph.py drops from 1,286 to 646 lines. Delete dead parsers/ stubs.

Update Adding a New Language guide to reflect modular architecture
(extractors/, resolvers/ instead of inline in parser.py/graph.py).
Add architecture section and updated roadmap.

Create Phase 3 handoff doc covering remaining language work:
hardening C++/C, wiring Kotlin/Ruby/C#, adding Swift/Scala/PHP.

Complete language pipeline for Kotlin, Ruby, C#, Swift, Scala, and PHP
with tree-sitter grammars, .scm queries, LanguageConfig entries,
per-language extractors (bindings, docstrings, visibility, heritage),
and dedicated import resolvers. Harden C++ with binding extraction and
Doxygen docstrings, add call captures to C. Brings total AST-supported
languages to 14 (7 Full + 7 Good tier).

- Add 6 grammar dependencies (tree-sitter-kotlin/ruby/c-sharp/swift/scala/php)
- Create .scm query files for C#, Swift, Scala, PHP; extend Kotlin, Ruby, C
- Add LanguageConfig entries for all 8 languages in parser.py
- Add per-language visibility functions (kotlin, csharp, swift, scala, php)
- Add binding extractors for all 8 languages
- Add docstring extractors (KDoc, RDoc, XML doc, Swift doc, ScalaDoc, PHPDoc, Doxygen)
- Add heritage extractors for Swift, Scala, PHP
- Create dedicated resolvers for Kotlin, Ruby, C#, Swift, Scala, PHP
- Add 37 new parser tests with fixtures for all 6 languages
- Update registry specs with grammar_package and heritage_node_types
- Update README.md and LANGUAGE_SUPPORT.md documentation
…n interfaces

- Fix _detect_unused_exports to read symbol nodes via DEFINES edges
  instead of non-existent 'symbols' attribute on file nodes
- Add fallback PHP method_declaration pattern without visibility_modifier
  so methods defaulting to public are captured
- Add refine_kotlin_class_kind() to distinguish interface/enum from
  regular class in Kotlin class_declaration nodes
- Update test helper _build_graph to create proper symbol nodes
…, PHP traits

- Ruby: extract include/extend/prepend from class body as mixin relations
- Rust: extract #[derive(Trait)] from struct/enum attribute items
- Swift: add extension conformance capture (user_type pattern in .scm)
- PHP: extract use TraitName; from class declaration_list
- Add struct_item/enum_item to Rust heritage_node_types
- Add 'derive' to valid heritage kinds in integration tests
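
To illustrate what a `derive` heritage relation looks like, here is a simplified regex-based sketch. The PR extracts these from tree-sitter attribute items; the regex stands in for that purely for illustration, and the `extract_derives` name and the `(type, trait, kind)` tuple shape are assumptions. It only handles a `#[derive(...)]` placed directly before the `struct` or `enum` keyword.

```python
import re

# Matches '#[derive(A, B)]' directly followed by an optional 'pub' and struct/enum.
DERIVE_RE = re.compile(
    r"#\[derive\(([^)]*)\)\]\s*(?:pub(?:\([^)]*\))?\s+)?(?:struct|enum)\s+(\w+)"
)


def extract_derives(source):
    """Return (type_name, trait_name, 'derive') tuples found in Rust source."""
    relations = []
    for match in DERIVE_RE.finditer(source):
        traits = [t.strip() for t in match.group(1).split(",") if t.strip()]
        relations.extend((match.group(2), trait, "derive") for trait in traits)
    return relations


rust_source = (
    "#[derive(Debug, Clone)]\n"
    "pub struct Point { x: i32 }\n"
    "\n"
    "#[derive(PartialEq)]\n"
    "enum Kind { A }\n"
)
```

Each derived trait becomes one heritage edge of kind `derive`, so `Point` above yields two relations and `Kind` yields one.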
…ternals, module-level calls

- Add PHP require/require_once/include/include_once as import captures
- Extend dynamic import detection to JS/TS/Java/Kotlin/Ruby/PHP/Go
- Implement _detect_unused_internals for private symbols with no callers
- Add synthetic __module__ symbol per file for module-level call resolution
- Update call_resolver to assign orphan calls to __module__ symbol
…bsolete planning docs

Update README, LANGUAGE_SUPPORT.md, ARCHITECTURE.md, and website docs to
reflect 14 AST-supported languages (7 Full + 7 Good tier) with heritage
extraction improvements. Remove obsolete planning and handoff docs.
@RaghavChamadiya RaghavChamadiya merged commit cb27a7e into main Apr 13, 2026
2 of 5 checks passed
@RaghavChamadiya RaghavChamadiya deleted the feat/language-support branch April 13, 2026 17:27