500 changes: 372 additions & 128 deletions README.md

Large diffs are not rendered by default.

387 changes: 387 additions & 0 deletions component_mapping.md
@@ -0,0 +1,387 @@
# 🔧 **Comprehensive Component Mapping: Graph-sitter + Serena + Autogenlib**

This document provides a complete mapping of core features, classes, and functions from each tool that will be used in the unified code analysis and repair system.

---

## 📊 **GRAPH-SITTER - Structural Code Analysis**

### **Core Entry Point**
- **Main Class**: `Codebase` in `src/graph_sitter/core/codebase.py`
- **Initialization**: `Codebase(repo_path, language=...)`, where `language` is `"python"` or `"typescript"`

### **Primary Analysis Methods**

#### **Symbol Retrieval & Search**
```python
# Core symbol access methods
def get_file(filepath: str, optional: bool = False) -> SourceFile | None
def get_directory(dir_path: str, optional: bool = False) -> Directory | None
def get_symbol(symbol_name: str, optional: bool = False) -> Symbol | None
def get_symbols(symbol_name: str) -> list[Symbol]
def get_class(class_name: str, optional: bool = False) -> Class | None
def get_function(function_name: str, optional: bool = False) -> Function | None

# Location-based search
def find_by_span(span: Span) -> list[Editable]
```

#### **Structural Properties**
```python
# Core collections (all cached properties)
@property
def files() -> list[SourceFile] # All source files
@property
def symbols() -> list[Symbol] # All top-level symbols
@property
def classes() -> list[Class] # All class definitions
@property
def functions() -> list[Function] # All function definitions
@property
def global_vars() -> list[Assignment] # All global variables
@property
def interfaces() -> list[Interface] # All interfaces (TS only)
@property
def types() -> list[TypeAlias] # All type aliases (TS only)
@property
def imports() -> list[Import] # All import statements
@property
def external_modules() -> list[ExternalModule] # External dependencies
```

#### **Change Analysis & Git Integration**
```python
# Diff and change analysis
def get_diff(base: str | None = None, stage_files: bool = False) -> str
def get_diffs(base: str | None = None) -> list[Diff]
def get_relative_path(from_file: str, to_file: str) -> str

# File operations
def create_file(filepath: str, content: str = "", sync: bool = True) -> SourceFile
def has_file(filepath: str) -> bool
```

### **Analysis Utilities** (`codebase_analysis.py`)

#### **Summary Generation Functions**
```python
# High-level analysis functions
def get_codebase_summary(codebase: Codebase) -> str
# Returns: node counts, edge counts, symbol breakdown

def get_file_summary(file: SourceFile) -> str
# Returns: imports, symbols, classes, functions, LOC

def get_class_summary(cls: Class) -> str
# Returns: parent classes, methods, attributes, decorators, dependencies

def get_function_summary(func: Function) -> str
# Returns: parameters, return statements, function calls, call sites

def get_symbol_summary(symbol: Symbol) -> str
# Returns: usage analysis, dependency mapping, import relationships
```

### **Key Data Structures**
```python
# Core symbol types
class Symbol:
    name: str
    symbol_type: SymbolType
    symbol_usages: list[Symbol]
    dependencies: list[Symbol]

class Class(Symbol):
    parent_class_names: list[str]
    methods: list[Function]
    attributes: list[Assignment]
    decorators: list[Decorator]

class Function(Symbol):
    parameters: list[Parameter]
    return_statements: list[ReturnStatement]
    function_calls: list[FunctionCall]
    call_sites: list[CallSite]

class SourceFile:
    filepath: str
    source: str
    imports: list[Import]
    symbols: list[Symbol]
    classes: list[Class]
    functions: list[Function]
```

---

## 🔧 **SERENA - LSP Diagnostic System**

### **Core Entry Point**
- **Main Class**: `SolidLanguageServer` in `src/solidlsp/ls.py`
- **Initialization**: Language-specific server setup with project configuration

### **Primary Diagnostic Methods**

#### **Error Retrieval Functions**
```python
# File-level diagnostics
def request_text_document_diagnostics(relative_file_path: str) -> list[Diagnostic]
# Process: Opens file → Sends LSP request → Transforms response
# Returns: List of diagnostics with URI, severity, message, range, code

# Workspace-wide diagnostics
def workspace_diagnostic(params: WorkspaceDiagnosticParams) -> WorkspaceDiagnosticReport
# Capabilities: Cross-file analysis, project-wide type checking
# Returns: Comprehensive workspace diagnostic report

# Symbol-based context retrieval
def request_containing_symbol(relative_file_path: str, line: int, col: int,
                              strict: bool = False, include_body: bool = False)
# Returns: Symbol at specific location for context-aware analysis

def request_workspace_symbol(query: str) -> list[UnifiedSymbolInformation]
# Returns: Cross-workspace symbol search for dependency analysis
```

#### **Language Server Management**
```python
# Multi-language support (13+ languages)
class LanguageServerRegistry:
    # - PyrightServer (Python)
    # - TypeScriptServer (TypeScript/JavaScript)
    # - RustAnalyzer (Rust)
    # - ClangdServer (C/C++)
    # - JavaLanguageServer (Java)
    # - GoplsServer (Go)
    # - OmniSharpServer (C#)
    # ... and more

# Server lifecycle management
def start_server() -> None
def stop_server() -> None
def restart_server() -> None
def get_server_status() -> ServerStatus
```

### **Core Data Structures**

#### **Diagnostic Types**
```python
class DiagnosticSeverity(IntEnum):
    ERROR = 1        # Critical compilation/runtime errors
    WARNING = 2      # Potential issues, best practice violations
    INFORMATION = 3  # Informational messages
    HINT = 4         # Suggestions for improvement

class Diagnostic(TypedDict):
    uri: DocumentUri              # File URI where diagnostic applies
    range: Range                  # Exact location (line/character range)
    severity: DiagnosticSeverity  # Error level
    message: str                  # Human-readable error description
    code: str                     # Error code (e.g., "E0001", "TS2304")
    source: str                   # Tool that generated diagnostic
    tags: list[DiagnosticTag]     # Additional metadata
    relatedInformation: list[DiagnosticRelatedInformation]  # Related diagnostics

class Range(TypedDict):
    start: Position  # Start position
    end: Position    # End position

class Position(TypedDict):
    line: int       # Zero-based line number
    character: int  # Zero-based character offset
```
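The diagnostic shapes above can be tried out without a running language server. The following is a minimal sketch that re-declares trimmed-down versions of the types and orders diagnostics most severe first; `format_diagnostic`, `most_severe_first`, and the sample data are hypothetical helpers for illustration, not part of Serena.

```python
from enum import IntEnum
from typing import TypedDict


class DiagnosticSeverity(IntEnum):
    ERROR = 1
    WARNING = 2
    INFORMATION = 3
    HINT = 4


class Position(TypedDict):
    line: int       # zero-based
    character: int  # zero-based


class Range(TypedDict):
    start: Position
    end: Position


class Diagnostic(TypedDict):
    uri: str
    range: Range
    severity: int
    message: str


def format_diagnostic(diag: Diagnostic) -> str:
    """Render 'uri:line:col: SEVERITY: message', using one-based positions for display."""
    start = diag["range"]["start"]
    severity = DiagnosticSeverity(diag["severity"]).name
    return f"{diag['uri']}:{start['line'] + 1}:{start['character'] + 1}: {severity}: {diag['message']}"


def most_severe_first(diags: list[Diagnostic]) -> list[Diagnostic]:
    """Sort by severity value (ERROR = 1 sorts first), then by start line."""
    return sorted(diags, key=lambda d: (d["severity"], d["range"]["start"]["line"]))


# Hypothetical sample data
sample: list[Diagnostic] = [
    {"uri": "app.py", "severity": 2, "message": "unused import",
     "range": {"start": {"line": 0, "character": 0}, "end": {"line": 0, "character": 9}}},
    {"uri": "app.py", "severity": 1, "message": "undefined name 'x'",
     "range": {"start": {"line": 4, "character": 8}, "end": {"line": 4, "character": 9}}},
]

for diag in most_severe_first(sample):
    print(format_diagnostic(diag))
```

Wrapping the raw value in `DiagnosticSeverity(...)` lets the same code accept plain integers, which is how severities arrive over the wire in LSP.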

#### **Workspace Analysis**
```python
class WorkspaceDiagnosticParams(TypedDict):
    identifier: str                            # Registration identifier
    previousResultIds: list[PreviousResultId]  # Incremental analysis
    workDoneToken: ProgressToken               # Progress reporting
    partialResultToken: ProgressToken          # Streaming results

class WorkspaceDiagnosticReport(TypedDict):
    items: list[WorkspaceDocumentDiagnosticReport]  # All workspace diagnostics
```

### **Performance & Caching**

#### **Intelligent Caching System**
```python
# Document symbol caching
@property
def cache_path() -> Path:
    # Returns: .serena/cache/{language}/document_symbols_cache_v23-06-25.pkl

def save_cache() -> None:
    # Persists diagnostic and symbol information to disk

def load_cache() -> dict:
    # Loads cached diagnostics with content-based invalidation

# Smart cache invalidation
# - Content-based hashing: Only re-analyze changed files
# - Dependency tracking: Invalidate dependent files when imports change
# - Incremental updates: Merge new diagnostics with cached results
```

#### **Error Handling & Recovery**
```python
class LSPError:
    message: str
    cause: Exception | None

def send_error_response(request_id: Any, err: LSPError) -> None
def on_error(err: Exception) -> None # Language server crash recovery
def get_result(timeout: float | None = None) -> Result # Timeout protection
```

---

## ⚡ **AUTOGENLIB - Dynamic Code Generation**

### **Core Entry Point**
- **Main Function**: `init(description, enable_exception_handler=True, enable_caching=False)`
- **Dynamic Import**: `AutoLibFinder` in `sys.meta_path` for import interception

### **Primary Generation Methods**

#### **Code Generation Engine** (`_generator.py`)
```python
def generate_code(description: str, fullname: str,
                  existing_code: str | None = None,
                  caller_info: dict | None = None) -> str:
    # Process: Context building → LLM request → Code extraction → Validation
    # Returns: Generated Python code with full context awareness

def get_codebase_context() -> str:
    # Returns: Full codebase context from all cached modules
    # Format: Module-by-module code documentation

def extract_python_code(response: str) -> str:
    # Handles: Code blocks, multiple blocks, indented code, unmarked code
    # Returns: Clean, executable Python code

def validate_code(code: str) -> bool:
    # Validates: Syntax correctness using AST parsing
    # Returns: Boolean indicating code validity
```
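The extraction and validation steps described above can be approximated with the standard library alone. The sketch below reuses the names `extract_python_code` and `validate_code` for readability, but the bodies are assumptions for illustration and cover only the simplest case (one fenced block, or an unfenced reply); they are not autogenlib's actual code.

```python
import ast
import re

# Built at runtime so this example can itself sit inside a fenced block
FENCE = "`" * 3

# Matches a fenced block with an optional "python" / "py" language tag
_BLOCK_RE = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_python_code(response: str) -> str:
    """Return the first fenced code block, or the whole reply if nothing is fenced."""
    match = _BLOCK_RE.search(response)
    return (match.group(1) if match else response).strip()


def validate_code(code: str) -> bool:
    """Return True if the text parses as Python. Nothing is executed."""
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        return False
    return True


reply = "Here is the module:\n" + FENCE + "python\ndef f():\n    return 1\n" + FENCE + "\nDone."
code = extract_python_code(reply)
print(validate_code(code))
```

Parsing with `ast.parse` checks syntax only; it says nothing about whether the generated code behaves as intended or is safe to run.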

#### **Exception-Driven Learning** (`_exception_handler.py`)
```python
def setup_exception_handler() -> None:
    # Installs: Global exception handler for automatic error fixing
    # Hooks into: sys.excepthook for runtime error interception

def handle_exception(exc_type: type, exc_value: Exception, exc_traceback) -> None:
    # Process: Error analysis → Context extraction → Fix generation → Code update
    # Capabilities: Runtime error fixing, learning from failures

def generate_fix(exc_type: type, exc_value: Exception,
                 exc_traceback, caller_info: dict) -> str:
    # Returns: LLM-generated fix for the specific exception
    # Context: Full traceback, caller code, existing modules
```
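Hooking `sys.excepthook`, as described above, is plain standard-library Python. The sketch below shows only the interception and context-extraction steps; the `captured` list is a hypothetical stand-in for wherever the context would be sent, and the fix-generation step is deliberately omitted.

```python
import sys
import traceback

# Hypothetical sink for extracted error context
captured: list[dict] = []


def handle_exception(exc_type, exc_value, exc_traceback) -> None:
    """Record where the exception was raised, then fall back to the default hook."""
    frames = traceback.extract_tb(exc_traceback)
    last = frames[-1] if frames else None
    captured.append({
        "type": exc_type.__name__,
        "message": str(exc_value),
        "filename": last.filename if last else None,
        "lineno": last.lineno if last else None,
    })
    # Keep the normal traceback output by delegating to the interpreter's original hook
    sys.__excepthook__(exc_type, exc_value, exc_traceback)


def setup_exception_handler() -> None:
    """Install the handler for uncaught exceptions in the main thread."""
    sys.excepthook = handle_exception
```

`sys.excepthook` only fires for exceptions that reach the top level uncaught, so code that catches its own errors never triggers this path.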

#### **Dynamic Import System** (`_finder.py`)
```python
class AutoLibFinder:
    def find_spec(self, fullname: str, path, target=None) -> ModuleSpec | None:
        # Intercepts: Import statements for autogenlib.* modules
        # Triggers: Code generation when module doesn't exist

    def create_module(self, spec: ModuleSpec) -> ModuleType:
        # Creates: Dynamic module with generated code
        # Caches: Generated modules for future use
```
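Import interception through `sys.meta_path` can be demonstrated with a small finder/loader pair. In this sketch the `sketchlib` namespace, the `SketchFinder` class, and the `GENERATED_SOURCES` dictionary (which stands in for LLM output) are all hypothetical; it shows the mechanism only and makes no claim about how `AutoLibFinder` is written.

```python
import importlib.abc
import importlib.util
import sys

PREFIX = "sketchlib"  # hypothetical namespace served by the finder

# Hypothetical stand-in for code that a generator would produce on demand
GENERATED_SOURCES = {
    "sketchlib.greetings": "def hello(name):\n    return f'hello, {name}'\n",
}


class SketchFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, fullname, path, target=None):
        if fullname == PREFIX:
            # The namespace root is an empty package so submodules can hang off it
            return importlib.util.spec_from_loader(fullname, self, is_package=True)
        if fullname in GENERATED_SOURCES:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # let the regular import machinery handle everything else

    def create_module(self, spec):
        return None  # use Python's default module creation

    def exec_module(self, module):
        source = GENERATED_SOURCES.get(module.__name__, "")
        code = compile(source, f"<generated {module.__name__}>", "exec")
        exec(code, module.__dict__)


sys.meta_path.insert(0, SketchFinder())

from sketchlib.greetings import hello

print(hello("world"))
```

Returning `None` from `find_spec` for unknown names is what keeps the finder from interfering with ordinary imports.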

### **Core Data Structures**

#### **State Management** (`_state.py`)
```python
# Global configuration state
description: str = "A helpful library" # System description
exception_handler_enabled: bool = True # Auto-fix exceptions
caching_enabled: bool = False # Cache generated code
```

#### **Caching System** (`_cache.py`)
```python
def get_cached_code(module_name: str) -> str | None:
    # Returns: Previously generated code for module

def cache_code(module_name: str, code: str, prompt: str | None = None) -> None:
    # Stores: Generated code with optional prompt context

def get_all_modules() -> dict[str, dict]:
    # Returns: All cached modules with metadata
    # Format: {module_name: {"code": str, "prompt": str, "timestamp": float}}

def get_cached_prompt(module_name: str) -> str | None:
    # Returns: Cached prompt/description for module regeneration
```

#### **Context Building** (`_context.py`)
```python
def get_caller_info() -> dict:
    # Returns: Caller code context including filename, code, line numbers
    # Used for: Context-aware code generation

def extract_relevant_context(code: str, module_name: str) -> list[str]:
    # Returns: Code snippets relevant to the requested module
    # Filters: Import statements, function calls, usage patterns
```
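Caller context of the kind described above is available from the standard `inspect` module. The following sketch returns a smaller dictionary than the one the mapping describes; the key names and the `request_module` helper are assumptions for illustration.

```python
import inspect


def get_caller_info() -> dict:
    """Describe the frame that called this function."""
    # stack()[0] is this function; stack()[1] is its direct caller
    frame_info = inspect.stack()[1]
    return {
        "filename": frame_info.filename,
        "function": frame_info.function,
        "lineno": frame_info.lineno,
    }


def request_module() -> dict:
    # Hypothetical call site whose context gets captured
    return get_caller_info()


info = request_module()
print(info["function"], info["lineno"])
```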

---

## 🔄 **INTEGRATION MAPPING**

### **Data Flow Integration Points**

#### **Graph-sitter → Serena**
```python
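# Structural context for LSP operations
graph_sitter_symbols = codebase.symbols  # all top-level symbols (property, not a method call)
serena_diagnostics = lsp.request_text_document_diagnostics(file_path)

# Symbol resolution for precise diagnostic locations.
# find_by_span expects a graph-sitter Span, while a diagnostic carries an LSP Range;
# to_span is a hypothetical adapter that this mapping does not define.
for diagnostic in serena_diagnostics:
    symbols_at_location = codebase.find_by_span(to_span(diagnostic["range"]))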
```

#### **Serena → Autogenlib**
```python
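# Diagnostic information feeds code generation
diagnostics = lsp.request_text_document_diagnostics(file_path)
diagnostic = diagnostics[0]  # Diagnostic is a TypedDict, so fields are read by key
fix_description = f"Fix severity {diagnostic['severity']} from {diagnostic['source']}: {diagnostic['message']}"
# module_name and file_source are placeholders for the module under repair
generated_fix = autogenlib.generate_code(fix_description, module_name, existing_code=file_source)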
```

#### **Autogenlib → Graph-sitter**
```python
# Generated code validation and integration
generated_code = autogenlib.generate_code(description, module_name)  # module_name: placeholder
if validate_code(generated_code):  # syntax check before writing
    new_file = codebase.create_file(target_path, generated_code, sync=True)  # returns a SourceFile
updated_symbols = codebase.symbols  # Refresh symbol table
```

### **Unified Data Structures**
```python
class CodeContext:
    # Shared context between all systems
    structural_info: dict          # From Graph-sitter
    diagnostics: list[Diagnostic]  # From Serena
    symbols: list[Symbol]          # Shared symbol table
    file_content: str              # Source code
    metadata: dict                 # Additional context

class AnalysisResult:
    # Unified analysis output
    errors: list[Diagnostic]           # Critical issues
    warnings: list[Diagnostic]         # Potential problems
    suggestions: list[CodeSuggestion]  # Improvement recommendations
    fixes: list[CodeFix]               # Generated solutions
    metrics: dict                      # Analysis metrics
```
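One way to populate a result object like the one above is to partition diagnostics by their LSP severity value. This sketch keeps only the `errors`, `warnings`, and `metrics` fields and treats diagnostics as plain dictionaries; `build_result` is a hypothetical helper, and the `suggestions` and `fixes` fields are left out because their types are not defined in this mapping.

```python
from dataclasses import dataclass, field

# LSP severity values, matching DiagnosticSeverity
ERROR, WARNING = 1, 2


@dataclass
class AnalysisResult:
    errors: list[dict] = field(default_factory=list)
    warnings: list[dict] = field(default_factory=list)
    metrics: dict = field(default_factory=dict)


def build_result(diagnostics: list[dict]) -> AnalysisResult:
    """Split diagnostics into errors and warnings and record simple counts."""
    result = AnalysisResult()
    for diag in diagnostics:
        if diag["severity"] == ERROR:
            result.errors.append(diag)
        elif diag["severity"] == WARNING:
            result.warnings.append(diag)
    result.metrics = {
        "total": len(diagnostics),
        "errors": len(result.errors),
        "warnings": len(result.warnings),
    }
    return result


result = build_result([
    {"severity": 1, "message": "undefined name 'x'"},
    {"severity": 2, "message": "unused import"},
    {"severity": 4, "message": "consider renaming"},
])
print(result.metrics)
```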

This mapping lists the entry points, data structures, and hand-off points on which the integration of the three tools is built, covering structural analysis (Graph-sitter), diagnostics (Serena), and automated repair (Autogenlib).
