| 1 | +# CodeGraph Architecture |
| 2 | + |
| 3 | +This document describes the architecture and design of the CodeGraph tool. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +CodeGraph is a Python tool that creates dependency graphs from Python source code. It analyzes Python files, extracts function/class definitions and their relationships, and generates interactive visualizations. |
| 8 | + |
| 9 | +## Project Structure |
| 10 | + |
| 11 | +``` |
| 12 | +codegraph/ |
| 13 | +├── codegraph/ # Main package |
| 14 | +│ ├── __init__.py # Package init, version definition |
| 15 | +│ ├── main.py # CLI entry point (click-based) |
| 16 | +│ ├── core.py # Core graph building logic |
| 17 | +│ ├── parser.py # Python source code parser |
| 18 | +│ ├── utils.py # Utility functions |
| 19 | +│ └── vizualyzer.py # Visualization (D3.js + matplotlib) |
| 20 | +├── tests/ # Test suite |
| 21 | +│ ├── test_codegraph.py # Basic tests |
| 22 | +│ ├── test_graph_generation.py # Comprehensive graph tests |
| 23 | +│ ├── test_utils.py # Utility function tests |
| 24 | +│ └── test_data/ # Test fixtures |
| 25 | +├── docs/ # Documentation |
| 26 | +├── pyproject.toml # Poetry configuration |
| 27 | +├── tox.ini # Multi-version testing |
| 28 | +└── .github/workflows/ # CI/CD |
| 29 | +``` |
| 30 | + |
| 31 | +## Core Components |
| 32 | + |
| 33 | +### 1. Parser (`codegraph/parser.py`) |
| 34 | + |
| 35 | +The parser uses Python's `tokenize` module to extract code structure from source files. |
| 36 | + |
| 37 | +**Key Classes:** |
| 38 | +- `_Object` - Base class for all parsed objects (lineno, endno, name, parent) |
| 39 | +- `Function` - Represents a function definition |
| 40 | +- `AsyncFunction` - Represents an async function definition |
| 41 | +- `Class` - Represents a class definition with methods |
| 42 | +- `Import` - Collects all imports from a module |
| 43 | + |
| 44 | +**Main Function:** |
| 45 | +- `create_objects_array(fname, source)` - Parses source code and returns a list of parsed objects |
| 46 | + |
| 47 | +**Import Handling:** |
| 48 | +- Simple imports: `import os` → `['os']` |
| 49 | +- From imports: `from os import path` → `['os.path']` |
| 50 | +- Comma-separated: `from pkg import a, b, c` → `['pkg.a', 'pkg.b', 'pkg.c']` |
| 51 | +- Aliased imports: `from pkg import mod as m` → `['pkg.mod as m']` |
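
The normalization rules above can be illustrated with a short sketch. Note that this is not the project's parser: the document says `parser.py` is built on `tokenize`, while this example swaps in the standard `ast` module because it is shorter to demonstrate. The function name `normalize_imports` is hypothetical.

```python
import ast


def normalize_imports(source: str) -> list[str]:
    """Return imports found in `source` as dotted strings (illustrative only)."""
    result = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import os            -> "os"
            # import numpy as np   -> "numpy as np"
            for alias in node.names:
                name = alias.name
                result.append(f"{name} as {alias.asname}" if alias.asname else name)
        elif isinstance(node, ast.ImportFrom):
            # from pkg import a, b -> "pkg.a", "pkg.b"
            base = "." * node.level + (node.module or "")
            sep = "." if node.module else ""
            for alias in node.names:
                dotted = f"{base}{sep}{alias.name}"
                result.append(f"{dotted} as {alias.asname}" if alias.asname else dotted)
    return result


source = "import os\nfrom pkg import a, b, c\nfrom pkg import mod as m\n"
print(normalize_imports(source))
```

Keeping the alias inside the string (`'pkg.mod as m'`) matches the format listed above; a later stage can split on `" as "` when it needs to resolve the local name.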
| 52 | + |
| 53 | +### 2. Core (`codegraph/core.py`) |
| 54 | + |
| 55 | +The core module builds the dependency graph from parsed data. |
| 56 | + |
| 57 | +**Key Classes:** |
| 58 | +- `CodeGraph` - Main class that orchestrates graph building |
| 59 | + |
| 60 | +**Key Functions:** |
| 61 | +- `get_code_objects(paths_list)` - Parses all files and returns a dict mapping each module to its objects |
| 62 | +- `get_imports_and_entities_lines()` - Extracts imports and entity line ranges |
| 63 | +- `collect_entities_usage_in_modules()` - Finds where entities are used |
| 64 | +- `search_entity_usage()` - Checks whether an entity is used in a given line |
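
To make the last step concrete, here is a minimal sketch of a per-line usage check. It is an assumption about how such a check could work, not the actual body of `search_entity_usage()`; the name `line_uses_entity` and the comment-stripping rule are hypothetical.

```python
import re


def line_uses_entity(line: str, entity_name: str) -> bool:
    """Return True if `entity_name` appears as a whole identifier in `line`."""
    # Naive comment stripping: this also cuts at a '#' inside a string literal.
    code = line.split("#", 1)[0]
    # Word boundaries keep "parse" from matching "parser" or "my_parse".
    pattern = rf"\b{re.escape(entity_name)}\b"
    return re.search(pattern, code) is not None


print(line_uses_entity("result = parse(data)", "parse"))
print(line_uses_entity("result = parser(data)", "parse"))
```

A text-based check like this is cheap but can report false positives, for example when the name appears inside a string literal or shadows an unrelated local variable.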
| 65 | + |
| 66 | +**Data Flow:** |
| 67 | +``` |
| 68 | +Python Files → Parser → Code Objects → Import Analysis → Entity Usage → Dependency Graph |
| 69 | +``` |
| 70 | + |
| 71 | +**Graph Format:** |
| 72 | +```python |
| 73 | +{ |
| 74 | + "/path/to/module.py": { |
| 75 | + "function_name": ["other_module.func1", "local_func"], |
| 76 | + "class_name": ["dependency1"], |
| 77 | + } |
| 78 | +} |
| 79 | +``` |
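
Because the graph is a plain nested dict, it can be traversed without any project code. The sketch below inverts it to answer "who uses this entity?". The helper name `reverse_dependencies` and the `path:entity` label format are illustrative choices, not part of CodeGraph.

```python
from collections import defaultdict


def reverse_dependencies(graph: dict) -> dict:
    """Map each dependency name to the entities that use it."""
    used_by = defaultdict(list)
    for module_path, entities in graph.items():
        for entity, dependencies in entities.items():
            for dependency in dependencies:
                used_by[dependency].append(f"{module_path}:{entity}")
    return dict(used_by)


graph = {
    "/path/to/module.py": {
        "function_name": ["other_module.func1", "local_func"],
        "class_name": ["dependency1"],
    }
}
for dependency, users in reverse_dependencies(graph).items():
    print(dependency, "<-", users)
```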
| 80 | + |
| 81 | +### 3. Visualizer (`codegraph/vizualyzer.py`) |
| 82 | + |
| 83 | +Provides two visualization modes: D3.js (default) and matplotlib (legacy). |
| 84 | + |
| 85 | +**D3.js Visualization:** |
| 86 | +- `convert_to_d3_format()` - Converts graph to D3.js node/link format |
| 87 | +- `get_d3_html_template()` - Returns complete HTML with embedded D3.js |
| 88 | +- `draw_graph()` - Saves HTML and opens in browser |
| 89 | + |
| 90 | +**D3.js Features:** |
| 91 | +- Force-directed layout for automatic node positioning |
| 92 | +- Zoom/pan with mouse wheel and drag |
| 93 | +- Node dragging to reposition |
| 94 | +- Collapse/expand modules and entities |
| 95 | +- Search with autocomplete |
| 96 | +- Tooltips and statistics panel |
| 97 | + |
| 98 | +**Matplotlib Visualization:** |
| 99 | +- `draw_graph_matplotlib()` - Legacy visualization using networkx |
| 100 | +- `process_module_in_graph()` - Processes a single module into the graph |
| 101 | + |
| 102 | +**D3.js Data Format:** |
| 103 | +```json |
| 104 | +{ |
| 105 | + "nodes": [ |
| 106 | + {"id": "module.py", "type": "module", "collapsed": false}, |
| 107 | + {"id": "module.py:func", "label": "func", "type": "entity", "parent": "module.py"} |
| 108 | + ], |
| 109 | + "links": [ |
| 110 | + {"source": "module.py", "target": "module.py:func", "type": "module-entity"}, |
| 111 | + {"source": "module.py:func", "target": "other.py:dep", "type": "dependency"} |
| 112 | + ] |
| 113 | +} |
| 114 | +``` |
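
The following sketch shows one way the dependency graph could be turned into this node/link shape. It is not the real `convert_to_d3_format()`; the function name `to_d3` and the rule for resolving dependency strings (a dotted name `mod.name` maps to `mod.py:name`, an undotted name is treated as local) are assumptions made for illustration.

```python
import os


def to_d3(graph: dict) -> dict:
    """Convert {path: {entity: [deps]}} into a nodes/links dict (illustrative)."""
    nodes, links = [], []
    for path, entities in graph.items():
        module_id = os.path.basename(path)
        nodes.append({"id": module_id, "type": "module", "collapsed": False})
        for entity, dependencies in entities.items():
            entity_id = f"{module_id}:{entity}"
            nodes.append({"id": entity_id, "label": entity,
                          "type": "entity", "parent": module_id})
            links.append({"source": module_id, "target": entity_id,
                          "type": "module-entity"})
            for dependency in dependencies:
                if "." in dependency:
                    # Assumed convention: "other.dep" lives in "other.py".
                    module_name, name = dependency.rsplit(".", 1)
                    target = f"{module_name}.py:{name}"
                else:
                    target = f"{module_id}:{dependency}"
                links.append({"source": entity_id, "target": target,
                              "type": "dependency"})
    # Note: link targets outside the analyzed modules get no node here;
    # a complete converter would have to add nodes for them as well.
    return {"nodes": nodes, "links": links}


print(to_d3({"/src/module.py": {"func": ["other.dep"]}}))
```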
| 115 | + |
| 116 | +### 4. CLI (`codegraph/main.py`) |
| 117 | + |
| 118 | +Click-based command-line interface. |
| 119 | + |
| 120 | +**Options:** |
| 121 | +- `paths` - Directory or file paths to analyze |
| 122 | +- `--matplotlib` - Use legacy matplotlib visualization |
| 123 | +- `--output` - Custom output path for HTML file |
| 124 | + |
| 125 | +### 5. Utilities (`codegraph/utils.py`) |
| 126 | + |
| 127 | +Helper functions for file system operations. |
| 128 | + |
| 129 | +**Key Functions:** |
| 130 | +- `get_python_paths_list(path)` - Recursively finds all `.py` files under `path` |
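
A recursive search of this kind can be sketched with `pathlib`. This is a hypothetical equivalent, not the body of `get_python_paths_list()`; the name `find_python_files`, the handling of a single-file argument, and the sorted output are assumptions.

```python
from pathlib import Path


def find_python_files(path: str) -> list[str]:
    """Return all .py files under `path`, or `path` itself if it is a .py file."""
    root = Path(path)
    if root.is_file():
        return [str(root)] if root.suffix == ".py" else []
    # rglob walks subdirectories recursively; sorting keeps output stable.
    return sorted(str(p) for p in root.rglob("*.py"))


for file_path in find_python_files("."):
    print(file_path)
```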
| 131 | + |
| 132 | +## Data Flow |
| 133 | + |
| 134 | +``` |
| 135 | +1. CLI receives path(s) |
| 136 | + ↓ |
| 137 | +2. utils.get_python_paths_list() finds all .py files |
| 138 | + ↓ |
| 139 | +3. parser.create_objects_array() parses each file |
| 140 | + - Extracts functions, classes, methods |
| 141 | + - Collects import statements |
| 142 | + ↓ |
| 143 | +4. core.CodeGraph.usage_graph() builds dependency graph |
| 144 | + - Maps entities to line ranges |
| 145 | + - Finds entity usage in code |
| 146 | + - Creates dependency edges |
| 147 | + ↓ |
| 148 | +5. vizualyzer.draw_graph() creates visualization |
| 149 | + - Converts to D3.js format |
| 150 | + - Generates HTML with embedded JS |
| 151 | + - Opens in browser |
| 152 | +``` |
| 153 | + |
| 154 | +## Node Types |
| 155 | + |
| 156 | +| Type | Visual | Description | |
| 157 | +|------|--------|-------------| |
| 158 | +| Module | Green square | Python .py file | |
| 159 | +| Entity | Blue circle | Function or class | |
| 160 | +| External | Gray circle | Dependency from outside analyzed codebase | |
| 161 | + |
| 162 | +## Link Types |
| 163 | + |
| 164 | +| Type | Visual | Description | |
| 165 | +|------|--------|-------------| |
| 166 | +| module-entity | Green dashed | Module contains entity | |
| 167 | +| module-module | Orange solid | Module imports from module | |
| 168 | +| dependency | Red | Entity uses another entity | |
| 169 | + |
| 170 | +## Testing Strategy |
| 171 | + |
| 172 | +- **Unit tests**: Parser, import handling, utility functions |
| 173 | +- **Integration tests**: Full graph generation on test data |
| 174 | +- **Self-reference tests**: CodeGraph analyzing its own codebase |
| 175 | +- **Multi-version**: Python 3.9 through 3.13 via tox |
| 176 | + |
| 177 | +## Dependencies |
| 178 | + |
| 179 | +- **networkx**: Graph data structure (for matplotlib mode) |
| 180 | +- **matplotlib**: Legacy visualization |
| 181 | +- **click**: CLI framework |
| 182 | + |
| 183 | +## Extension Points |
| 184 | + |
| 185 | +1. **New visualizers**: Add functions to `vizualyzer.py` |
| 186 | +2. **New parsers**: Extend `parser.py` for other languages |
| 187 | +3. **New link types**: Add to `convert_to_d3_format()` |
| 188 | +4. **Export formats**: Add to `vizualyzer.py` (JSON, DOT, etc.) |