Commit 30c0017

docs creation example (#49)
* computes cyclomatic complexity
* .
* .
* Automated pre-commit update
* Remove unnecessary files from codegen
* Automated pre-commit update
* .
* .
* Automated pre-commit update

---------

Co-authored-by: jayhack <2548876+jayhack@users.noreply.github.com>
1 parent a7289f1 commit 30c0017

2 files changed: +203 -0 lines changed

examples/document_functions/README.md

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
# Automated Function Documentation Generator

This example demonstrates how to use Codegen to automatically generate docstrings for functions by analyzing their dependencies and usages within a codebase.

## Overview

The script uses Codegen's symbol analysis capabilities to:
1. Identify functions without docstrings
2. Collect their dependencies and usages up to N degrees deep (the script sets `N_DEGREE = 2`)
3. Generate context-aware docstrings using AI (`codebase.ai`)

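Step 1 can be illustrated without Codegen. The sketch below is not the script's implementation (the script checks `function.docstring` on Codegen's function objects). It uses Python's standard `ast` module instead, and the helper name `find_undocumented_functions` and the `SAMPLE` source are hypothetical.

```python
import ast


def find_undocumented_functions(source: str) -> list[str]:
    """Return the names of functions in `source` that have no docstring."""
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        # Cover both regular and async function definitions
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return missing


# Hypothetical sample module with one documented and one undocumented function
SAMPLE = '''
def documented():
    """Already has a docstring."""
    return 1

def undocumented(x):
    return x + 1
'''

print(find_undocumented_functions(SAMPLE))
```

Only `undocumented` is reported, which corresponds to the functions the script would go on to document.
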
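## Key Features

### Recursive Context Collection
The script recursively collects both dependencies and usages, so the model sees what a function relies on and where it is used:

```python
def get_extended_context(symbol: Symbol, degree: int) -> tuple[set[Symbol], set[Symbol]]:
    """Recursively collect dependencies and usages up to the specified degree."""
    dependencies = set()
    usages = set()

    if degree > 0:
        for dep in symbol.dependencies:
            # Resolve imports to the symbol they ultimately point at
            if isinstance(dep, Import):
                dep = hop_through_imports(dep)
            if isinstance(dep, Symbol) and dep not in dependencies:
                dependencies.add(dep)
                # Recursively collect nested context
                dep_deps, dep_usages = get_extended_context(dep, degree - 1)
                dependencies.update(dep_deps)
                usages.update(dep_usages)

        for usage in symbol.usages:
            usage_symbol = usage.usage_symbol
            if isinstance(usage_symbol, Import):
                usage_symbol = hop_through_imports(usage_symbol)
            if isinstance(usage_symbol, Symbol) and usage_symbol not in usages:
                usages.add(usage_symbol)
                usage_deps, usage_usages = get_extended_context(usage_symbol, degree - 1)
                dependencies.update(usage_deps)
                usages.update(usage_usages)

    return dependencies, usages
```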
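
To see how the degree limit shapes the collected context, here is a minimal sketch that mirrors the structure of `get_extended_context` in `run.py` on a plain dictionary. It does not use the Codegen API: the graph `DEPENDS_ON`, the helper `used_by`, and all symbol names are hypothetical stand-ins.

```python
# Toy graph (hypothetical names): each symbol maps to the symbols it depends on
DEPENDS_ON = {
    "handler": {"validate", "save"},
    "validate": {"parse"},
    "save": set(),
    "parse": set(),
    "cli": {"handler"},
}


def used_by(name: str) -> set[str]:
    """Reverse lookup: the symbols that depend on `name`."""
    return {user for user, deps in DEPENDS_ON.items() if name in deps}


def extended_context(name: str, degree: int) -> tuple[set[str], set[str]]:
    """Collect dependencies and usages reachable within `degree` hops."""
    dependencies: set[str] = set()
    usages: set[str] = set()
    if degree > 0:
        for dep in DEPENDS_ON[name]:
            if dep not in dependencies:
                dependencies.add(dep)
                nested_deps, nested_usages = extended_context(dep, degree - 1)
                dependencies |= nested_deps
                usages |= nested_usages
        for user in used_by(name):
            if user not in usages:
                usages.add(user)
                nested_deps, nested_usages = extended_context(user, degree - 1)
                dependencies |= nested_deps
                usages |= nested_usages
    return dependencies, usages


print(extended_context("handler", 1))
print(extended_context("handler", 2))
```

At degree 1 the context is the direct neighbours only: `validate` and `save` as dependencies, `cli` as a usage. At degree 2, `parse` is added, and in this toy graph `handler` also appears in its own context, because following a dependency's usages leads back to the starting symbol.
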

### Import Resolution
When a dependency or usage is an import, the script follows the chain of imports until it reaches the underlying symbol definition (or an external module):

```python
def hop_through_imports(imp: Import) -> Symbol | ExternalModule:
    """Finds the root symbol for an import"""
    if isinstance(imp.imported_symbol, Import):
        return hop_through_imports(imp.imported_symbol)
    return imp.imported_symbol
```
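
The same hop can be tried on stand-in classes. This is a sketch under stated assumptions: `ToySymbol`, `ToyImport`, and `hop` are hypothetical and only imitate the `imported_symbol` attribute used above. They are not Codegen's `Symbol` or `Import` types.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ToySymbol:
    """Stand-in for a real definition."""
    name: str


@dataclass(frozen=True)
class ToyImport:
    """Stand-in for an import that points at a symbol or another import."""
    imported_symbol: Union["ToyImport", ToySymbol]


def hop(imp: ToyImport) -> ToySymbol:
    """Follow the import chain until a non-import target is reached."""
    if isinstance(imp.imported_symbol, ToyImport):
        return hop(imp.imported_symbol)
    return imp.imported_symbol


definition = ToySymbol("parse_config")
reexport = ToyImport(definition)     # e.g. a package re-exporting the symbol
caller_import = ToyImport(reexport)  # e.g. a module importing from that package

assert hop(caller_import) is definition
```

Two hops are needed here because the caller imports a re-export rather than the definition itself.
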

## Usage

1. Run the script on a target repository:
```python
from codegen import Codebase

codebase = Codebase.from_repo("your/repo", commit="commit_hash")
run(codebase)
```

2. The script will:
- Exclude functions whose name or file path contains "test" or "tutorial"
- Process each remaining function in the codebase
- Skip functions that already have docstrings
- Generate context-aware docstrings for undocumented functions
- Commit after each function it processes, so an interrupted run keeps its progress

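The benefit of committing after every function can be shown with a small simulation. This sketch does not call Codegen: the `committed` and `pending` lists, the `commit` function, and `document_all` are hypothetical stand-ins for the repository state and for the processing loop.

```python
committed: list[str] = []  # stands in for work already saved
pending: list[str] = []    # stands in for edits not yet saved


def commit() -> None:
    """Move all pending edits into the saved state."""
    committed.extend(pending)
    pending.clear()


def document_all(names: list[str], interrupt_at: str) -> None:
    for name in names:
        if name == interrupt_at:
            raise KeyboardInterrupt  # simulate the user stopping the run
        pending.append(f"docstring for {name}")
        commit()  # save progress before moving to the next function


try:
    document_all(["a", "b", "c"], interrupt_at="c")
except KeyboardInterrupt:
    pass

print(committed)
```

The run stops at `c`, yet the docstrings for `a` and `b` are already saved. With a single commit at the end of the loop, both would have been lost.
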
## Example Output

The script prints a progress line for each function, followed by the result of the generation step:
```
[1/150] Skipping my_function - already has docstring
[2/150] Generating docstring for process_data at src/utils.py
 ✓ Generated docstring
[3/150] Generating docstring for validate_input at src/validation.py
 ✗ Failed to generate docstring
```

## Features

- **Intelligent Context Collection**: Analyzes both dependencies and usages to understand function purpose
- **Import Resolution**: Follows import chains to find actual symbol definitions
- **Incremental Commits**: Saves progress after each function for safe interruption
- **Progress Tracking**: Detailed logging of processing status
- **Existing Docstring Preservation**: Skips functions that are already documented

## Use Cases

- Documenting legacy codebases
- Maintaining documentation standards in large projects
- Onboarding new team members with better code documentation
- Preparing codebases for public release

examples/document_functions/run.py

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
import codegen
from codegen import Codebase
from codegen.sdk.core.external_module import ExternalModule
from codegen.sdk.core.import_resolution import Import
from codegen.sdk.core.symbol import Symbol


def hop_through_imports(imp: Import) -> Symbol | ExternalModule:
    """Finds the root symbol for an import"""
    if isinstance(imp.imported_symbol, Import):
        return hop_through_imports(imp.imported_symbol)
    return imp.imported_symbol


def get_extended_context(symbol: Symbol, degree: int) -> tuple[set[Symbol], set[Symbol]]:
    """Recursively collect dependencies and usages up to the specified degree.

    Args:
        symbol: The symbol to collect context for
        degree: How many levels deep to collect dependencies and usages

    Returns:
        A tuple of (dependencies, usages) where each is a set of related Symbol objects
    """
    dependencies = set()
    usages = set()

    if degree > 0:
        # Collect direct dependencies
        for dep in symbol.dependencies:
            # Hop through imports to find the root symbol
            if isinstance(dep, Import):
                dep = hop_through_imports(dep)

            if isinstance(dep, Symbol) and dep not in dependencies:
                dependencies.add(dep)
                dep_deps, dep_usages = get_extended_context(dep, degree - 1)
                dependencies.update(dep_deps)
                usages.update(dep_usages)

        # Collect usages of the current symbol
        for usage in symbol.usages:
            usage_symbol = usage.usage_symbol
            # Hop through imports for usage symbols too
            if isinstance(usage_symbol, Import):
                usage_symbol = hop_through_imports(usage_symbol)

            if isinstance(usage_symbol, Symbol) and usage_symbol not in usages:
                usages.add(usage_symbol)
                usage_deps, usage_usages = get_extended_context(usage_symbol, degree - 1)
                dependencies.update(usage_deps)
                usages.update(usage_usages)

    return dependencies, usages


@codegen.function("document-functions")
def run(codebase: Codebase):
    """Generate docstrings for undocumented functions, skipping tests and tutorials."""
    # Define the maximum degree of dependencies and usages to consider for context
    N_DEGREE = 2

    # Filter out test and tutorial functions first (substring match on name and path)
    excluded = ["test", "tutorial"]
    functions = [
        f
        for f in codebase.functions
        if not any(pattern in f.name.lower() for pattern in excluded)
        and not any(pattern in f.filepath.lower() for pattern in excluded)
    ]

    # Track progress for user feedback
    total_functions = len(functions)

    print(f"Found {total_functions} functions to process (excluding tests and tutorials)")

    for processed, function in enumerate(functions, start=1):
        # Skip if already has docstring
        if function.docstring:
            print(f"[{processed}/{total_functions}] Skipping {function.name} - already has docstring")
            continue

        print(f"[{processed}/{total_functions}] Generating docstring for {function.name} at {function.filepath}")

        # Collect context using N-degree dependencies and usages
        dependencies, usages = get_extended_context(function, N_DEGREE)

        # Generate a docstring using the AI with the context
        docstring = codebase.ai(
            """
            Generate a docstring for this function using the provided context.
            The context includes:
            - dependencies: other symbols this function depends on
            - usages: other symbols that use this function
            """,
            target=function,
            # `codebase.ai` is smart about stringifying symbols
            context={"dependencies": list(dependencies), "usages": list(usages)},
        )

        # Set the generated docstring for the function
        if docstring:
            function.set_docstring(docstring)
            print(" ✓ Generated docstring")
        else:
            print(" ✗ Failed to generate docstring")

        # Commit after each function so work is saved incrementally
        # This allows for:
        # 1. Safe early termination - progress won't be lost
        # 2. Immediate feedback - can check results while running
        # 3. Smaller atomic changes - easier to review/revert if needed
        codebase.commit()

    print(f"\nCompleted processing {total_functions} functions")


if __name__ == "__main__":
    print("Parsing codebase...")
    codebase = Codebase.from_repo("fastapi/fastapi", commit="887270ff8a54bb58c406b0651678a27589793d2f")

    print("Running function...")
    run(codebase)
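
One property of the filter in `run()` is worth noting: it matches substrings, so a name such as `get_latest_version` is excluded because "latest" contains "test". The sketch below contrasts that check with a stricter token-based variant. It is an illustration only: `substring_excluded` restates the existing check, while `token_excluded`, `_is_excluded_token`, and the example names and paths are hypothetical.

```python
from pathlib import PurePosixPath

EXCLUDED = ("test", "tutorial")


def substring_excluded(name: str, filepath: str) -> bool:
    """Same check as run(): substring match on the name and the path."""
    return any(p in name.lower() for p in EXCLUDED) or any(p in filepath.lower() for p in EXCLUDED)


def _is_excluded_token(token: str) -> bool:
    # Drop trailing digits and a plural "s" so "tutorial001" and "tests" still match
    base = token.rstrip("0123456789")
    return base in EXCLUDED or base.removesuffix("s") in EXCLUDED


def token_excluded(name: str, filepath: str) -> bool:
    """Hypothetical stricter variant: match whole underscore-separated tokens."""
    tokens = name.lower().split("_")
    for part in PurePosixPath(filepath.lower()).parts:
        stem = part.rsplit(".", 1)[0]  # strip a file extension if present
        tokens.extend(stem.split("_"))
    return any(_is_excluded_token(t) for t in tokens)


print(substring_excluded("get_latest_version", "src/utils.py"))
print(token_excluded("get_latest_version", "src/utils.py"))
print(token_excluded("test_login", "tests/test_auth.py"))
```

The token-based variant keeps `get_latest_version` while still excluding `test_login`. The trade-off is that it misses names that embed the word without a separator, such as `mytesthelper`, which the substring check catches.
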
