callGraph.py — Static call-graph generator (Python rewrite of callGraph with some changes)
callGraph.py is a static call-graph extraction tool implemented in Python. It aims to discover function definitions and call relationships in source code using a best-effort combination of accurate parsing for Python (AST-based) and conservative regex heuristics for many other languages. The tool can emit DOT graphs, PNG/SVG/PDF images, and JSON/YAML representations of the extracted call graph.
- Accurate Python parsing using the `ast` module to avoid most false positives.
- Heuristic (regex) parsing for many other languages (C/C++, Java, Rust, Go, TypeScript, JavaScript, PHP, Ruby, Perl, Shell, etc.).
- Export options:
  - DOT (`.dot`)
  - Raster/vector images (`.png`, `.svg`, `.pdf`) via:
    - Python `graphviz` package (if available)
    - System `dot` (Graphviz) if installed
    - Fallback `networkx` + `matplotlib` (pure Python)
  - JSON/YAML call-graph dumps (`-jsnOut`, `-ymlOut`) and reading via `-jsnIn` / `-ymlIn`.
- CLI options for selecting start nodes, ignoring functions, obfuscation, subset source export, and writing individual function sources to disk (tempdir).
- Environment-driven debugging (`DUMP_PARSE`) to inspect intermediate parse structures.
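To make the AST-based approach concrete, here is a minimal sketch of how calls can be attributed to their enclosing function with the standard `ast` module. It is an illustration, not the code in callGraph.py: the function name `extract_calls` and the returned shape are assumptions, and only the `__MAIN__` label for module-level code is taken from this README.

```python
import ast
from collections import defaultdict

MAIN = "__MAIN__"  # label for module-level code, as described in this README


def extract_calls(source: str) -> dict:
    """Map each caller to the simple names it calls, with call counts.

    Hypothetical helper for illustration; callGraph.py may work differently.
    """
    calls = defaultdict(lambda: defaultdict(int))

    class Visitor(ast.NodeVisitor):
        def __init__(self):
            self.stack = [MAIN]  # innermost enclosing function is last

        def _visit_function(self, node):
            self.stack.append(node.name)
            self.generic_visit(node)
            self.stack.pop()

        # Handle both `def` and `async def` the same way.
        visit_FunctionDef = _visit_function
        visit_AsyncFunctionDef = _visit_function

        def visit_Call(self, node):
            func = node.func
            if isinstance(func, ast.Name):         # helper()
                name = func.id
            elif isinstance(func, ast.Attribute):  # obj.helper()
                name = func.attr
            else:                                  # e.g. factory()()
                name = None
            if name is not None:
                calls[self.stack[-1]][name] += 1
            self.generic_visit(node)               # also visit nested calls

    Visitor().visit(ast.parse(source))
    return {caller: dict(callees) for caller, callees in calls.items()}


example = """
def helper():
    return 1

async def worker():
    return helper() + helper()

print(worker())
"""
result = extract_calls(example)
```

Because the parser works on the syntax tree, identifiers inside strings and comments never show up as calls, which is the main source of false positives in regex-based scanning.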
git clone https://github.com/Kubabob/callGraph.py.git
# install dependencies with one of the following tools:
uv sync
pip install -r requirements.txt
pipenv install -r requirements.txt
# create and activate a virtualenv (POSIX)
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
On Windows:
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt
Run the script directly:
# parse a single Python file, generate graph and JSON
python callGraph.py test/example.py -language py -output example.png -jsnOut callGraph_py.json
# parse a single Rust file
python callGraph.py test/example.rs -language rs -output example.svg -jsnOut callGraph_rs.json
Common flags:
- `-language <lang>`: force language (`py`, `js`, `ts`, `rs`, `c`, `cpp`, `java`, ...). Required when scanning directories.
- `-start <regex>`: start graph traversal from function(s) matching this regex.
- `-ignore <regex>`: ignore function names matching this regex.
- `-output <file>`: output filename. If the extension is:
  - `.dot`: emits only the DOT file.
  - `.png`, `.svg`, `.pdf`: attempts to render an image (with fallback chain).
  - If omitted, a temporary directory will be created for output.
- `-noShow`: do not attempt to display the rendered image.
- `-fullPath`: do not strip the file path from node labels (show full path).
- `-writeFunctions`: write each discovered function body into separate files in a temp directory.
- `-writeSubsetCode <file>`: write a source file containing only the functions included in the final graph (preserves shebang if present).
- `-jsnOut <file>`: write a JSON representation of the call graph to file.
- `-jsnIn <file>`: read a JSON representation of the call graph from file (skip parsing).
- `-ymlOut <file>` / `-ymlIn <file>`: same as JSON but YAML (requires PyYAML).
- `-verbose`: verbose output (includes best-effort variable/script analysis).
- `-obfuscate`: obfuscate function names in the emitted graph.
- `--renderer`: choose renderer: `auto` (default), `python-graphviz`, `system-dot`, `networkx`.
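The effect of `-start` and `-ignore` can be pictured as a reachability pass over the call graph. The sketch below is one plausible reading, not the tool's actual logic: `select_subgraph` is a hypothetical name, the input shape (caller mapped to a set of callees) is chosen for the example, and matching with `re.search` rather than a full match is an assumption.

```python
import re
from collections import deque


def select_subgraph(edges, start=None, ignore=None):
    """Keep edges reachable from nodes matching `start`, skipping `ignore` matches.

    Hypothetical helper; `edges` maps caller -> iterable of callees.
    """
    ignore_re = re.compile(ignore) if ignore else None

    def kept(name):
        return not (ignore_re and ignore_re.search(name))

    nodes = set(edges) | {c for callees in edges.values() for c in callees}
    start_re = re.compile(start) if start else None
    roots = [n for n in sorted(nodes)
             if kept(n) and (start_re is None or start_re.search(n))]

    # Breadth-first walk from the roots, never stepping onto ignored nodes.
    seen, queue = set(roots), deque(roots)
    while queue:
        caller = queue.popleft()
        for callee in edges.get(caller, ()):
            if kept(callee) and callee not in seen:
                seen.add(callee)
                queue.append(callee)

    result = {}
    for caller in sorted(seen):
        callees = sorted(c for c in edges.get(caller, ()) if c in seen)
        if callees:
            result[caller] = callees
    return result


graph = {
    "main": {"parse", "log"},
    "parse": {"tokenize", "log"},
    "unused": {"log"},
}
subset = select_subgraph(graph, start="^main$", ignore="^log$")
```

In this example `unused` drops out because it is not reachable from `main`, and every edge into `log` disappears because the name matches the ignore pattern.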
Examples:
# generate dot only:
python callGraph.py test/example.py -language py -output example.dot
# generate svg with preferred renderer:
python callGraph.py test/example.cpp -language cpp -output example.svg --renderer system-dot
# read a precomputed call-graph json and render it:
python callGraph.py -jsnIn callGraph.json -output callGraph.png
The parser contains heuristic regex patterns for many languages. The following languages/extensions are recognized or have dedicated heuristics:
- Python: `py` (AST parsing, highest accuracy)
- JavaScript: `js`, `jsx`
- TypeScript: `ts`, `tsx`
- C / C++: `c`, `cpp` (basic heuristics; may miss signatures or templates)
- Java: `java`
- Rust: `rs`
- Go: `go`
- Swift: `swift`
- PHP: `php`
- Ruby: `rb`
- Perl: `pl`
- Shell scripts: `sh`, `bash`, `zsh`
- Lua: `lua`
- Kotlin: `kt`
- Dart: `dart`
- Julia: `jl`
- R: `r`
- Objective-C-ish: `m` (best-effort)
- Scala / Scalding scripts: `sc`
- Pascal: `pas`
- Verilog-like: `v`
Notes:
- The regex-based heuristics are intentionally conservative and meant as fallbacks. For best results with non-Python languages, consider using language-specific AST tools or instrumented analysis.
- When scanning directories you must specify `-language` to limit file selection (the tool currently requires this to avoid scanning many unrelated files).
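To illustrate what a regex heuristic can and cannot see, the following sketch detects Rust function definitions line by line. The pattern `RUST_FN` and the helper `find_definitions` are invented for this example and are not copied from callGraph.py.

```python
import re

# Hypothetical pattern for illustration only: optional `pub`/`pub(...)`,
# `async`, and `unsafe` qualifiers, then `fn <name>` at the start of a line.
RUST_FN = re.compile(
    r"^[ \t]*(?:pub(?:\([^)]*\))?\s+)?(?:async\s+)?(?:unsafe\s+)?fn\s+([A-Za-z_]\w*)",
    re.MULTILINE,
)


def find_definitions(source):
    """Return (name, 1-based line number) for every pattern match."""
    found = []
    for match in RUST_FN.finditer(source):
        line = source.count("\n", 0, match.start(1)) + 1
        found.append((match.group(1), line))
    return found


sample = """\
fn main() {
    helper();
}

pub async fn helper() {}
// fn skipped() {}
/*
fn ghost() {}
*/
"""
definitions = find_definitions(sample)
```

The line comment is skipped only because the pattern is anchored to the start of the line; `ghost` inside the block comment is still reported. A line-oriented regex has no notion of comment or string context, which is why an AST-based tool is the better choice when accuracy matters.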
callGraph.py selects a renderer using a preference chain, either `auto` or the `--renderer` you specify:
- `python-graphviz` (`graphviz` Python package):
  - Good integration but often still requires the Graphviz `dot` executable on your system.
- `system-dot` (`dot` command from Graphviz):
  - Preferred for high-quality layout. If you want to use `--renderer system-dot`, ensure Graphviz is installed and `dot` is on `PATH`.
- `networkx` + `matplotlib`:
  - Pure Python fallback; draws a reasonable graph and exports PNG/SVG/PDF. Install with `pip install networkx matplotlib`.
If none of the renderers succeed, the DOT file will still be generated and reported to you so it can be rendered later.
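Before choosing a renderer it can help to check what is actually installed. The sketch below probes for the three options using only the standard library. The function names are hypothetical, and two behaviours are assumptions rather than documented facts: that `auto` tries the renderers in the order listed above, and that an explicit `--renderer` is tried alone.

```python
import importlib.util
import shutil

CHAIN = ["python-graphviz", "system-dot", "networkx"]  # order assumed from this README


def available_renderers():
    """Report which renderer names look usable on this host (hypothetical helper)."""
    has_dot = shutil.which("dot") is not None

    def has_module(name):
        return importlib.util.find_spec(name) is not None

    return {
        # The graphviz package shells out to `dot`, so both must be present.
        "python-graphviz": has_module("graphviz") and has_dot,
        "system-dot": has_dot,
        "networkx": has_module("networkx") and has_module("matplotlib"),
    }


def pick_renderer(preferred="auto", availability=None):
    """Return the first usable renderer name, or None if only DOT can be written."""
    if availability is None:
        availability = available_renderers()
    candidates = CHAIN if preferred == "auto" else [preferred]
    for name in candidates:
        if availability.get(name):
            return name
    return None


print(available_renderers())
```

A result of `None` corresponds to the case described above, where the DOT file is still written and can be rendered later.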
- If the tool prints `ERROR: No call graph data to process`:
  - Ensure `-language` is set appropriately (especially when passing directories).
  - Inspect the JSON intermediate output with `-jsnOut` to see what the call-graph construction received.
  - Run with the `DUMP_PARSE=1` environment variable set to dump parsing internals (function definitions, contents, and raw call counts). Example:
# Dump parse internals to stdout (POSIX)
DUMP_PARSE=1 python callGraph.py test/example.py -language py
# Or write JSON and inspect it
python callGraph.py test/example.py -language py -jsnOut /tmp/out.json
cat /tmp/out.json | jq .
- When a language is mis-parsed:
  - The heuristics may not cover your language idioms (macros, generics, complex signatures). Consider adding a targeted regex in `LANG_SYNTAX` or (preferably) use a proper AST-based parser for that language.
  - For Python, the AST-based parsing is accurate for detecting `def` and `async def`. Module-level code is treated as `__MAIN__`.
- Rendering failures:
  - If `--renderer system-dot` fails, verify the `dot` binary is available (`which dot`) and that calling `dot -V` works.
  - If the Python `graphviz` package raises an exception, check whether `dot` is present and accessible; the Python package often requires the system `dot` to actually render.
  - The `networkx` fallback requires `networkx` and `matplotlib`.
- If you see many spurious edges:
  - This is most likely caused by the conservative rescans (content- and range-based heuristics) used for non-Python languages. You can inspect `parse.func_call`, `parse.func_definition`, and `parse.func_contents` via `DUMP_PARSE=1` to understand which calls were discovered and why edges exist.
  - Consider filtering with `-ignore <regex>` to drop common helper or library functions that pollute the graph.
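One way to decide what to pass to `-ignore` is to rank called identifiers by how many distinct callers reference them. The sketch below assumes the `func_call` shape described in this README (caller, then file path, then called identifier, then count); the helper names `noisy_callees` and `ignore_pattern` are hypothetical.

```python
import re
from collections import Counter


def noisy_callees(func_call, top=3):
    """Rank called identifiers by number of distinct callers (ties sorted by name)."""
    callers_per_callee = Counter()
    for by_file in func_call.values():
        # Count each callee once per caller, regardless of file or call count.
        callees = {callee for counts in by_file.values() for callee in counts}
        callers_per_callee.update(callees)
    ranked = sorted(callers_per_callee.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top]


def ignore_pattern(names):
    """Build an anchored regex matching exactly the given names."""
    return "^(?:" + "|".join(re.escape(n) for n in names) + ")$"


func_call = {
    "main": {"a.c": {"log": 3, "parse": 1}},
    "parse": {"a.c": {"log": 1, "lex": 2}},
    "lex": {"b.c": {"log": 5}},
}
ranking = noisy_callees(func_call, top=2)
pattern = ignore_pattern([name for name, _ in ranking[:1]])
```

Here `log` is called from every function, so it is the natural first candidate; the generated pattern can then be passed as the `-ignore` argument.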
- The parser builds a `ParseResult` structure with:
  - `shebang`: optional shebang line.
  - `func_contents`: map of function name -> file path -> function body text (as parsed).
  - `func_definition`: map of function name -> file path -> first line number.
  - `func_call`: nested map of caller function name -> file path -> called identifier -> count.
- You can emit the final call-graph JSON with `-jsnOut`, or read a precomputed one with `-jsnIn`.
- `DUMP_PARSE=1` prints the normalized `ParseResult` object to stdout (useful for debugging regex heuristics and AST fallbacks).
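The structure above can be modelled as a small dataclass. This is a sketch built from the field descriptions in this README, not the class defined in callGraph.py; the type annotations, the defaults, and the `to_edges` helper (including its choice to keep only callees that have a definition) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ParseResult:
    """Sketch of the parse result; field names follow this README."""
    shebang: Optional[str] = None
    # function name -> file path -> function body text
    func_contents: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # function name -> file path -> first line number
    func_definition: Dict[str, Dict[str, int]] = field(default_factory=dict)
    # caller -> file path -> called identifier -> count
    func_call: Dict[str, Dict[str, Dict[str, int]]] = field(default_factory=dict)


def to_edges(result):
    """Flatten func_call into sorted (caller, callee, count) triples.

    Assumption: only callees with a known definition become graph edges.
    """
    totals = {}
    for caller, by_file in result.func_call.items():
        for counts in by_file.values():
            for callee, count in counts.items():
                if callee in result.func_definition:
                    key = (caller, callee)
                    totals[key] = totals.get(key, 0) + count
    return sorted((a, b, n) for (a, b), n in totals.items())


parsed = ParseResult(
    shebang="#!/usr/bin/env python3",
    func_definition={"main": {"a.py": 1}, "helper": {"a.py": 5}},
    func_call={
        "main": {"a.py": {"helper": 2, "print": 1}},
        "__MAIN__": {"a.py": {"main": 1}},
    },
)
edges = to_edges(parsed)
```

Filtering on `func_definition` drops calls such as `print` that resolve to builtins or external libraries, leaving only edges between functions found in the scanned sources.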
- Regex rules are maintained in the `LANG_SYNTAX` mapping in `callGraph.py`:
  - Keys: `functionDefinition`, `functionEnd`, `functionCall`, `comment`, `variable`.
  - Each maps language identifiers to a regex string.
- Adding support/improved heuristics:
  - Edit `LANG_SYNTAX` to add or tune regexes.
  - For better precision (especially for JS/TS/TSX/JSX), consider integrating an AST-based parser (e.g., `tree-sitter`, `esprima`, or language-specific compilers) and replacing the regex heuristics for that language.
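The shape of such a mapping, and a safe way to extend it, can be sketched as follows. Everything here is illustrative: `LANG_SYNTAX_SKETCH`, `add_language`, and all regex strings are made up for the example and do not reproduce the entries in callGraph.py. Only the layout (rule key, then language identifier, then regex string) follows the description above.

```python
import re

# Hypothetical table using three of the five rule keys named in this README.
LANG_SYNTAX_SKETCH = {
    "functionDefinition": {"go": r"^func\s+(?:\([^)]*\)\s*)?(\w+)\s*\("},
    "functionCall": {"go": r"\b(\w+)\s*\("},
    "comment": {"go": r"//.*$"},
}


def add_language(table, lang, **rules):
    """Register regexes for `lang`, rejecting patterns that do not compile."""
    for key, pattern in rules.items():
        re.compile(pattern)  # raises re.error early on a malformed regex
        table.setdefault(key, {})[lang] = pattern


add_language(
    LANG_SYNTAX_SKETCH,
    "lua",
    functionDefinition=r"^\s*(?:local\s+)?function\s+([\w.:]+)\s*\(",
    comment=r"--.*$",
)

lua_def = re.search(
    LANG_SYNTAX_SKETCH["functionDefinition"]["lua"], "local function greet(name)"
)
go_def = re.search(
    LANG_SYNTAX_SKETCH["functionDefinition"]["go"], "func (s *Server) Start() {"
)
```

Compiling each pattern at registration time surfaces a bad regex immediately instead of in the middle of a scan.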
- Generate graph and JSON for a Python example:
python callGraph.py test/example.py -language py -output example.png -jsnOut python_callGraph.json
- Scan a directory of Rust files (force language):
python callGraph.py test -language rs -output rust_graph.svg -jsnOut rust_callGraph.json
# Mandatory: -language when scanning directories
- Read a previously generated JSON and render:
python callGraph.py -jsnIn rust_callGraph.json -output rust_graph.png
- Regex-based parsing is heuristic and not a substitute for proper AST parsing. Accuracy varies with language complexity.
- For large codebases or languages with heavy macro/generic usage, the conservative rescanning may produce false positives or degrade quality. Use targeted filters (`-ignore`) or consider integrating a language-accurate parser.
- Directory scans require `-language` to avoid unintended matches.
- Rendering quality depends on the chosen renderer and the availability of system tools (`dot`) and Python packages.
- To extend `LANG_SYNTAX`, edit `callGraph.py` and add/adjust regexes for the language key.
- Add regression tests by placing example files in `test/` and asserting parsing outputs or dot contents.
- Proposed additions:
  - A `--list-parsed` CLI flag that prints the parsed functions per file.
  - A `--no-rescan` flag to disable conservative rescans for non-Python languages.
  - A `--list-renderers` flag that reports which renderers are available on the host.
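A regression test that asserts on DOT contents could look like the sketch below. It checks edges in a DOT string with a small regex; in a real test the text would come from a file written with `-output <name>.dot`. The helper `dot_edges`, the test name, and the assumption that node names are plain or double-quoted identifiers are all hypothetical, since the exact DOT output of callGraph.py is not shown here.

```python
import re

# Matches `a -> b` with optional double quotes around each node name.
EDGE = re.compile(r'"?([\w.:]+)"?\s*->\s*"?([\w.:]+)"?')


def dot_edges(dot_text):
    """Return the set of (caller, callee) pairs found in DOT source."""
    return set(EDGE.findall(dot_text))


def test_example_graph_edges():
    # Stand-in for the contents of a generated .dot file.
    dot_text = 'digraph G {\n  "main" -> "helper";\n  helper -> log;\n}\n'
    assert dot_edges(dot_text) == {("main", "helper"), ("helper", "log")}


test_example_graph_edges()
```

Comparing edge sets rather than raw text keeps the test stable when layout attributes or line order in the DOT file change.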
