Generate human-readable documentation from a codebase (GitHub repo) using an agentic workflow.
This repo supports two primary modes:
- github-code: clone a GitHub repository (optionally check out a commit) and generate docs into an outputs/run folder.
- chat: lightweight RAG chat over a local folder using a FAISS vector store.
- Python 3.9.xx recommended
Install dependencies:

```
pip install -r requirements.txt
```

Set provider API keys depending on the model you use:
- Groq: GROQ_API_KEY
- OpenAI: OPENAI_API_KEY
- Other providers: set the corresponding key in the same way
Example:

```
export GROQ_API_KEY="..."
export OPENAI_API_KEY="..."
```

```
python3 main.py github-code \
  --task-id mytask \
  --clone-link https://github.com/<owner>/<repo>.git \
  --commit-hash <optional_commit_hash> \
  --setup-dir ./setup_dirs \
  --output-dir ./outputs \
  --model groq/openai/gpt-oss-120b
```

Notes:
- --setup-dir is where repositories are cloned.
- --output-dir is where run artifacts are written.
- --model must be one of the entries in the model hub (app/model/common.py).
Point --document-folder at a local project folder. The first run builds a FAISS vector store at:

```
<document-folder>/vector_store/<model_name>/
```
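The build-once, reuse-later behavior can be sketched as follows. This is a minimal illustration with a toy embedder, brute-force cosine search, and JSON persistence, not the repo's actual FAISS code; `embed`, `load_or_build`, and `search` are hypothetical names:

```python
import json
import math
import os

def embed(text):
    # Toy embedding: a character-frequency vector (stand-in for a real model).
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def load_or_build(store_dir, docs):
    """Reuse a persisted index if one exists; otherwise embed docs and save."""
    path = os.path.join(store_dir, "index.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    index = [{"text": d, "vector": embed(d)} for d in docs]
    os.makedirs(store_dir, exist_ok=True)
    with open(path, "w") as f:
        json.dump(index, f)
    return index

def search(index, query, k=1):
    """Return the k stored texts most similar to the query."""
    scored = sorted(index, key=lambda e: cosine(e["vector"], embed(query)), reverse=True)
    return [e["text"] for e in scored[:k]]
```

On a second run the index is loaded from disk instead of being rebuilt, which is why the first chat invocation is slower than later ones.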
```
python3 main.py chat \
  --document-folder /path/to/your/docs/folder \
  --model groq/openai/gpt-oss-120b
```

Type your question and press Enter. Use exit / quit to stop.
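The exit handling described above can be sketched as a small loop. `chat_loop` is a hypothetical helper, not the repo's actual implementation; in the real command the answer function would be wired to the RAG chain:

```python
def chat_loop(read_input, answer):
    """Read questions until the user types exit/quit; return the transcript."""
    transcript = []
    while True:
        question = read_input().strip()
        if question.lower() in ("exit", "quit"):
            break
        if not question:
            continue  # ignore empty lines
        transcript.append((question, answer(question)))
    return transcript
```

Passing the input reader as a parameter (instead of calling input() directly) keeps the loop testable with scripted questions.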
Each run creates a timestamped folder under --output-dir:
```
outputs/
  <task_id>_<YYYY-MM-DD_HH-MM-SS>/
    info.log
    cost.json
    tool_call_sequence.json
    agent_doc_raw_*.md
    ...
```
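The run-folder naming above can be reproduced with a small helper. This is a sketch: `make_run_dir` is a hypothetical name, and only the `<task_id>_<YYYY-MM-DD_HH-MM-SS>` pattern comes from the layout shown:

```python
import os
from datetime import datetime

def make_run_dir(output_dir, task_id):
    """Create and return <output_dir>/<task_id>_<YYYY-MM-DD_HH-MM-SS>/."""
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    run_dir = os.path.join(output_dir, f"{task_id}_{stamp}")
    os.makedirs(run_dir, exist_ok=True)
    return run_dir
```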
If docs look empty or off-format, check tool_call_sequence.json first (it shows whether code extraction/search tools succeeded).
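A quick way to triage that file is to tally outcomes per call. This sketch assumes tool_call_sequence.json is a JSON list of dicts with a status-like field; the actual schema may differ, so treat the key name as an assumption:

```python
import json
from collections import Counter

def summarize_tool_calls(path, status_key="status"):
    """Count tool-call outcomes. Assumes a JSON list of dicts (schema may differ)."""
    with open(path) as f:
        calls = json.load(f)
    return Counter(str(call.get(status_key, "unknown")) for call in calls)
```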
RAG indexing uses file loader mapping and exclusions from app/globals.py:
- ALLOW_FILES and LOADER_MAPPING decide which file types can be loaded.
- EXCLUDE_DIRS and EXCLUDE_FILES skip common noise (virtualenvs, git folders, lockfiles, etc.).
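The selection logic those settings drive can be sketched like this. The names ALLOW_FILES, EXCLUDE_DIRS, and EXCLUDE_FILES come from app/globals.py, but the example values and the walking code here are illustrative, not the repo's real implementation:

```python
import os

# Illustrative values; the real collections live in app/globals.py.
ALLOW_FILES = {".py", ".md", ".txt"}
EXCLUDE_DIRS = {".git", "venv", "__pycache__"}
EXCLUDE_FILES = {"poetry.lock", "package-lock.json"}

def collect_loadable_files(root):
    """Walk root, pruning excluded dirs and skipping excluded or unsupported files."""
    selected = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDE_DIRS]
        for name in filenames:
            if name in EXCLUDE_FILES:
                continue
            if os.path.splitext(name)[1] in ALLOW_FILES:
                selected.append(os.path.join(dirpath, name))
    return sorted(selected)
```

Only files that survive both the exclusion filters and the extension allow-list are handed to a loader and indexed.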