
project-maths-modelling-rm-rf created by GitHub Classroom

ACM40960/project-maths-modelling-rm-rf


🧠 Auto Doc Gen: Evidence-Grounded Technical Documentation from Any GitHub Repo


Paste a GitHub URL → get a Word-ready, evidence-cited handover document.
Local app with retrieval-augmented generation (RAG), a judge for factuality/citations, Mermaid→image rendering, and one-click DOCX export.


✨ Features

  • One-click docs from a repo
    Clone, analyze, and generate an ordered handover: Objective & Scope → Installation & Setup → Technologies Used → System Architecture → API Key.

  • Evidence-grounded writing
    Dual FAISS indexes (Text + Code) and section-aware retrieval keep claims tied to real repo content.

  • Inline citations
    Substantive statements cite file:line–line (e.g., [app/imports.py:12–28]). If evidence is missing, we insert [Information not available in repository].

  • Quality gate ("LLM-as-judge")
    A second model verifies factuality, citations, and missing-but-expected items; verdicts saved as JSON for audit.

  • Word-friendly diagrams
    The app automatically creates a Mermaid system architecture diagram, and all Mermaid blocks are rendered to PNG so diagrams show up correctly in DOCX.

  • Local-first
    Everything runs on your machine; only embeddings/LLM calls use your configured provider key.
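Because inline citations follow a fixed [path:start–end] shape, they can be checked mechanically. The sketch below is an illustration under that assumption, not the judge's actual logic: it extracts citations from generated Markdown, flags any that name an unknown file or a line range past the end of the file, and counts the "missing evidence" markers. The `file_lengths` mapping (path → line count) is hypothetical input you would build from the cloned repo.

```python
import re
from typing import Dict, List, Tuple

# Matches e.g. [app/imports.py:12–28]; accepts an en dash or a plain hyphen between line numbers.
CITATION = re.compile(r"\[([^\[\]:]+):(\d+)[–-](\d+)\]")
MISSING = "[Information not available in repository]"


def find_citations(markdown: str) -> List[Tuple[str, int, int]]:
    """Return (path, start_line, end_line) for every inline citation."""
    return [(p, int(a), int(b)) for p, a, b in CITATION.findall(markdown)]


def bad_citations(markdown: str, file_lengths: Dict[str, int]) -> List[str]:
    """List citations that name an unknown file or an impossible line range."""
    problems = []
    for path, start, end in find_citations(markdown):
        n_lines = file_lengths.get(path)
        if n_lines is None or not (1 <= start <= end <= n_lines):
            problems.append(f"{path}:{start}-{end}")
    return problems


def count_missing(markdown: str) -> int:
    """How many times the writer marked evidence as unavailable."""
    return markdown.count(MISSING)
```

The missing-evidence marker never matches the citation pattern because it contains no `:start–end` suffix, so the two checks stay independent.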


πŸ—οΈ System Architecture

flowchart LR
  subgraph Ingestion_And_Indexing
    GH[GitHub Repo] --> CL[Clone Repo]
    CL --> PC[Parse and Chunk]
    PC --> EMB[Create Embeddings]
  end

  EMB --> R[Retrieve Context]

  subgraph Agent
    R --> W[Write]
    W --> J[Judge]
    J -- pass --> S[Save]
    J -- fail --> V[Revise]
    V --> W
  end

  S --> E[End]
  W --> D[Generate DOCX]
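The Agent subgraph is a write → judge → revise loop. A minimal sketch in plain Python, assuming hypothetical `write`, `judge`, and `revise` callables that stand in for the LLM calls; the `max_rounds` cap is also an assumption (the diagram shows no limit), added so a section that never passes cannot loop forever.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    passed: bool
    notes: str = ""


def run_section(context: List[str],
                write: Callable[[List[str], str], str],
                judge: Callable[[str, List[str]], Verdict],
                revise: Callable[[str, Verdict], str],
                max_rounds: int = 3) -> str:
    """Write a draft, judge it, and loop through revise -> write until it passes.

    Returns the last draft even if it never passed; the caller decides what to keep.
    """
    feedback = ""
    draft = ""
    for _ in range(max_rounds):
        draft = write(context, feedback)   # Write (first round has no feedback)
        verdict = judge(draft, context)    # Judge
        if verdict.passed:                 # pass -> Save
            return draft
        feedback = revise(draft, verdict)  # fail -> Revise -> back to Write
    return draft
```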

πŸ–ΌοΈ UI Preview

[Screenshot: App UI – Auto Doc Gen]


🚀 Quick Start

Requirements

  • Python 3.10+ (3.11 recommended)
  • Node.js 18+ (for the Electron UI)
  • Git
  • An embeddings/LLM API key (e.g., OPENAI_API_KEY)

1) Clone

git clone https://github.com/<your-org-or-user>/<your-repo>.git
cd <your-repo>

2) Python env + deps

# Windows
python -m venv project_view
project_view\Scripts\activate
pip install -r requirements.txt

# macOS/Linux
python3 -m venv project_view
source project_view/bin/activate
pip install -r requirements.txt

3) Configure secrets

Create app/.env:

OPENAI_API_KEY=YOUR_KEY_HERE
# Optional:
# OPENAI_BASE_URL=...
# GITHUB_TOKEN=...   # to access private repos or raise rate limits
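To confirm the file parses as expected, here is a minimal, stdlib-only reader for the KEY=VALUE format above. It is an illustration, not the app's own loader (which may rely on a library such as python-dotenv); it skips blank lines and # comments, strips one pair of surrounding quotes, and does not handle multi-line values or variable expansion.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines; blank lines and '#' comment lines are ignored."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        value = value.strip()
        # Drop one pair of matching quotes, e.g. KEY="abc" -> abc
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env(path: str = "app/.env") -> None:
    """Copy parsed values into os.environ without overriding variables already set."""
    for key, value in parse_env(Path(path).read_text(encoding="utf-8")).items():
        os.environ.setdefault(key, value)


if __name__ == "__main__":
    sample = "OPENAI_API_KEY=YOUR_KEY_HERE\n# Optional:\n# GITHUB_TOKEN=...\n"
    print(parse_env(sample))  # {'OPENAI_API_KEY': 'YOUR_KEY_HERE'}
```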

4) UI deps (Electron) + Mermaid CLI

cd ui
npm install
# Mermaid CLI to render diagrams to images for Word:
npm install --save-dev @mermaid-js/mermaid-cli
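The Mermaid CLI installed here provides the `mmdc` command, whose `-i` and `-o` flags take the input diagram and output image. The Electron side (ui/main.js) performs the actual Mermaid→PNG step; as a rough Python sketch of the same idea, the hypothetical helpers below pull fenced `mermaid` blocks out of a Markdown string, write each to a `.mmd` file, and build the `npx mmdc` call for it. File names such as `diagram_1.mmd` are assumptions.

```python
import re
import subprocess
from pathlib import Path
from typing import List

FENCE = "`" * 3  # built at runtime so this snippet can itself sit inside a Markdown fence
MERMAID_BLOCK = re.compile(FENCE + r"mermaid\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_mermaid_blocks(markdown: str) -> List[str]:
    """Return the body of every fenced mermaid block, in document order."""
    return [body.strip() for body in MERMAID_BLOCK.findall(markdown)]


def mmdc_command(mmd_path: Path, png_path: Path) -> List[str]:
    """Build the Mermaid CLI call (-i input, -o output).

    On Windows, npx is a .cmd shim, so you may need "npx.cmd" instead.
    """
    return ["npx", "mmdc", "-i", str(mmd_path), "-o", str(png_path)]


def render_all(markdown: str, out_dir: Path) -> List[Path]:
    """Write each block to diagram_N.mmd and render it to diagram_N.png (needs Node + mmdc)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    pngs = []
    for i, body in enumerate(extract_mermaid_blocks(markdown), start=1):
        mmd = out_dir / f"diagram_{i}.mmd"
        png = out_dir / f"diagram_{i}.png"
        mmd.write_text(body, encoding="utf-8")
        subprocess.run(mmdc_command(mmd, png), check=True)
        pngs.append(png)
    return pngs
```

Only `render_all` needs Node.js and the CLI; the extraction and command-building helpers can be tested on their own.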

5) Run the Desktop App

Ensure the UI uses your venv’s Python:

# Windows (PowerShell)
$env:PYTHON="$PWD\..\project_view\Scripts\python.exe"; npm start

# Windows (cmd)
set PYTHON=%cd%\..\project_view\Scripts\python.exe
npm start

# macOS/Linux
PYTHON="$PWD/../project_view/bin/python" npm start

Paste a GitHub URL, click Generate, watch logs, then Save the DOCX.
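The launch commands above differ only in where the venv's interpreter lives. A small, hypothetical helper that computes the same path (for example, to set PYTHON from a script) might look like this; it assumes the venv is named project_view and sits next to ui/, as in the steps above.

```python
import os
from pathlib import Path
from typing import Optional


def venv_python(ui_dir: Path, venv_name: str = "project_view",
                windows: Optional[bool] = None) -> Path:
    """Return the interpreter path the Electron UI should receive via PYTHON."""
    if windows is None:
        windows = os.name == "nt"
    venv = ui_dir.parent / venv_name
    # Windows venvs keep the interpreter in Scripts\python.exe; POSIX venvs use bin/python.
    return venv / "Scripts" / "python.exe" if windows else venv / "bin" / "python"


if __name__ == "__main__":
    print(venv_python(Path.cwd()))  # run from inside ui/
```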


🧩 How It Works (High Level)

  1. Ingest: Clone the repo; collect README/docs and source code.
  2. Chunk
    • Text via paragraph/heading splits
    • Code via AST (functions/classes) → precise file:line spans
  3. Index: Build two FAISS stores (Text and Code) with embeddings.
  4. Generate per section: Retrieve the most relevant chunks → the LLM writes grounded prose with inline citations.
  5. Judge: A second LLM checks factuality, citations, and missing items; JSON verdicts are saved to app/debug/.
  6. Assemble: Electron merges the Markdown, renders Mermaid to PNG, adds a cover page (repo title), applies your section order, then converts HTML → DOCX.
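Step 2's code chunking can be approximated with Python's standard `ast` module, which records `lineno` and `end_lineno` for each definition (Python 3.8+). This is a minimal sketch for Python files only, not the project's chunking.py: it emits one chunk per top-level function or class, and how the app treats other languages, nested definitions, or module-level code is not shown. The citation label simply mirrors the [path:start–end] style used in this README.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class CodeChunk:
    path: str
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    source: str

    @property
    def citation(self) -> str:
        return f"[{self.path}:{self.start}–{self.end}]"


def chunk_python(path: str, source: str) -> List[CodeChunk]:
    """One chunk per top-level function or class, with exact line spans."""
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator so the span covers the whole definition.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno
            chunks.append(CodeChunk(path, node.name, start, end,
                                    "\n".join(lines[start - 1:end])))
    return chunks


if __name__ == "__main__":
    src = "import os\n\n\ndef get_key():\n    return os.environ['OPENAI_API_KEY']\n\n\nclass Config:\n    debug = False\n"
    for chunk in chunk_python("app/imports.py", src):
        print(chunk.name, chunk.citation)
    # get_key [app/imports.py:4–5]
    # Config [app/imports.py:8–9]
```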

Artifacts saved

  • app/docs/: final Markdown per section
  • app/docs_index/: FAISS stores (text_index/, code_index/)
  • app/debug/: judge JSONs per section
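The two stores under app/docs_index/ can be pictured with a toy, dependency-free retrieval sketch. It deliberately swaps FAISS for brute-force cosine similarity over in-memory lists, and the `code_weight` parameter is a hypothetical stand-in for whatever section-aware weighting the app actually applies; it only illustrates querying a Text store and a Code store and merging the hits.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Store = List[Tuple[str, Vector]]  # (chunk id such as "app/imports.py:12-28", embedding)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: Vector, text_store: Store, code_store: Store,
             k: int = 3, code_weight: float = 1.0) -> List[Tuple[str, float]]:
    """Score both stores, weight the code hits, and return the top-k merged results."""
    scored: Dict[str, float] = {}
    for chunk_id, vec in text_store:
        scored[chunk_id] = cosine(query, vec)
    for chunk_id, vec in code_store:
        scored[chunk_id] = code_weight * cosine(query, vec)
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    text = [("README.md:1-10", [1.0, 0.0]), ("docs/setup.md:5-20", [0.6, 0.8])]
    code = [("app/imports.py:12-28", [0.0, 1.0])]
    # A setup-style section might favour code evidence; the 1.2 weight is illustrative only.
    print(retrieve([0.0, 1.0], text, code, k=2, code_weight=1.2))
```

Keeping text and code in separate stores lets each section decide how much to lean on prose versus source, which a single mixed index would not allow as directly.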

πŸ“ Project Structure

<your-repo>/
├─ app/
│  ├─ main.py
│  ├─ imports.py
│  ├─ chunking.py
│  ├─ graph.py
│  ├─ save_to_vector_db.py
│  ├─ sections.yaml
│  ├─ .env                  # your API keys (not committed)
│  ├─ docs/                 # generated sections (Markdown)
│  ├─ docs_index/           # FAISS stores (text_index/, code_index/)
│  └─ debug/                # judge JSONs and run logs
├─ ui/
│  ├─ index.html
│  ├─ preload.js
│  ├─ main.js               # spawns Python, Mermaid→PNG, DOCX export
│  └─ package.json
├─ requirements.txt
└─ LICENSE

🧷 Citations & Judge

  • Inline citations:
    ... reads env vars [app/imports.py:12–28].
  • Missing evidence:
    [Information not available in repository] (no guessing).
  • Judge JSON (per section):
    {
      "factual": true,
      "cites_ok": true,
      "hallucinated": false,
      "missing_but_expected": ["Specific environment variables..."],
      "score": 0.9,
      "notes": "..."
    }

Use these for quality gates (CI) or quick manual edits.
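The roadmap lists CI integration as future work; until then, a minimal, hypothetical gate could read the judge JSONs and fail when a section falls short. The 0.8 threshold, the `*.json` glob over app/debug/ (which also holds run logs), and reporting `missing_but_expected` items as warnings rather than failures are all assumptions, not project defaults.

```python
import json
from pathlib import Path
from typing import List


def check_verdict(verdict: dict, min_score: float = 0.8) -> List[str]:
    """Return the reasons a section fails the gate; an empty list means it passes."""
    reasons = []
    if not verdict.get("factual", False):
        reasons.append("not factual")
    if not verdict.get("cites_ok", False):
        reasons.append("citations not OK")
    if verdict.get("hallucinated", True):  # a missing flag counts as a failure
        reasons.append("hallucination flagged")
    score = verdict.get("score", 0.0)
    if score < min_score:
        reasons.append(f"score {score} < {min_score}")
    return reasons


def gate(debug_dir: str = "app/debug", min_score: float = 0.8) -> int:
    """Check every judge JSON; return 1 if any section fails, else 0 (use as an exit code)."""
    failures = 0
    for path in sorted(Path(debug_dir).glob("*.json")):
        verdict = json.loads(path.read_text(encoding="utf-8"))
        for item in verdict.get("missing_but_expected", []):
            print(f"WARN {path.name}: missing {item}")
        reasons = check_verdict(verdict, min_score)
        if reasons:
            failures += 1
            print(f"FAIL {path.name}: {', '.join(reasons)}")
    return 1 if failures else 0
```

In a CI script, pass the return value of `gate()` to `sys.exit` so a failing section fails the job.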


🔧 Troubleshooting

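  • ModuleNotFoundError: No module named 'docx'
    Install python-docx into the same venv used by Electron:
    project_view\Scripts\python.exe -m pip install python-docx   (Windows)
    project_view/bin/python -m pip install python-docx           (macOS/Linux)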

  • UnicodeEncodeError on Windows
    Ensure UTF-8: the UI already sets PYTHONUTF8=1 / PYTHONIOENCODING=utf-8.

  • Mermaid not rendered
    Install @mermaid-js/mermaid-cli and ensure Chromium is available.
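If you run the Python pipeline yourself (outside Electron) and hit the same UnicodeEncodeError, you can reproduce the UI's setting by passing both variables to the child process. A minimal sketch; `utf8_env` and `run_utf8` are hypothetical helpers, not part of the repo.

```python
import os
import subprocess
import sys
from typing import List


def utf8_env() -> dict:
    """Copy of the current environment with Python forced into UTF-8 I/O."""
    env = dict(os.environ)
    env["PYTHONUTF8"] = "1"
    env["PYTHONIOENCODING"] = "utf-8"
    return env


def run_utf8(args: List[str]) -> subprocess.CompletedProcess:
    """Run the current interpreter with args, decoding its output as UTF-8."""
    return subprocess.run([sys.executable, *args], env=utf8_env(),
                          capture_output=True, encoding="utf-8", check=True)


if __name__ == "__main__":
    # The child prints a non-ASCII arrow, which fails on some Windows consoles without UTF-8.
    print(run_utf8(["-c", "print('\\u2192 ok')"]).stdout)
```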


🛠 Tech Stack

Desktop & Glue
Electron (Node + Chromium), html-to-docx, @mermaid-js/mermaid-cli

Python Pipeline
LangChain / LangGraph, FAISS, GitPython, Tiktoken, (optional) python-docx

Models
Your provider’s embeddings + LLM (configured in app/.env)


πŸ—ΊοΈ Roadmap

  • Human-in-the-Loop review UI (approve/revise sections)
  • Interactive Docs (RAG chat) over the indexed repo
  • Multilingual output (bilingual DOCX/PDF)
  • Delta docs (incremental re-runs on diffs)
  • CI integration with quality gates (fail on low judge score)
  • Richer sections (Testing, Data model, Security, Ops)
  • Env-var detector to auto-build .env.example
  • Offline/On-prem mode (local embeddings/LLM)
  • More diagrams (sequence/ER diagrams)

📚 References (Background)

  • Naimi et al., Automating Software Documentation (2024): diagram-centric (UML → LLM) documentation.
  • Thota et al., AI-Driven Automated Software Documentation Generation (ICDSNS 2024): model comparison for snippet-level code→text.

Our system differs by mining the entire repository with RAG + judge, packaging a Word-ready handover with rendered diagrams.


🤝 Contributing

  1. Fork → create a feature branch → commit → open PR.
  2. Follow PEP 8 (Python) / standard JS style.
  3. Include/update docs and, if possible, a small test repo URL for validation.

πŸ“ License

This project is released under the MIT License. See LICENSE.


TL;DR: Paste a GitHub URL → get a structured, evidence-cited DOCX handover. Local, reproducible, and audit-friendly.
