CodedocRag

codedocrag is a Spring Boot service that analyzes source code repositories and generates technical documentation and Mermaid diagrams using Retrieval-Augmented Generation (RAG).

It accepts either an uploaded ZIP archive or a GitHub repository URL, then runs the repository content through chunking, embedding, vector indexing, retrieval, and LLM-based generation.

Key Features

  • ZIP repository ingestion
  • GitHub repository ingestion
  • Source code chunking and processing
  • Vector indexing with Qdrant
  • RAG-based documentation generation
  • Mermaid architecture and component diagram generation
  • AWS Bedrock Claude integration
  • Mock LLM mode for local development
  • Swagger / OpenAPI documentation

Architecture Overview

The high-level pipeline is:

Repository (ZIP or GitHub) -> Ingestion pipeline -> Code chunking -> Embeddings -> Vector storage (Qdrant) -> Retrieval -> LLM generation (Claude via Bedrock or Mock) -> Documentation / diagrams

Step by step, the service:

  1. Accepts a repository as a ZIP upload or GitHub URL.
  2. Extracts and scans supported source and text files.
  3. Splits files into chunks.
  4. Generates embeddings for those chunks.
  5. Stores vectors and metadata in Qdrant.
  6. Retrieves relevant chunks for a documentation or diagram request.
  7. Sends the retrieved context to either a mock provider or AWS Bedrock Claude.
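The chunking step (step 3) can be pictured with a minimal sketch. This is not the code from the chunking/ package: the class name LineChunker, the line-based strategy, and the maxLines and overlap parameters are all assumptions made for illustration. It splits a file into fixed-size windows of lines, with a few lines shared between neighbouring chunks so that context is not cut off at a boundary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not taken from the codedocrag sources.
class LineChunker {

    // Splits text into chunks of at most maxLines lines. Consecutive chunks
    // share `overlap` lines, so each chunk starts (maxLines - overlap) lines
    // after the previous one.
    static List<String> chunk(String text, int maxLines, int overlap) {
        if (maxLines <= 0 || overlap < 0 || overlap >= maxLines) {
            throw new IllegalArgumentException("require 0 <= overlap < maxLines");
        }
        String[] lines = text.split("\n", -1);
        List<String> chunks = new ArrayList<>();
        int step = maxLines - overlap;
        for (int start = 0; start < lines.length; start += step) {
            int end = Math.min(start + maxLines, lines.length);
            chunks.add(String.join("\n", Arrays.copyOfRange(lines, start, end)));
            if (end == lines.length) {
                break; // this chunk already reaches the end of the file
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String source = "line1\nline2\nline3\nline4\nline5";
        // Windows of 3 lines with 1 line of overlap.
        for (String c : chunk(source, 3, 1)) {
            System.out.println("--- chunk ---");
            System.out.println(c);
        }
    }
}
```

A real chunker for source code would more likely split on class or method boundaries; the sliding window above is only the simplest variant of the idea.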

Project Structure

src/main/java/com/codedocrag

config/        → Spring configuration (AWS, Qdrant, Bedrock)
common/        → shared exceptions and API responses
ingestion/     → repository ingestion logic
chunking/      → source code chunking
embedding/     → embedding generation
rag/           → retrieval and prompt building
llm/           → LLM providers (mock and Bedrock)
storage/       → local and S3 storage implementations

Example Flow

  1. Submit a GitHub repository URL.
  2. The repository is cloned and processed.
  3. Files are chunked and embedded.
  4. Vectors are stored in Qdrant.
  5. A user asks a documentation question.
  6. Relevant chunks are retrieved.
  7. The LLM generates documentation or diagrams.
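To make the retrieval step (step 6) concrete, here is a brute-force sketch of similarity search over stored embeddings. It is an illustration only: in the service this work is delegated to Qdrant, which answers such queries through an index rather than a full scan, and the use of cosine similarity here is an assumption, not something the README states. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of vector retrieval; the real lookup happens in Qdrant.
class CosineRetriever {

    // Cosine similarity of two vectors with the same dimension.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // treat a zero vector as unrelated to everything
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the indices of the k stored vectors most similar to the query,
    // most similar first.
    static List<Integer> topK(double[] query, List<double[]> stored, int k) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) {
            indices.add(i);
        }
        indices.sort(Comparator
                .comparingDouble((Integer i) -> cosine(query, stored.get(i)))
                .reversed());
        return new ArrayList<>(indices.subList(0, Math.min(k, indices.size())));
    }

    public static void main(String[] args) {
        List<double[]> stored = List.of(
                new double[] {0, 1},
                new double[] {1, 0},
                new double[] {1, 1});
        System.out.println(topK(new double[] {1, 0}, stored, 2));
    }
}
```

The chunks behind the returned indices are what would be placed into the prompt as context for the LLM.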

API Endpoints

Ingestion

  • POST /api/ingest/upload: Upload a ZIP archive containing a repository.

  • POST /api/ingest/github: Ingest a GitHub repository directly from a URL.

Generation

  • POST /api/docs/generate: Generate documentation that answers a query, or generate repository-wide documentation when repoId is provided without a query.

  • POST /api/docs/diagram: Generate a Mermaid diagram for an indexed repository.

Running Locally

Prerequisites

  • Java 21
  • Maven
  • Docker
  • git

Start Qdrant

docker run -p 6333:6333 qdrant/qdrant

Run the Application

mvn spring-boot:run

The API will be available at http://localhost:8080.

Mock Mode

For local development without a real LLM provider:

export APP_LLM_PROVIDER=mock
mvn spring-boot:run

In this mode, the application returns deterministic mock responses while keeping the full ingestion and retrieval flow active.
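One way a provider switch like this could be structured is sketched below. None of these names come from the llm/ package: LlmProvider, MockLlmProvider, and fromSetting are hypothetical, and falling back to the mock provider when the setting is absent is an assumption rather than documented behaviour. The point is only that a mock provider returns the same output for the same prompt, so the rest of the pipeline can be exercised without calling a real model.

```java
import java.util.Locale;

// Hypothetical sketch of provider selection; not the project's actual classes.
class LlmProviderSketch {

    interface LlmProvider {
        String generate(String prompt);
    }

    // Deterministic: the response depends only on the prompt.
    static class MockLlmProvider implements LlmProvider {
        @Override
        public String generate(String prompt) {
            return "[mock] response for a prompt of " + prompt.length() + " characters";
        }
    }

    // Maps a setting such as the value of APP_LLM_PROVIDER to a provider.
    // Assumption: an unset value falls back to the mock provider.
    static LlmProvider fromSetting(String setting) {
        String name = setting == null ? "mock" : setting.trim().toLowerCase(Locale.ROOT);
        switch (name) {
            case "mock":
                return new MockLlmProvider();
            case "bedrock":
                // Wiring a Bedrock client is out of scope for this sketch.
                throw new UnsupportedOperationException("bedrock provider not implemented in this sketch");
            default:
                throw new IllegalArgumentException("unknown LLM provider: " + setting);
        }
    }

    public static void main(String[] args) {
        LlmProvider provider = fromSetting(System.getenv("APP_LLM_PROVIDER"));
        System.out.println(provider.generate("Describe the ingestion package"));
    }
}
```

In a Spring Boot application the same choice is often expressed with conditional beans instead of a switch; the sketch uses plain Java to stay self-contained.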

AWS Bedrock Mode

To use Claude through AWS Bedrock:

aws configure

export APP_LLM_PROVIDER=bedrock
export AWS_REGION=eu-south-2
export BEDROCK_MODEL_ID=eu.anthropic.claude-sonnet-4-6

Then start the application:

mvn spring-boot:run

The application uses the AWS SDK default credential chain, so credentials should be available through aws configure, environment variables, or an IAM role.

Swagger UI

Swagger UI is available at:

http://localhost:8080/swagger-ui.html

The OpenAPI JSON is available at:

http://localhost:8080/v3/api-docs

Example Requests

Ingest a GitHub repository:

curl -X POST http://localhost:8080/api/ingest/github \
  -H "Content-Type: application/json" \
  -d '{"repoUrl":"https://github.com/user/repository"}'

Generate repository-wide documentation:

curl -X POST http://localhost:8080/api/docs/generate \
  -H "Content-Type: application/json" \
  -d '{"repoId":"<repo-id>"}'

Generate a Mermaid diagram:

curl -X POST http://localhost:8080/api/docs/diagram \
  -H "Content-Type: application/json" \
  -d '{"repoId":"<repo-id>"}'
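
The same documentation request can be issued from Java with the JDK's built-in HTTP client (Java 11 or later). This is a sketch that mirrors the curl call above: the class name DocsRequests is hypothetical, the repository ID is a placeholder passed on the command line, and the shape of the response body is not documented here, so the example only prints the status code and the raw body. The JSON is built by string concatenation, which assumes the ID contains no characters that need escaping.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client-side helper; mirrors the curl example for /api/docs/generate.
class DocsRequests {

    // Builds a POST request with a JSON body containing only repoId.
    // Assumes repoId needs no JSON escaping.
    static HttpRequest generateDocs(String baseUrl, String repoId) {
        String body = "{\"repoId\":\"" + repoId + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/docs/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Requires the service to be running locally on port 8080.
        String repoId = args.length > 0 ? args[0] : "<repo-id>";
        HttpRequest request = generateDocs("http://localhost:8080", repoId);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```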
