codedocrag is a Spring Boot service that analyzes source code repositories and generates technical documentation and Mermaid diagrams using Retrieval-Augmented Generation (RAG).
It supports both uploaded ZIP archives and direct GitHub repository ingestion, then processes repository content through chunking, embeddings, vector search, retrieval, and LLM-based generation.
- ZIP repository ingestion
- GitHub repository ingestion
- Source code chunking and processing
- Vector indexing with Qdrant
- RAG-based documentation generation
- Mermaid architecture and component diagram generation
- AWS Bedrock Claude integration
- Mock LLM mode for local development
- Swagger / OpenAPI documentation
The high-level pipeline is:
Repository (ZIP or GitHub) -> Ingestion pipeline -> Code chunking -> Embeddings -> Vector storage (Qdrant) -> Retrieval -> LLM generation (Claude via Bedrock or Mock) -> Documentation / diagrams
Step by step, the service:
- Accepts a repository as a ZIP upload or GitHub URL.
- Extracts and scans supported source and text files.
- Splits files into chunks.
- Generates embeddings for those chunks.
- Stores vectors and metadata in Qdrant.
- Retrieves relevant chunks for a documentation or diagram request.
- Sends the retrieved context to either a mock provider or AWS Bedrock Claude.
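The chunking step can be pictured with a small sketch. The README does not say how codedocrag splits files, so this assumes the simplest strategy: fixed-size windows of lines with a small overlap. `LineChunker`, `chunk`, `maxLines`, and `overlap` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not taken from the codedocrag sources.
// Assumption: chunks are fixed-size line windows that overlap slightly.
final class LineChunker {

    private LineChunker() {}

    /**
     * Splits text into chunks of at most maxLines lines. Each chunk starts
     * with the last `overlap` lines of the previous chunk.
     */
    static List<String> chunk(String text, int maxLines, int overlap) {
        if (maxLines <= 0 || overlap < 0 || overlap >= maxLines) {
            throw new IllegalArgumentException("require 0 <= overlap < maxLines");
        }
        List<String> chunks = new ArrayList<>();
        if (text.isEmpty()) {
            return chunks;
        }
        List<String> lines = Arrays.asList(text.split("\n", -1));
        int step = maxLines - overlap; // how far each window advances
        for (int start = 0; start < lines.size(); start += step) {
            int end = Math.min(start + maxLines, lines.size());
            chunks.add(String.join("\n", lines.subList(start, end)));
            if (end == lines.size()) {
                break; // this window already reached the end of the file
            }
        }
        return chunks;
    }
}
```

The overlap keeps a declaration and the lines that follow it from being separated at every chunk boundary. The cost is that some lines are embedded twice.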
src/main/java/com/codedocrag
config/ → Spring configuration (AWS, Qdrant, Bedrock)
common/ → shared exceptions and API responses
ingestion/ → repository ingestion logic
chunking/ → source code chunking
embedding/ → embedding generation
rag/ → retrieval and prompt building
llm/ → LLM providers (mock and Bedrock)
storage/ → local and S3 storage implementations
- Submit a GitHub repository URL.
- The repository is cloned and processed.
- Files are chunked and embedded.
- Vectors are stored in Qdrant.
- A user asks a documentation question.
- Relevant chunks are retrieved.
- The LLM generates documentation or diagrams.
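The retrieval step in this workflow is a nearest-neighbour search over the stored vectors. In codedocrag that search is delegated to Qdrant. The sketch below is only an in-memory stand-in that assumes cosine similarity as the metric; `InMemoryRetriever`, `cosine`, and `topK` are hypothetical names, not part of the project or of the Qdrant client.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the vector search that Qdrant performs.
final class InMemoryRetriever {

    private InMemoryRetriever() {}

    /** Cosine similarity of two vectors with the same dimension. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // a zero vector has no direction to compare
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the ids of the k chunks most similar to the query, best first. */
    static List<String> topK(Map<String, double[]> index, double[] query, int k) {
        List<Map.Entry<String, double[]>> entries = new ArrayList<>(index.entrySet());
        entries.sort(Comparator.comparingDouble(
                (Map.Entry<String, double[]> e) -> cosine(e.getValue(), query)).reversed());
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            ids.add(entries.get(i).getKey());
        }
        return ids;
    }
}
```

The chunk text behind the returned ids is what gets placed into the prompt as context for the LLM.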
- POST /api/ingest/upload: Upload a ZIP archive containing a repository.
- POST /api/ingest/github: Ingest a GitHub repository directly from a URL.
- POST /api/docs/generate: Generate documentation from a query, or produce repository-wide documentation when repoId is provided without a query.
- POST /api/docs/diagram: Generate a Mermaid diagram for an indexed repository.
- Java 21
- Maven
- Docker
- Git
Start Qdrant, then start the application in a separate terminal:
docker run -p 6333:6333 qdrant/qdrant
mvn spring-boot:run
The API will be available at http://localhost:8080.
For local development without a real LLM provider:
export APP_LLM_PROVIDER=mock
mvn spring-boot:run
In this mode, the application returns deterministic mock responses while keeping the full ingestion and retrieval flow active.
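One way to picture the provider switch is a small factory keyed on the configured name. This is a sketch under assumptions, not the project's wiring: `LlmProvider`, `MockLlmProvider`, and `LlmProviders.fromName` are hypothetical names, and the real service presumably resolves the provider through Spring configuration.

```java
import java.util.Locale;

// Hypothetical abstraction over the two providers named in this README.
interface LlmProvider {
    String generate(String prompt);
}

// Deterministic stand-in: no network call and no randomness.
final class MockLlmProvider implements LlmProvider {
    @Override
    public String generate(String prompt) {
        return "[mock] response for prompt of " + prompt.length() + " characters";
    }
}

final class LlmProviders {

    private LlmProviders() {}

    /** Maps a provider name such as "mock" to an implementation. */
    static LlmProvider fromName(String name) {
        String normalized = name == null ? "" : name.trim().toLowerCase(Locale.ROOT);
        switch (normalized) {
            case "mock":
                return new MockLlmProvider();
            case "bedrock":
                // The Bedrock client is deliberately left out of this sketch.
                throw new UnsupportedOperationException("Bedrock provider not sketched here");
            default:
                throw new IllegalArgumentException("Unknown LLM provider: " + name);
        }
    }
}
```

A caller could pass `System.getenv("APP_LLM_PROVIDER")` to `fromName`. Because the mock output depends only on the prompt, tests of the ingestion and retrieval flow stay repeatable.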
To use Claude through AWS Bedrock:
aws configure
export APP_LLM_PROVIDER=bedrock
export AWS_REGION=eu-south-2
export BEDROCK_MODEL_ID=eu.anthropic.claude-sonnet-4-6
Then start the application:
mvn spring-boot:run
The application uses the AWS SDK default credential chain, so credentials should be available through aws configure, environment variables, or an IAM role.
Swagger UI is available at:
http://localhost:8080/swagger-ui.html
The OpenAPI JSON is available at:
http://localhost:8080/v3/api-docs
Ingest a GitHub repository:
curl -X POST http://localhost:8080/api/ingest/github \
-H "Content-Type: application/json" \
-d '{"repoUrl":"https://github.com/user/repository"}'
Generate repository-wide documentation:
curl -X POST http://localhost:8080/api/docs/generate \
-H "Content-Type: application/json" \
-d '{"repoId":"<repo-id>"}'
Generate a Mermaid diagram:
curl -X POST http://localhost:8080/api/docs/diagram \
-H "Content-Type: application/json" \
-d '{"repoId":"<repo-id>"}'
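
The same calls can be made from Java with the JDK's built-in `java.net.http` client. The sketch below only builds the requests. `CodedocragRequests` and its methods are hypothetical helper names, and the JSON bodies are assembled by plain string concatenation, which assumes the values contain no characters that need JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that mirrors the curl examples above.
final class CodedocragRequests {

    private CodedocragRequests() {}

    /** Builds a JSON POST request for the given path. */
    static HttpRequest jsonPost(String baseUrl, String path, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // Assumption: repoUrl and repoId need no JSON escaping.
    static HttpRequest ingestGithub(String baseUrl, String repoUrl) {
        return jsonPost(baseUrl, "/api/ingest/github", "{\"repoUrl\":\"" + repoUrl + "\"}");
    }

    static HttpRequest generateDocs(String baseUrl, String repoId) {
        return jsonPost(baseUrl, "/api/docs/generate", "{\"repoId\":\"" + repoId + "\"}");
    }

    static HttpRequest generateDiagram(String baseUrl, String repoId) {
        return jsonPost(baseUrl, "/api/docs/diagram", "{\"repoId\":\"" + repoId + "\"}");
    }
}
```

To send a request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The response schema is not described here, so the body is best treated as an opaque string until checked against the Swagger UI.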