CodeSherpa

AI-Powered GitHub Repository Onboarding Assistant

CodeSherpa is a prototype tool that helps developers quickly understand and navigate unfamiliar GitHub codebases. By leveraging large language models and semantic search, it provides personalized guidance and answers questions about repository structure, code patterns, and development history.

Features:

  • Repository Analysis: Import GitHub repositories and analyze code, commits, issues, and documentation
  • Onboarding Path Generation: Receive customized "getting started" recommendations and paths through the codebase
  • AI Chat Interface: Ask natural language questions about the codebase and receive contextual answers

Tech Stack:

  • Frontend: Next.js 14, React, Tailwind CSS, shadcn/ui for some complex components
  • Vector Search: ChromaDB for vector storage and semantic code search
  • AI/LLM: OpenAI API for intelligent responses
  • Embeddings: Sentence Transformers for code embeddings
  • Authentication: Supabase Auth
  • Bringing it together: Docker + Docker Compose
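
In practice, these pieces come together as a retrieval-augmented chat flow: the question is matched against indexed repository chunks in ChromaDB, and the matches are handed to the OpenAI API as context. Below is a minimal sketch of that flow; the collection name, model, and prompt are illustrative assumptions, not the actual implementation.

    import { ChromaClient } from "chromadb";
    import OpenAI from "openai";

    const chroma = new ChromaClient({ path: process.env.CHROMA_DB_URL });
    const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

    // Answer a question about an imported repository (hypothetical collection name).
    export async function answerQuestion(question: string): Promise<string> {
      // Assumes the collection was created with an embedding function during import,
      // so queryTexts can be embedded on the ChromaDB side.
      const collection = await chroma.getCollection({ name: "repository_chunks" });
      const results = await collection.query({
        queryTexts: [question],
        nResults: Number(process.env.CHROMA_RESULTS_NUMBER ?? 5),
      });

      // Feed the retrieved chunks to the LLM as grounding context.
      const context = (results.documents[0] ?? []).filter(Boolean).join("\n---\n");
      const completion = await openai.chat.completions.create({
        model: "gpt-4o-mini", // assumed model
        messages: [
          { role: "system", content: "You answer questions about a GitHub repository using the provided context." },
          { role: "user", content: `Context:\n${context}\n\nQuestion: ${question}` },
        ],
      });
      return completion.choices[0].message.content ?? "";
    }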

Current Deployment

As of 02/04/2025, the project is hosted at www.codesherpa.xyz on two AWS EC2 instances: one runs the full-stack Next.js app, the other runs the ChromaDB vector database. The instances are linked by a security group. A GitHub workflow is set up to automatically update the app on the server when changes are pushed to the main branch.

Why AWS? The initial plan was to set up a standalone ChromaDB instance in the cloud and deploy the Next.js app on Vercel/Netlify/AWS Amplify like a normal human being would. This proved unfeasible because our repository import is a lengthy process that cannot run reliably in a serverless environment. The final deployment stack is: git/ssh to update the project, pm2 to keep the Next.js app running, nginx to handle HTTPS traffic, and a security group to limit access to the instances. The Next.js app's user authentication also adds an extra layer of security in front of the otherwise exposed ChromaDB instance.

Development

Prerequisites

Locally, the app runs seamlessly in a Docker environment; the compose files are already set up to handle database persistence between starts. You will need the following to get started:

  • Docker and Docker Compose
  • Supabase account (for authentication)
  • OpenAI API key

Installation

Clone the repository:

    git clone https://github.com/yourusername/code-sherpa.git
    cd code-sherpa

Create a .env file from the provided .env.example (run cp .env.example .env), and add your API keys to the .env file:

    SUPABASE_URL=your_supabase_url_here
    SUPABASE_ANON_KEY=your_supabase_anon_key_here
    OPENAI_API_KEY=your_openai_api_key_here
    # Point this to an instance running in the cloud if you wish
    CHROMA_DB_URL=http://chromadb:8000
    # Number of search results returned to the AI chatbot
    CHROMA_RESULTS_NUMBER=5
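
The app reads these values from the environment at runtime. A small, purely illustrative startup check can catch a missing key early:

    // Illustrative only: fail fast if a required environment variable is missing.
    const required = ["SUPABASE_URL", "SUPABASE_ANON_KEY", "OPENAI_API_KEY", "CHROMA_DB_URL"];

    for (const name of required) {
      if (!process.env[name]) {
        throw new Error(`Missing required environment variable: ${name}`);
      }
    }

    // Optional value with a sensible default.
    export const chromaResultsNumber = Number(process.env.CHROMA_RESULTS_NUMBER ?? 5);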

Supabase Setup

In addition to the connection strings, you will of course need to create tables and triggers in the Supabase database. Currently the app uses Supabase for authentication and user management, as well as for storing all repository and processed-file metadata.


Authentication requires obtaining a GitHub access token from the user, so you must also create an OAuth app in GitHub and connect it to your Supabase auth provider.

Currently, a trigger function is set up to add user data to the profile table when a new user signs in via GitHub OAuth.
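
On the client side, the GitHub sign-in boils down to a supabase-js call along these lines; the requested scopes and redirect URL are assumptions, not the project's exact configuration:

    import { createClient } from "@supabase/supabase-js";

    const supabase = createClient(
      process.env.SUPABASE_URL!,
      process.env.SUPABASE_ANON_KEY!
    );

    // Sign in via the GitHub OAuth app configured in Supabase. The "repo" scope
    // (an assumption here) lets the app read repositories on the user's behalf.
    export async function signInWithGitHub() {
      const { data, error } = await supabase.auth.signInWithOAuth({
        provider: "github",
        options: {
          scopes: "repo",
          redirectTo: "http://localhost:3000/auth/callback", // hypothetical callback route
        },
      });
      if (error) throw error;
      return data;
    }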

You Are Ready for Take Off

Start the application:

    docker-compose up --build

Open your browser and navigate to http://localhost:3000

Usage

  • Import a Repository: Enter a GitHub repository URL to import and analyze.
  • Explore the Codebase: Navigate through the repository structure and see key files.
  • Ask Questions: Use the AI chat interface to ask about any aspect of the code.
  • Get Onboarding Guidance: Receive personalized recommendations on where to start.

Some example questions:

  • "How does authentication work in this app?"
  • "What's the overall architecture of this project?"
  • "Explain the component structure of the frontend"
  • "Where should I start if I want to understand the data flow?"

Important Technical Notes

Node.js vs Bun: While this project was originally developed with Bun, I've switched to Node.js due to compatibility issues between Bun and Next.js server actions in Docker environments. When using Bun, server actions would freeze without completing or returning errors, making authentication and other features unusable. Good old Node.js resolves this issue and provides better stability in the Docker environment.
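
For context, a Next.js server action is just an async function marked with the "use server" directive and invoked from a form or client component; calls of this shape were the ones that hung under Bun in Docker. The snippet below is a simplified illustration, not the project's actual code.

    // app/actions.ts (simplified illustration)
    "use server";

    export async function importRepository(formData: FormData) {
      const repoUrl = formData.get("repoUrl");
      if (typeof repoUrl !== "string" || repoUrl.length === 0) {
        return { error: "Please provide a repository URL" };
      }
      // Kick off the (long-running) repository import and indexing pipeline here.
      return { ok: true };
    }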

AWS deployment: The project is currently deployed on AWS EC2 instances, using a combination of git/ssh to update the project, pm2 to keep the Next.js app running, nginx to handle HTTPS traffic, and a security group to limit access to the instances.

Next Steps

  • Set up a monitoring system to track the health of the application on the EC2 instances
  • Move away from Supabase to avoid vendor lock-in and usage charges
  • Improve the onboarding generation process with more intelligent tool use and enriched data mined from the repository
  • Add a placeholder page so that nginx doesn't show an ugly 502 during redeployments
  • Improve branch protection and PR rules, and document issues in GitHub to open the project to contributions
  • Add a LICENSE
