GitFindr is an advanced search tool that enhances repository discovery by combining BM25-based ranking with semantic search using transformer-based embeddings. It refines search results using repository statistics like ⭐ stars, 🍴 forks, and 👀 clicks, providing highly relevant rankings.
GitHub's built-in search often fails to provide the most relevant repositories due to its reliance on basic keyword matching. Many great projects remain undiscovered because:
- README files and descriptions aren’t fully analyzed, so searches depend on exact keyword matches.
- Synonyms aren’t considered, meaning searches miss related terms.
- Ranking is inconsistent, prioritizing older or more forked repositories regardless of recent relevance.
GitFindr fixes these issues by introducing README scanning and semantic matching, ensuring that even loosely related terms surface the right repositories. By leveraging a hybrid BM25 + embedding search, GitFindr delivers more precise and meaningful results.
GitFindr's backend is now built fully in Python, using FastAPI, PostgreSQL with pgvector, and sentence-transformers for semantic indexing.
- Embedding-Based Semantic Search:
  - We use `sentence-transformers/all-MiniLM-L6-v2` to generate vector embeddings for repository README files and metadata.
  - These embeddings are stored in PostgreSQL using the `pgvector` extension.
  - During a query, the user's input is also converted into an embedding vector.
  - We compute cosine similarity between the query embedding and repository embeddings to retrieve semantically similar results.
  - This allows us to match relevant repositories even when there are no exact keyword overlaps.
- BM25-Based Search:
  - Traditional keyword-based indexing is performed using PostgreSQL's full-text search engine.
  - We calculate BM25 scores for each document using weighted fields such as repository name, description, and README content.
- Hybrid Ranking with RRF (Reciprocal Rank Fusion):
  - Both BM25 and semantic search generate separate ranked lists of repository results.
  - We apply Reciprocal Rank Fusion (RRF) to combine the two rankings:

    $$ RRF(r) = \sum_{i=1}^{n} \frac{1}{k + rank_i(r)} $$

    - $rank_i(r)$ is the rank of repository $r$ in the $i$-th ranked list
    - $n$ is the number of ranked lists being fused (here 2: BM25 and semantic search)
    - $k$ is a tunable constant (e.g., 60)
  - This method rewards results that appear near the top of both rankings. Because it depends only on rank positions, BM25 scores and cosine similarities do not need to be normalized onto a common scale, which balances precision and recall effectively.
  - This combined approach ensures both exact keyword matches and semantically similar results are surfaced accurately.
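The three stages above can be tied together in a minimal, self-contained Python sketch. It is not GitFindr's implementation, and several pieces are assumptions: BM25 here is textbook single-field Okapi BM25 with conventional defaults `k1 = 1.5` and `b = 0.75`; the embeddings are hypothetical 3-dimensional toy vectors standing in for the 384-dimensional output of `all-MiniLM-L6-v2`; cosine similarity is computed in Python rather than in PostgreSQL (with pgvector, the `<=>` operator returns cosine distance, i.e. 1 minus cosine similarity); and all function and repository names are made up. RRF uses `k = 60` and 1-based ranks, and a repository absent from one list simply contributes nothing for that list.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def bm25_rank(query: List[str], docs: Dict[str, List[str]],
              k1: float = 1.5, b: float = 0.75) -> List[str]:
    """Rank tokenized documents with textbook single-field Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n_docs  # average doc length
    df = Counter(term for toks in docs.values() for term in set(toks))
    scores: Dict[str, float] = {}
    for name, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            length_norm = 1 - b + b * len(toks) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        if score > 0:  # keep only documents matching at least one query term
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; zero vectors are treated as unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def semantic_rank(query_vec: List[float], vecs: Dict[str, List[float]]) -> List[str]:
    """Rank repositories by cosine similarity to the query embedding."""
    return sorted(vecs, key=lambda name: cosine(query_vec, vecs[name]), reverse=True)

def rrf(rankings: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over every list a repo appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, name in enumerate(ranking, start=1):  # 1-based ranks
            scores[name] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical repositories: tokenized text and toy 3-d "embeddings".
docs = {
    "json-fast": "fast json parser for python".split(),
    "webkit-py": "python web framework".split(),
    "serde-lite": "rust serialization library".split(),
}
vecs = {
    "json-fast": [0.9, 0.1, 0.1],
    "webkit-py": [0.1, 0.2, 0.9],
    "serde-lite": [1.0, 0.2, 0.0],
}
query_vec = [1.0, 0.2, 0.0]  # stands in for the embedded query "json parser"

fused = rrf([bm25_rank(["json", "parser"], docs), semantic_rank(query_vec, vecs)])
print(fused)
```

In this toy run, `serde-lite` shares no keywords with the query but ranks first semantically, while `json-fast` wins overall because it appears in both lists (1/61 + 1/62 versus 1/61).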
GitFindr still enhances BM25 ranking by incorporating interaction metrics. Below is the BM25S formula used:
Where:
- 📖 Inverse Document Frequency (IDF):

  $$ IDF(q_i) = \log \left( \frac{N - df_i + 0.5}{df_i + 0.5} + 1 \right) $$

  - $N$ = Total number of documents (repositories)
  - $df_i$ = Number of documents containing term $q_i$
- 📈 Term Frequency Weighting:

  $$ f(q_i, D) = \sum_{b} v_b \cdot qd_i^b $$

  - $v_b$ = Frequency weight of field $b$
  - $qd_i^b$ = Total occurrences of $q_i$ in field $b$ of document $D$
- ⚖️ Scaling Factor ($K$):

  $$ K = k_1 \cdot \frac{\text{avg term freq in dataset}}{\text{avg term freq in dataset after weighting}} $$

  - $k_1$ is a tunable parameter ($k_1 \in [1.2, 2.0]$)
- 📊 Additional Weighting ($alt$):

  $$ alt = \left(1 + \sum_{i} \alpha_i \log (1 + x_i)\right) $$

  - $x_i$ represents repository statistics (⭐ stars, 🍴 forks, 👀 clicks)
  - $\alpha_i$ is a tuning constant
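The individual components defined above can be computed with the short Python sketch below. Because only the components are defined here, the sketch evaluates each one separately and does not combine them into a final BM25S score. Assumptions: the natural logarithm; naive lowercase whitespace tokenization; `raw_tfs` and `weighted_tfs` are per-(term, document) frequencies before and after field weighting, averaged with a plain mean; and all function names, field weights $v_b$, constants $\alpha_i$, and repository statistics are hypothetical values chosen for illustration.

```python
import math
from statistics import mean
from typing import Dict, List

def idf(n_docs: int, df: int) -> float:
    """IDF(q_i) = log((N - df_i + 0.5) / (df_i + 0.5) + 1); natural log assumed."""
    return math.log((n_docs - df + 0.5) / (df + 0.5) + 1)

def weighted_tf(term: str, fields: Dict[str, str], v: Dict[str, float]) -> float:
    """f(q_i, D) = sum over fields b of v_b * qd_i^b."""
    total = 0.0
    for field, text in fields.items():
        occurrences = text.lower().split().count(term.lower())  # qd_i^b (naive tokenization)
        total += v.get(field, 1.0) * occurrences  # unlisted fields weighted 1.0 (assumption)
    return total

def scaling_factor(k1: float, raw_tfs: List[float], weighted_tfs: List[float]) -> float:
    """K = k1 * (avg term freq) / (avg term freq after weighting)."""
    # Assumption: both lists hold per-(term, document) frequencies, averaged with a plain mean.
    return k1 * mean(raw_tfs) / mean(weighted_tfs)

def alt(stats: Dict[str, float], alphas: Dict[str, float]) -> float:
    """alt = 1 + sum_i alpha_i * log(1 + x_i) over repository statistics x_i."""
    return 1 + sum(alphas[name] * math.log1p(x) for name, x in stats.items())

# Hypothetical inputs for illustration only.
field_weights = {"name": 3.0, "description": 2.0, "readme": 1.0}  # v_b
repo = {
    "name": "json parser",
    "description": "A fast JSON parser",
    "readme": "Parse JSON quickly. JSON in, objects out.",
}
alphas = {"stars": 0.05, "forks": 0.03, "clicks": 0.02}  # alpha_i
stats = {"stars": 1200, "forks": 150, "clicks": 0}       # x_i

print(idf(1000, 10), idf(1000, 500))  # rare term vs. common term
print(weighted_tf("json", repo, field_weights))
print(scaling_factor(1.5, [1, 3, 2], [2, 6, 4]))
print(alt(stats, alphas))
```

With these made-up inputs, a term found in 10 of 1,000 repositories gets an IDF of about 4.56, while one found in 500 gets $\ln 2 \approx 0.69$; the "+1" inside the logarithm keeps IDF positive even for very common terms. Because $alt$ uses $\log(1 + x_i)$, each tenfold increase in a large statistic adds roughly $\alpha_i \ln 10$ rather than ten times as much, and a statistic of zero contributes nothing.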
GitFindr consists of two main folders:
- Frontend: The UI for searching repositories.
- Backend: Handles indexing, searching, and processing.
- Navigate to the frontend directory: `cd frontend`
- Install dependencies: `npm install`
- Start the frontend: `npm run dev`
- Start PostgreSQL with pgvector support using Docker. Make sure Docker is installed, then run the following from the `backend/` directory: `docker compose up -d`. This will start a PostgreSQL instance with the `pgvector` extension enabled.
- Install `uv` for dependency management. Follow the installation guide here: https://docs.astral.sh/uv/
- Install dependencies and start the backend. Run the following commands from the `backend/` folder: `make`, then `make run`. These commands install the Python dependencies and launch the FastAPI server.
We welcome contributions! Please follow these steps:
- Fork the repository.
- Create a feature branch (`git checkout -b feature-branch`).
- Commit changes (`git commit -m 'Add new feature'`).
- Push to your branch (`git push origin feature-branch`).
- Open a pull request.
GitFindr envisions a community where people can discover and submit their ideas, ensuring that no idea gets buried and every project gets a fair chance to be seen. 🚀