Source code for personalised github repository recommendation using keyword extraction and similarity index matching

IngaleChinmay04/personalised-github-repo-recommendation

 
 

GitHub Repository Recommender System 📊🚀

Welcome to the GitHub Repository Recommender System! This project fetches data from GitHub repositories, preprocesses it, extracts keywords with TF-IDF, LDA, and BERT, and recommends repositories to users by matching those keywords against their preferences. The guide below covers setup and gives an overview of the pipeline.

Table of Contents

  1. Introduction
  2. Features
  3. Getting Started
  4. Pipeline Overview

Introduction

This project builds a recommender system for GitHub repositories. The pipeline fetches repository data, preprocesses it, extracts relevant keywords, and ranks repositories by their similarity to a user's preferences.

Features

  • Data Fetching: Retrieve repository data, README content, and issues/labels from GitHub.
  • Data Preprocessing: Clean and preprocess the fetched data.
  • Keyword Extraction: Extract keywords using TF-IDF, LDA, and BERT.
  • Similarity Calculation: Compute similarity between user preferences and repository features.
  • Recommendations: Generate and display repository recommendations for users.

Getting Started

Prerequisites

Ensure you have the following installed:

  • Python 3.7+
  • Git
  • Virtual Environment (optional but recommended)

Installation

  1. Clone the repository:
    git clone https://github.com/IngaleChinmay04/personalised-github-repo-recommendation.git
    cd personalised-github-repo-recommendation
  2. Install the required Python packages:
    pip install -r requirements.txt
    
  3. Add your GitHub token to the environment:
    export GITHUB_TOKEN='your_github_token'
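
A script can then pick the token up from the environment. The snippet below is a minimal sketch, not code taken from this repository: the helper name github_auth_headers is hypothetical, and it assumes the fetch scripts talk to the GitHub REST API with a bearer token.

```python
import os


def github_auth_headers(env=os.environ):
    """Build GitHub REST API headers from the GITHUB_TOKEN variable.

    Hypothetical helper for illustration; the repository's scripts may
    read the token differently.
    """
    token = env.get("GITHUB_TOKEN")
    if not token:
        # Fail early with a clear message instead of sending unauthenticated requests.
        raise RuntimeError("GITHUB_TOKEN is not set; export it before running the fetch scripts.")
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
```

Failing fast when the variable is missing avoids hitting the much lower rate limit that applies to unauthenticated requests.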
    

Pipeline Overview

Data Fetching

  1. Fetch Repository Data: Use fetch_repo_data.py to gather repository metadata, README content, languages, and topics.
  2. Fetch Issue Labels: Use fetch_issue_labels.py to scrape issue labels from repository pages.
  3. Fetch Trending Repositories: Use fetch_trending_repos.py to get trending repositories filtered by programming language and spoken language.
  4. Fetch Trending Metadata: Use fetch_trending_repos_metadata.py to gather metadata for trending repositories.
  5. Fetch Trending Issue Labels: Use fetch_trending_issues_labels.py to scrape issue labels for trending repositories.
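
As a rough illustration of the first step, the sketch below gathers metadata, README text, languages, and topics for one repository through the public GitHub REST API. It is an assumption about how such a fetch could look, not the contents of fetch_repo_data.py: the function names (build_request, get_json, decode_readme, fetch_repo) and the User-Agent string are hypothetical, and only the standard library is used.

```python
import base64
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_request(path, token=None):
    """Create a request for a GitHub REST API path, optionally authenticated."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "repo-recommender-sketch",  # hypothetical client name
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"{API_ROOT}{path}", headers=headers)


def get_json(path, token=None):
    """Send the request and parse the JSON response body."""
    with urllib.request.urlopen(build_request(path, token), timeout=30) as resp:
        return json.load(resp)


def decode_readme(payload):
    """The README endpoint returns the file content base64-encoded."""
    return base64.b64decode(payload["content"]).decode("utf-8", errors="replace")


def fetch_repo(full_name, token=None):
    """Collect the fields the pipeline needs for one 'owner/name' repository."""
    meta = get_json(f"/repos/{full_name}", token)
    readme = decode_readme(get_json(f"/repos/{full_name}/readme", token))
    languages = get_json(f"/repos/{full_name}/languages", token)
    return {
        "full_name": meta["full_name"],
        "description": meta.get("description"),
        "stars": meta.get("stargazers_count"),
        "topics": meta.get("topics", []),
        "languages": list(languages),  # keys are language names
        "readme": readme,
    }
```

Calling fetch_repo("owner/name", token) performs three API requests, so a real run over many repositories would also need rate-limit handling, which this sketch leaves out.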

Data Preprocessing and Keyword Extraction

  1. Preprocess Data: Clean and preprocess the README content and issues.
  2. Extract Keywords: Use TF-IDF, LDA, and BERT to extract relevant keywords from the README and issues.
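
The two steps above can be sketched for the TF-IDF case as follows. This is a simplified, standard-library illustration rather than the project's notebook code: the stopword list, the cleaning rules, and the function names preprocess and tfidf_keywords are all assumptions, and the LDA and BERT extractors are not covered here.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "is", "on",
             "with", "this", "that", "it", "as", "be", "are", "or", "by", "from"}


def preprocess(text):
    """Strip code fences, URLs, and HTML tags, then tokenize and drop stopwords."""
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)  # fenced code blocks
    text = re.sub(r"https?://\S+", " ", text)                # URLs
    text = re.sub(r"<[^>]+>", " ", text)                     # HTML tags
    tokens = re.findall(r"[a-z][a-z0-9+#-]*", text.lower())
    return [t for t in tokens if len(t) > 2 and t not in STOPWORDS]


def tfidf_keywords(docs, top_k=5):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        # Smoothed IDF so terms present in every document still get a weight.
        scores = {t: (c / total) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
                  for t, c in tf.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results
```

Ties are broken alphabetically so the output is deterministic across runs.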

Similarity Index Matching

  1. Vectorize Data: Transform the preprocessed data into vectors using TF-IDF.
  2. Compute Similarity: Calculate cosine similarity between user preferences and repository vectors.
  3. Generate Recommendations: Recommend repositories to users based on the highest similarity scores.
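
The three steps above can be sketched with sparse TF-IDF vectors and cosine similarity. The code is a minimal standard-library illustration under stated assumptions: the function names (tfidf_vectors, cosine, recommend) and the example repository names are hypothetical, and fitting the IDF weights on the user profile together with the repositories is a simplification, not necessarily what the project does.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Turn token lists into sparse TF-IDF vectors (dict: term -> weight)."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(user_tokens, repos, top_n=3):
    """Rank repositories (name -> keyword tokens) by similarity to the user."""
    names = list(repos)
    # Assumption: IDF is computed over the user profile plus all repositories.
    vectors = tfidf_vectors([user_tokens] + [repos[name] for name in names])
    user_vec, repo_vecs = vectors[0], vectors[1:]
    scored = [(name, cosine(user_vec, vec)) for name, vec in zip(names, repo_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]
```

For example, with hypothetical repositories {"ml/nlp": ["pytorch", "transformers", "nlp"], "web/api": ["flask", "rest", "api"]} and user keywords ["pytorch", "nlp"], recommend ranks ml/nlp first because it shares both terms with the user profile, while web/api scores zero.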

Languages

  • Jupyter Notebook 98.1%
  • Python 1.9%