This repository contains a Python-based auto-response bot that uses the GitHub API and the open-source LlamaIndex package to generate automatic responses to GitHub discussions. The bot monitors discussions in the MOOSE GitHub repository and, when a user starts a new discussion, suggests the most relevant existing posts. We call it R.A.N.G.E.R. – "Responsive Assistant for Navigating and Guiding Engineering with Rigor".
The bot uses the GitHub API to fetch discussions from the MOOSE repository and stores the data in a vector database. When a new discussion is started, the bot compares its title with the content (title and discussion text) of all previous discussions in the database and replies with the most relevant posts. The database should be refreshed regularly, for example monthly, so that it includes new posts.
The repository includes the source code for discussion data parsing, vector database generation, relevant post suggestions, and unit test scripts.
The GitHubAPI.py script fetches discussion data from the GitHub discussion forum using GraphQL queries.
Environment Variable Required:
GITHUB_TOKEN: a GitHub token with at least read access to the target repository.
Functionality:
- Automatically follows pagination cursors to fetch every discussion post.
- Stores each page in JSON format, including the original question and comments.
Prerequisites:
- query.gql.in
- GITHUB_TOKEN
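The pagination loop described above can be sketched as follows. This is a minimal, hypothetical sketch, not the code in GitHubAPI.py: the query text is an assumption (it is not copied from query.gql.in), and it relies only on GitHub's repository.discussions connection with pageInfo { hasNextPage endCursor }. The HTTP call is injected as a run_query callable (a hypothetical name) so the loop can be exercised without network access; each page is written to its own JSON file.

```python
import json
import urllib.request
from pathlib import Path
from typing import Any, Callable, Dict, Optional

# Assumed query shape; the fields actually requested by query.gql.in may differ.
DISCUSSIONS_QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    discussions(first: 50, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes { number title body comments(first: 50) { nodes { body } } }
    }
  }
}
"""

RunQuery = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def github_run_query(token: str) -> RunQuery:
    """Build a run_query callable that POSTs to GitHub's GraphQL endpoint."""
    def run(query: str, variables: Dict[str, Any]) -> Dict[str, Any]:
        request = urllib.request.Request(
            "https://api.github.com/graphql",
            data=json.dumps({"query": query, "variables": variables}).encode("utf-8"),
            headers={"Authorization": f"Bearer {token}",
                     "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return run


def fetch_all_pages(run_query: RunQuery, owner: str, name: str, out_dir: Path) -> int:
    """Follow endCursor until hasNextPage is false; save each page as JSON.

    Returns the number of pages written.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    cursor: Optional[str] = None  # None means "start from the first page"
    page = 0
    while True:
        data = run_query(DISCUSSIONS_QUERY,
                         {"owner": owner, "name": name, "cursor": cursor})
        connection = data["data"]["repository"]["discussions"]
        page += 1
        (out_dir / f"page_{page:04d}.json").write_text(
            json.dumps(connection["nodes"], indent=2), encoding="utf-8")
        info = connection["pageInfo"]
        if not info["hasNextPage"]:
            return page
        cursor = info["endCursor"]
```

With a real token, fetch_all_pages(github_run_query(os.environ["GITHUB_TOKEN"]), "idaholab", "moose", Path("raw")) would walk every page; passing a fake callable instead keeps unit tests offline.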
The IndexGenerator.py script embeds the discussion data into a vector database using LlamaIndex.
Functionality:
- Uses SimpleDirectoryReader to read the JSON data produced by GitHubAPI.py and load it as Document objects.
- Uses HuggingFaceEmbedding to load the embedding model. The default model is "all-MiniLM-L6-v2".
- Uses SemanticSplitterNodeParser to chunk content into nodes according to their semantic similarity.
- Uses VectorStoreIndex to generate the vector database and save it locally.
Prerequisites:
- Transformer model (default: all-MiniLM-L6-v2)
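SemanticSplitterNodeParser picks chunk boundaries by embedding consecutive sentences and starting a new chunk where the distance between neighbours is unusually large. The sketch below illustrates that idea in plain Python; it is not LlamaIndex's implementation. A toy bag-of-words vector (the hypothetical toy_embed) stands in for all-MiniLM-L6-v2, sentence buffering is omitted, and a breakpoint is placed wherever the cosine distance exceeds a chosen percentile of all adjacent distances (LlamaIndex exposes a comparable breakpoint_percentile_threshold setting).

```python
import math
import re
from collections import Counter
from typing import Dict, List


def toy_embed(text: str) -> Dict[str, float]:
    # Hypothetical stand-in for a sentence-transformer: sparse word counts.
    return {w: float(n) for w, n in Counter(re.findall(r"[a-z0-9]+", text.lower())).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def percentile(values: List[float], pct: float) -> float:
    # Linear interpolation between closest ranks.
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def semantic_split(text: str, pct: float = 95.0) -> List[str]:
    """Split text into chunks at unusually large jumps between adjacent sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) < 2:
        return sentences
    vecs = [toy_embed(s) for s in sentences]
    # Cosine distance between each sentence and the next one.
    dists = [1.0 - cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
    cutoff = percentile(dists, pct)
    chunks: List[str] = []
    current = [sentences[0]]
    for dist, sentence in zip(dists, sentences[1:]):
        if dist > cutoff:  # large jump in meaning: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks
```

Because the cutoff is relative to the distances in each document, a percentile threshold adapts to both short and long discussions without a fixed similarity value.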
The GitHubBot.py script loads the vector database, finds the posts most relevant to the title of a new discussion, and posts the result as a reply.
Functionality:
- Uses cosine similarity search to score the new post's title against the titles and discussion contents of previous posts.
- Adjustable parameters: top_n (number of most relevant posts to return) and threshold (similarity cutoff).
Prerequisites:
- Transformer model (default: all-MiniLM-L6-v2)
- Vector database (/db_dir)
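The retrieval step can be sketched as below, assuming the index has already been reduced to a list of (post URL, embedding vector) pairs. The function name rank_posts, that data layout, and the default values of top_n and threshold are hypothetical. It scores each stored vector by cosine similarity to the new title's embedding, drops anything below threshold, and keeps the top_n best matches.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_posts(
    query_vec: Sequence[float],
    posts: List[Tuple[str, Sequence[float]]],
    top_n: int = 5,          # hypothetical default
    threshold: float = 0.5,  # hypothetical default
) -> List[Tuple[str, float]]:
    """Return up to top_n (url, score) pairs with score >= threshold, best first."""
    scored = ((url, cosine_similarity(query_vec, vec)) for url, vec in posts)
    kept = [(url, score) for url, score in scored if score >= threshold]
    return heapq.nlargest(top_n, kept, key=lambda item: item[1])
```

Applying the threshold before top_n means the bot may return fewer than top_n posts, or none, when nothing in the database is close enough to the new title.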
Note: It is recommended to use the same transformer model for both vector database embedding and retrieval, because vectors produced by different models are not directly comparable.
all-MiniLM-L6-v2 (the default) and all-MiniLM-L12-v2 are sentence-transformer models used to embed content into a vector index. Both map sentences and paragraphs to a 384-dimensional dense vector space for tasks like clustering or semantic search.
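One way to act on the same-model recommendation is to record the model name next to the index when it is built and check it before retrieval. The sketch below assumes a hypothetical embedding_model.json file inside the database directory; neither that file nor the helper names come from the repository.

```python
import json
from pathlib import Path

META_FILE = "embedding_model.json"  # hypothetical metadata file name


def save_model_info(db_dir: Path, model_name: str, dim: int) -> None:
    """Record which embedding model built the index (call at build time)."""
    db_dir.mkdir(parents=True, exist_ok=True)
    (db_dir / META_FILE).write_text(json.dumps({"model": model_name, "dim": dim}))


def check_model_info(db_dir: Path, model_name: str, dim: int) -> None:
    """Raise ValueError if retrieval uses a different model than the index."""
    meta = json.loads((db_dir / META_FILE).read_text())
    if meta["model"] != model_name or meta["dim"] != dim:
        raise ValueError(
            f"index built with {meta['model']} ({meta['dim']}-d), "
            f"but retrieval uses {model_name} ({dim}-d)"
        )
```

Comparing the model name matters: all-MiniLM-L6-v2 and all-MiniLM-L12-v2 both produce 384-dimensional vectors, so a dimension check alone would not catch a mismatch.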
To install and set up the moose-discussion-bot, follow these steps:
- Install Miniforge
- Create your environment:
conda create -n RANGER python pip
conda activate RANGER
- Clone the repository:
git clone https://github.com/idaholab/moose-discussion-bot.git
- Navigate to the repository directory:
cd moose-discussion-bot
- Install the required dependencies:
pip install -r requirements.txt
- Configure the bot by creating a .env file with the necessary environment variables:
GITHUB_TOKEN=your_github_token
REPO_OWNER=your_repo_owner
REPO_NAME=your_repo_name
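If you want to read these settings without an extra dependency, a minimal standard-library reader is sketched below. The repository itself may use a package such as python-dotenv, so treat load_env and its behaviour as hypothetical: it skips blank lines and # comments, splits each line on the first =, strips surrounding quotes, does not override variables already exported in the shell, and fails fast if any of the three settings above is missing.

```python
import os
from pathlib import Path
from typing import Dict

REQUIRED = ("GITHUB_TOKEN", "REPO_OWNER", "REPO_NAME")


def load_env(path: Path = Path(".env")) -> Dict[str, str]:
    """Parse KEY=VALUE lines from a .env file into os.environ (hypothetical helper)."""
    values: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    for key, value in values.items():
        os.environ.setdefault(key, value)  # keep values already exported
    missing = [key for key in REQUIRED if not os.environ.get(key)]
    if missing:
        raise RuntimeError(f"missing settings: {', '.join(missing)}")
    return values
```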
Separate unit tests are developed for each class in the repository using unittest. The tests are organized as follows:
- test_GitHubAPI.py: Contains unit tests for GitHubAPI.py.
- test_IndexGenerator.py: Contains unit tests for IndexGenerator.py.
- test_GitHubBot.py: Contains unit tests for GitHubBot.py.
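For orientation, a test in the unittest style might look like the sketch below. The function under test, extract_titles, is hypothetical and defined inline so the example runs on its own; the real tests target the code in GitHubAPI.py, IndexGenerator.py, and GitHubBot.py. Because pytest also collects unittest.TestCase classes, a file like this runs under pytest as well as under python -m unittest.

```python
import unittest
from typing import Any, Dict, List


def extract_titles(page: List[Dict[str, Any]]) -> List[str]:
    # Hypothetical helper: pull titles out of one saved page of discussions.
    return [node["title"] for node in page if node.get("title")]


class TestExtractTitles(unittest.TestCase):
    def test_returns_titles_in_order(self) -> None:
        page = [{"title": "Mesh error"}, {"title": "Build fails"}]
        self.assertEqual(extract_titles(page), ["Mesh error", "Build fails"])

    def test_skips_missing_titles(self) -> None:
        self.assertEqual(extract_titles([{"body": "no title"}]), [])
```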
To run the tests:
pytest

Use the validation subcommand to run a small, reproducible, offline check of the pipeline:
- Read a pin file (e.g., pinned.txt) that lists the discussions to fetch (owner/repo#123 or full discussion URLs).
- Fetch those discussions into a raw folder (--val-out-dir).
- Build a fresh vector database (--val-db).
- Answer a one-off --prompt using the offline index.
- Optionally write and/or compare a golden result (--write-golden, --golden).
Example
python RANGER.py --config config.yaml validation
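The pin file accepts two spellings of the same discussion, owner/repo#123 or a full discussion URL. A parser for it might look like the following sketch; parse_pin_line, read_pin_file, and PinnedDiscussion are hypothetical names, and skipping blank lines and lines that start with # is our assumption, not documented behaviour.

```python
import re
from typing import List, NamedTuple, Optional


class PinnedDiscussion(NamedTuple):
    owner: str
    repo: str
    number: int


SHORT_FORM = re.compile(r"^([\w.-]+)/([\w.-]+)#(\d+)$")
URL_FORM = re.compile(
    r"^https://github\.com/([\w.-]+)/([\w.-]+)/discussions/(\d+)/?(?:[?#].*)?$"
)


def parse_pin_line(line: str) -> Optional[PinnedDiscussion]:
    """Parse 'owner/repo#123' or a discussion URL; None for blanks and comments."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    match = SHORT_FORM.match(text) or URL_FORM.match(text)
    if match is None:
        raise ValueError(f"unrecognised pin entry: {text!r}")
    owner, repo, number = match.groups()
    return PinnedDiscussion(owner, repo, int(number))


def read_pin_file(text: str) -> List[PinnedDiscussion]:
    """Parse every line of a pin file, dropping blanks and comments."""
    pins = (parse_pin_line(line) for line in text.splitlines())
    return [pin for pin in pins if pin is not None]
```

Normalising both spellings to the same (owner, repo, number) tuple keeps the fetched set, and therefore any golden result, independent of how the pin file was written.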