This project demonstrates how to add semantic search to an existing database using FAISS, a vector similarity search library. GPT-Your-Data uses FastAPI, SQLAlchemy, FAISS, and OpenAI's GPT model to create and manage a database of Pokémon episodes with vector-based semantic search.
Please take a look at my article on Medium about this project.
- Vector Search with FAISS: Use the FAISS service to index and search Pokémon episodes based on vector representations of episode descriptions.
- GPT Integration: Generate contextual responses to queries using the GPT model, integrating semantic search results from FAISS.
- Web Scraping: Automatically extract Pokémon episode content from online sources and store it in the database.
- REST API with FastAPI: Expose endpoints to create new episodes, search episodes by semantic similarity, and integrate with GPT to generate answers to questions based on episode content.
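The search and GPT features above combine into a retrieve-then-prompt flow. The sketch below illustrates that flow in plain Python, with a brute-force cosine-similarity scan standing in for a FAISS index and hand-written 3-dimensional vectors standing in for real embeddings. The names `search`, `build_prompt`, `INDEX`, and `DESCRIPTIONS`, as well as the episode data, are hypothetical and not taken from the project.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def search(query_vec, index, k=2):
    """Return the k (score, episode_name) pairs most similar to the query.

    This is an exhaustive scan; a FAISS index would play this role in the project.
    """
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(vec))), name)
        for name, vec in index.items()
    ]
    return sorted(scored, reverse=True)[:k]


def build_prompt(question, hits, descriptions):
    """Assemble a prompt that passes the retrieved descriptions to the model as context."""
    context = "\n".join(f"- {name}: {descriptions[name]}" for _, name in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical toy data: hand-written vectors, not real embeddings.
DESCRIPTIONS = {
    "episode-1": "A trainer receives a first Pokémon.",
    "episode-2": "A gym battle against a rock-type leader.",
    "episode-3": "A rescue during a storm at sea.",
}
INDEX = {
    "episode-1": [0.9, 0.1, 0.0],
    "episode-2": [0.1, 0.9, 0.1],
    "episode-3": [0.0, 0.2, 0.9],
}

hits = search([0.8, 0.2, 0.0], INDEX, k=2)
prompt = build_prompt("How does the journey begin?", hits, DESCRIPTIONS)
print(prompt)
```

Only the top-ranked descriptions reach the prompt, which keeps the model's context small and focused on the episodes most relevant to the question.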
- Clone the repository:
  git clone https://github.com/breim/gpt-your-data.git
  cd gpt-your-data
- Install dependencies using Poetry:
  Make sure you have Poetry installed on your machine.
  poetry install
- Activate Poetry's virtual environment:
  poetry shell
  pip install -U sentence-transformers
The project uses SQLite by default. The session.py file configures the database connection. To initialize the database:
python -m gpt_your_data.db.init_db
This will create the necessary tables in your SQLite database.
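As a rough illustration of what table creation involves, the sketch below uses Python's built-in `sqlite3` module instead of the project's SQLAlchemy models. The table name `episodes` and its columns are assumptions, inferred from the `name` and `description` parameters of the API; the real schema may differ.

```python
import sqlite3


def init_db(conn):
    """Create the (assumed) episodes table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS episodes (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            description TEXT NOT NULL
        )
        """
    )
    conn.commit()


# An in-memory database keeps the sketch free of side effects on disk.
conn = sqlite3.connect(":memory:")
init_db(conn)
init_db(conn)  # safe to call twice thanks to IF NOT EXISTS

tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(tables)
```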
To start the FastAPI application, use:
poetry run start
The application will be available at http://localhost:8001.
- Endpoint: POST /episodes/
- Parameters:
  - name: The name or identifier of the episode.
  - description: A detailed description of the episode.
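A request to this endpoint might be built as in the sketch below, which uses only the standard library. It assumes the parameters are sent as a JSON body and that the server runs on the default port 8001; the helper name `build_create_episode_request` and the sample values are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001"


def build_create_episode_request(name, description):
    """Build (but do not send) a POST request for /episodes/.

    Assumption: the API accepts a JSON body with `name` and `description`.
    """
    payload = json.dumps({"name": name, "description": description}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/episodes/",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_episode_request(
    "episode-1", "A trainer receives a first Pokémon."
)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```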
- Endpoint: GET /search/
- Parameters:
  - query: A text query to search for similar episodes.
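A search URL might be assembled as in the sketch below. It assumes `query` is passed as a URL query-string parameter and that the server runs on the default port 8001; the helper name `build_search_url` and the sample query are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8001"


def build_search_url(query):
    """Return the /search/ URL with the query text safely URL-encoded."""
    return f"{BASE_URL}/search/?{urlencode({'query': query})}"


url = build_search_url("Pikachu thunderbolt battle")
print(url)

# To call a running server:
# import urllib.request
# with urllib.request.urlopen(url) as response:
#     print(response.read().decode("utf-8"))
```

Encoding the query with `urlencode` keeps spaces and non-ASCII characters (such as the é in Pokémon) valid inside the URL.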
The project includes a web scraping script to extract Pokémon episode content:
python -m gpt_your_data.scripts.scrape_episodes
This script will fetch and store the content of the episodes in the database.
This project is licensed under the MIT License - see the LICENSE file for details.
Footnotes
- Image source: Advanced RAG Techniques