This project demonstrates a movie recommendation system using MongoDB and Hugging Face for semantic search. It generates embeddings for movie plots and allows querying the database to find movies with similar plot descriptions.
- Connects to a MongoDB database to store and retrieve movie data.
- Uses the Hugging Face API to generate embeddings for movie plots.
- Implements semantic search using MongoDB's vector search capabilities.
- Python 3.8 or higher
- MongoDB with vector search enabled
- Hugging Face account for API access
- Clone the repository:
  git clone https://github.com/anaitabd/RAG.git
  cd RAG
- Create a .env file to store environment variables:
  MONGODB_USER=your_mongodb_username
  MONGODB_PASSWORD=your_mongodb_password
  HF_TOKEN=your_hugging_face_api_token
- Install dependencies:
  pip install pymongo requests python-dotenv
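Once the .env file exists, the connection step could look roughly like the sketch below. It is not taken from the repository: the `mongodb+srv` scheme, the `MONGODB_HOST` variable, the `sample_mflix` database name, and both helper names are assumptions made for illustration. Only `MONGODB_USER`, `MONGODB_PASSWORD`, and the `movies` collection come from this README.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host):
    """Build a mongodb+srv URI, percent-encoding the credentials."""
    # quote_plus escapes characters such as '@' or '/' that would
    # otherwise break URI parsing.
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/?retryWrites=true&w=majority"
    )


def get_movies_collection(db_name="sample_mflix"):
    """Connect with values from the environment and return the movies collection."""
    # Third-party imports stay local so build_mongo_uri works without them.
    from dotenv import load_dotenv
    from pymongo import MongoClient

    load_dotenv()  # loads the variables defined in .env into os.environ
    uri = build_mongo_uri(
        os.environ["MONGODB_USER"],
        os.environ["MONGODB_PASSWORD"],
        os.environ["MONGODB_HOST"],  # assumed variable, not listed in this README
    )
    # db_name is an assumption; use the database that holds your movies collection.
    return MongoClient(uri)[db_name]["movies"]
```

Encoding the credentials matters mostly for passwords that contain reserved characters; plain alphanumeric values pass through unchanged.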
The script retrieves movie plots from the MongoDB collection and generates embeddings using the Hugging Face API. These embeddings are then stored in the database.
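The flow described above can be sketched as follows. This is a minimal illustration, not the repository's actual script: the function names, the `batch_size` default, the injectable `post` parameter, and the assumption that each document stores its text in a `plot` field are hypothetical. The API URL and the `plot_embedding_hf` field are the ones named in this README.

```python
from typing import Callable, List, Optional

API_URL = (
    "https://api-inference.huggingface.co/pipeline/feature-extraction/"
    "sentence-transformers/all-MiniLM-L6-v2"
)


def generate_embedding(text: str, token: str,
                       post: Optional[Callable] = None) -> List[float]:
    """Request an embedding for `text` from the Hugging Face API."""
    if post is None:
        import requests  # imported lazily so `post` can be replaced in tests
        post = requests.post
    response = post(
        API_URL,
        headers={"Authorization": f"Bearer {token}"},
        json={"inputs": text},
    )
    if response.status_code != 200:
        raise ValueError(
            f"Request failed with status {response.status_code}: {response.text}"
        )
    return response.json()


def embed_plots(collection, embed: Callable[[str], List[float]],
                batch_size: int = 50) -> int:
    """Embed up to `batch_size` plots and store each vector on its document."""
    updated = 0
    # Only documents that actually have a plot can be embedded.
    for doc in collection.find({"plot": {"$exists": True}}).limit(batch_size):
        vector = embed(doc["plot"])
        collection.update_one(
            {"_id": doc["_id"]},
            {"$set": {"plot_embedding_hf": vector}},
        )
        updated += 1
    return updated
```

A caller would typically pass something like `lambda text: generate_embedding(text, hf_token)` as `embed`, where `hf_token` is the `HF_TOKEN` value from the .env file.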
Run the script to generate embeddings:
python script_name.py
The script allows querying the database with a description, e.g., "imaginary characters from outer space at war". The vector search feature in MongoDB retrieves movies with similar plots.
Example:
query = "imaginary characters from outer space at war"
The system will return a list of matching movies, displaying the title and plot for each.
Movie Name: Star Wars,
Movie Plot: A long time ago in a galaxy far, far away...
Movie Name: Guardians of the Galaxy,
Movie Plot: A group of intergalactic criminals must pull together...
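A query like the one above might be turned into a MongoDB aggregation along these lines. Treat it as a sketch under assumptions: the index name `PlotSemanticSearch`, the `limit` and `num_candidates` defaults, the `title` and `plot` field names, and both helper names are illustrative and not confirmed by this README. Only the `plot_embedding_hf` path is taken from it.

```python
from typing import Dict, Iterable, List


def build_search_pipeline(query_vector: List[float], limit: int = 4,
                          num_candidates: int = 100,
                          index_name: str = "PlotSemanticSearch") -> List[Dict]:
    """Return an aggregation pipeline for a $vectorSearch over movie plots."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,           # assumed index name
                "path": "plot_embedding_hf",   # field holding the stored vectors
                "queryVector": query_vector,   # embedding of the query text
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Keep only the fields that are printed.
        {"$project": {"_id": 0, "title": 1, "plot": 1}},
    ]


def format_results(documents: Iterable[Dict]) -> str:
    """Render results in the 'Movie Name / Movie Plot' layout shown above."""
    return "\n".join(
        f"Movie Name: {doc['title']},\nMovie Plot: {doc['plot']}"
        for doc in documents
    )
```

With a pymongo collection, the result of `collection.aggregate(build_search_pipeline(vector))` can be passed straight to `format_results`, where `vector` is the embedding of the query string.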
- Connects to a MongoDB cluster using credentials stored in the .env file.
- Requires a MongoDB database with a collection named movies.
- Generates embeddings for movie plots using the Hugging Face API.
- API URL: https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2
- Performs vector-based semantic search on the plot_embedding_hf field in the MongoDB collection.
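Vector search needs an index on the embedding field. This README does not say how that index is defined, so the sketch below shows one possible definition in the Atlas Vector Search format, as an assumption. The 384 dimensions match the output size of all-MiniLM-L6-v2; the `cosine` default and the helper name are illustrative choices.

```python
import json

EMBEDDING_DIMENSIONS = 384  # output size of all-MiniLM-L6-v2


def vector_index_definition(path="plot_embedding_hf",
                            dimensions=EMBEDDING_DIMENSIONS,
                            similarity="cosine"):
    """Return an Atlas Vector Search index definition as a plain dict."""
    if similarity not in {"cosine", "dotProduct", "euclidean"}:
        raise ValueError(f"Unsupported similarity: {similarity}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,                 # field that stores the embeddings
                "numDimensions": dimensions,  # must match the model's vector length
                "similarity": similarity,
            }
        ]
    }


# Print the definition as JSON, e.g. to paste into an index editor.
print(json.dumps(vector_index_definition(), indent=2))
```

If the index dimensions do not match the length of the stored vectors, the field cannot be indexed correctly, so the value is worth checking whenever the embedding model changes.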
- 503 Model Loading Error:
  - The Hugging Face model may take time to load. The script retries embedding generation if the model is unavailable.
- Connection Error:
  - Ensure MongoDB credentials in the .env file are correct.
  - Check network connectivity and ensure MongoDB is accessible.
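For the 503 case, this README only states that the script retries. The sketch below shows one possible retry policy with linear backoff. The `ModelLoadingError` class, the function name, and the `attempts` and `delay` defaults are hypothetical; the caller is assumed to raise `ModelLoadingError` when the API responds with status 503.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ModelLoadingError(Exception):
    """Raised by the caller when the API answers 503 (model still loading)."""


def retry_while_loading(call: Callable[[], T], attempts: int = 5,
                        delay: float = 2.0,
                        sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying with linear backoff while the model is loading."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except ModelLoadingError:
            if attempt == attempts:
                raise  # give up after the final attempt
            sleep(delay * attempt)  # wait a little longer after each failure
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Errors other than `ModelLoadingError` propagate immediately, so an invalid token or a malformed request fails fast instead of being retried.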
This project is licensed under the MIT License. See the LICENSE file for details.