The Movie Recommendation System is a Streamlit-based web application that helps users discover movies they might enjoy.
It uses content-based filtering powered by machine learning to recommend movies similar to a user’s selection.
- Overview
- Project Workflow
- Business Problem
- Ingestion Script
- Tools & Technologies
- Project Structure
- Data Pipeline Overview
- App Preview
- Key Outcomes
- How to Run This Project
- License
The recommender vectorizes movie metadata with CountVectorizer (bag-of-words counts) and scores how alike two movies are with cosine similarity.
The app is deployed with Streamlit for an interactive user experience.
✨ Key Features
- Personalized movie recommendations
- Movie posters for a visual preview
- Ratings and overviews
- Direct YouTube trailer links
- Clean, interactive Streamlit interface
- Load preprocessed movie metadata (`movie_dict.pkl`) and similarity matrix (`similarity.pkl`).
- User selects a movie from the dropdown.
- The system calculates the top 5 most similar movies.
- Fetch movie posters, ratings, and overviews using TMDb API.
- Display results in a visually appealing interface.
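The lookup step in the workflow above can be sketched as follows. This is a minimal illustration, not the code in `app.py`: the `recommend` function, its parameters, and the toy data are assumptions made for the example, with a plain list of titles and a nested list standing in for the contents of `movie_dict.pkl` and `similarity.pkl`.

```python
def recommend(title, titles, similarity, top_n=5):
    """Return the top_n titles most similar to `title` (hypothetical helper)."""
    if title not in titles:
        raise ValueError(f"Unknown title: {title!r}")
    idx = titles.index(title)
    # Pair every movie index with its similarity score to the selected movie.
    scored = list(enumerate(similarity[idx]))
    # Sort by score, highest first.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    # Skip the selected movie itself, then keep the first top_n results.
    return [titles[i] for i, _ in ranked if i != idx][:top_n]


# Toy data for illustration only (not taken from the project's dataset).
titles = ["A", "B", "C", "D"]
similarity = [
    [1.0, 0.8, 0.1, 0.5],
    [0.8, 1.0, 0.3, 0.2],
    [0.1, 0.3, 1.0, 0.9],
    [0.5, 0.2, 0.9, 1.0],
]

print(recommend("A", titles, similarity, top_n=2))  # ['B', 'D']
```

Excluding the selected movie by index rather than dropping the first ranked entry keeps the result correct even when another movie ties with a score of 1.0.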
With thousands of movies released every year, users face information overload and often struggle to pick what to watch next.
This project solves that by providing personalized movie suggestions based on similarity, enhancing user experience, and boosting content discovery.
Here’s a simple example to generate your own movie_dict.pkl and similarity.pkl files using a movie dataset:
```python
import pickle

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load your dataset
movies = pd.read_csv("movies.csv")

# Combine textual features into a single 'tags' column.
# Fill missing values and join with spaces so words from adjacent columns do not merge.
text_columns = ['overview', 'genres', 'keywords']
movies['tags'] = movies[text_columns].fillna('').astype(str).agg(' '.join, axis=1)

# Convert text data to feature vectors
cv = CountVectorizer(max_features=5000, stop_words='english')
vectors = cv.fit_transform(movies['tags']).toarray()

# Compute similarity
similarity = cosine_similarity(vectors)

# Save the data, closing each file once it is written
with open('movie_dict.pkl', 'wb') as f:
    pickle.dump(movies.to_dict(), f)
with open('similarity.pkl', 'wb') as f:
    pickle.dump(similarity, f)
```

| Tool | Purpose |
|---|---|
| Python | Core programming language |
| Streamlit | Web app development |
| TMDb API | Fetches posters, ratings, and trailers |
| Pandas | Data manipulation |
| Pickle | Data serialization |
| Scikit-learn | Similarity computation |
Movie-Recommender-System/
│
├── app.py # Main Streamlit app
├── movie_dict.pkl # Movie metadata file
├── similarity.pkl # Precomputed similarity matrix
├── requirements.txt # Python dependencies
└── README.md # Project documentation
| Step | Description |
|---|---|
| 1. Data Collection | Gather movie metadata such as titles, genres, keywords, and overviews. |
| 2. Data Preprocessing | Clean and merge textual columns (overview, genres, keywords, etc.) to create a single feature column. |
| 3. Feature Extraction | Convert combined text data into numerical vectors using CountVectorizer. |
| 4. Similarity Calculation | Compute cosine similarity between movie vectors to identify similar movies. |
| 5. Deployment | Integrate the model with Streamlit UI and TMDb API for real-time movie recommendations. |
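To make steps 3 and 4 concrete, here is a hand-rolled sketch of bag-of-words counting and cosine similarity using only the standard library. It is a simplification for illustration: scikit-learn's CountVectorizer tokenizes differently and can drop stop words, and the movie names and tag strings below are invented.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase, whitespace-separated word occurs."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented tag strings, standing in for the combined feature column.
tags = {
    "Movie A": "space adventure alien crew",
    "Movie B": "space adventure robot crew",
    "Movie C": "romance paris wedding",
}
vectors = {name: bag_of_words(text) for name, text in tags.items()}

print(cosine(vectors["Movie A"], vectors["Movie B"]))  # 0.75
print(cosine(vectors["Movie A"], vectors["Movie C"]))  # 0.0
```

Movies A and B share three of their four words, so their score is high, while A and C share none and score zero.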
| Outcome | Description |
|---|---|
| End-to-End Web App | Developed a complete movie recommendation system from data preprocessing to deployment. |
| TMDb API Integration | Integrated the TMDb API to fetch real-time movie posters, ratings, and trailers. |
| Cosine Similarity Model | Implemented content-based filtering using cosine similarity for accurate recommendations. |
| Interactive Streamlit UI | Designed a user-friendly interface with dynamic elements for enhanced user experience. |
- Install Python (version 3.7 or later).
- Install required Python libraries: `pip install streamlit pandas requests`
Setup⬇️
- Prepare Data: Since `movie_dict.pkl` and `similarity.pkl` are not provided, you need to generate them:
  - The `movie_dict.pkl` file should contain movie metadata (e.g., movie IDs, titles, etc.).
  - The `similarity.pkl` file should be a precomputed similarity matrix.
  - Use your dataset and appropriate Python libraries to create these files.
- Clone the Repository: `git clone https://github.com/your-username/Movie-Recommender-System.git`, then `cd Movie-Recommender-System`
- Add the Required Files: Place the generated `movie_dict.pkl` and `similarity.pkl` files in the project directory.
- Run the Application: `streamlit run app.py`
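Before launching the app, it can help to confirm that the two generated files agree with each other. The sketch below is an optional check, not part of the repository: `load_pickle` and `check_artifacts` are hypothetical helpers, and the `"title"` key assumes the metadata was saved with `movies.to_dict()` from a DataFrame that has a `title` column.

```python
import pickle


def load_pickle(path):
    """Load a pickled object, closing the file afterwards."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def check_artifacts(movie_dict, similarity):
    """Return the movie count if the metadata and matrix sizes agree."""
    # Assumption: the metadata has a "title" column keyed by row index.
    movie_count = len(movie_dict["title"])
    if len(similarity) != movie_count:
        raise ValueError(
            f"{movie_count} movies but {len(similarity)} similarity rows"
        )
    for row in similarity:
        if len(row) != movie_count:
            raise ValueError("similarity matrix is not square")
    return movie_count


# Example usage once both files exist in the project directory:
# count = check_artifacts(load_pickle("movie_dict.pkl"), load_pickle("similarity.pkl"))
# print(f"{count} movies ready")
```

Only pickle files you generated yourself should be loaded this way, since unpickling untrusted data can execute arbitrary code.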
API Integration⬇️
The app uses the TMDb API to fetch movie details. Replace the placeholder in the code with your own TMDb API key: `api_key = "YOUR_TMDB_API_KEY"`
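As a rough sketch of what such a lookup might involve, the example below builds a request to TMDb's v3 movie-details endpoint and extracts the poster URL, rating, and overview from the JSON response. The function names are hypothetical and the app's own code may differ; it likely uses `requests`, whereas this sketch uses the standard library's `urllib` to stay dependency-free. The `w500` poster size is one of several sizes TMDb serves and is an assumption here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

TMDB_MOVIE_URL = "https://api.themoviedb.org/3/movie/{movie_id}"
POSTER_BASE_URL = "https://image.tmdb.org/t/p/w500"


def build_movie_url(movie_id, api_key):
    """Build the movie-details URL for one TMDb movie ID."""
    query = urlencode({"api_key": api_key})
    return TMDB_MOVIE_URL.format(movie_id=movie_id) + "?" + query


def parse_movie_details(payload):
    """Pick the fields the app displays out of a TMDb JSON payload."""
    poster_path = payload.get("poster_path")
    return {
        # Some movies have no poster, in which case poster_path is missing or null.
        "poster_url": POSTER_BASE_URL + poster_path if poster_path else None,
        "rating": payload.get("vote_average"),
        "overview": payload.get("overview", ""),
    }


def fetch_movie_details(movie_id, api_key, timeout=10):
    """Fetch and parse details for one movie (performs a network request)."""
    with urlopen(build_movie_url(movie_id, api_key), timeout=timeout) as response:
        return parse_movie_details(json.load(response))
```

Keeping URL construction and response parsing separate from the network call makes both easy to check without a live API key.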
Contributions are welcome! Feel free to fork this repository, make improvements, and submit pull requests.
Together, let's make this recommendation system even more powerful and versatile.
This project is licensed under the MIT License. © 2025 Faisal Khan
If you like this project, don’t forget to 🌟 star and clone the repository.

