Author: David Pinto
2020-10-21
This project implements a recommender system for similar movies based on content and collaborative filtering embedding features.
Create a conda environment and install all required packages listed in the env_requirements.txt file.
# Create environment
conda create -n movie-similarity -y python=3.7
# Activate environment
conda activate movie-similarity
# Append conda-forge to the list of channels
conda config --append channels conda-forge
# Install dependencies
conda install -y --file env_requirements.txt
# Add environment to Jupyter
python -m ipykernel install --user --name=movie-similarity
The main libraries used are:
- numpy and pandas for data cleaning, manipulation and transformation.
- scipy for sparse matrices and correlation measures.
- unidecode and nltk for text manipulation.
- scikit-learn for data normalization and text vectorization.
- vaex for manipulation of large DataFrames.
- matplotlib and plotnine for data visualization.
- lightfm for collaborative filtering with matrix factorization.
- faiss for fast Approximate Nearest Neighbors algorithms.
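After installing, it can be handy to confirm that the libraries are importable from the new environment. The snippet below is only a sketch: the helper name `find_missing` is hypothetical and not part of this project, and the list assumes the usual import names (for example `sklearn` for scikit-learn).

```python
import importlib.util

# Import names assumed for the libraries used in this project.
REQUIRED_MODULES = [
    "numpy", "pandas", "scipy", "unidecode", "nltk", "sklearn",
    "vaex", "matplotlib", "plotnine", "lightfm", "faiss",
]


def find_missing(modules):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it with the movie-similarity environment activated; any module reported as missing can be installed with conda as shown above.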
Take a look at the data/raw folder to get instructions on how to download the dataset.
The project is organized as a series of Jupyter notebooks. Each notebook is self-contained and well documented:
- 1. Data Preparation.
- 2. Exploratory Analysis.
- 3. User Based Similarity.
- 4. Content Based Embedding.
- 5. Collaborative Filtering Embedding.
- 6. Similarity Match with ANN.
- 7. Performance Evaluation.
- 8. Hybrid Approach.
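To give a rough idea of what the similarity match does, here is a minimal sketch that ranks movies by cosine similarity between their embedding vectors. It is not the code from the notebooks: it uses an exact, brute-force search in plain Python instead of the approximate search that faiss provides, and the function names and the toy embeddings are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_title, embeddings, k=3):
    """Return the k titles closest to query_title, as (title, score) pairs."""
    query = embeddings[query_title]
    scores = [
        (title, cosine_similarity(query, vector))
        for title, vector in embeddings.items()
        if title != query_title  # skip the query movie itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical 3-dimensional embeddings, for illustration only.
toy_embeddings = {
    "Movie A": [1.0, 0.0, 0.2],
    "Movie B": [0.9, 0.1, 0.3],
    "Movie C": [0.0, 1.0, 0.0],
    "Movie D": [0.1, 0.9, 0.1],
}

for title, score in most_similar("Movie A", toy_embeddings, k=2):
    print(f"{title}: {score:.3f}")
```

A brute-force scan like this compares the query against every movie, which is fine for a toy example but slow for a large catalog. That cost is the usual motivation for an Approximate Nearest Neighbors index.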
You can explore the movie embedding features using the Embedding Projector here. It can take a few seconds to start, but it is worth the wait!
Take a look at the projector folder to see some results.
The project provides a Streamlit application to play with the movie recommender.
To run it locally:
make docker-build
make docker-run
Congratulations! You have it running on 127.0.0.1:8501.
Choose a recommendation algorithm and a movie title to get recommendations of similar movies. I hope you enjoy it!

