Movie Similarity

Author: David Pinto
2020-10-21

This project implements a recommender system that finds similar movies using content-based and collaborative-filtering embedding features.
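
At its core, "similar movies" can be framed as a nearest-neighbor lookup. Each movie is represented by an embedding vector, and the movies whose vectors have the highest cosine similarity to the query movie are returned. The sketch below is a minimal, dependency-free illustration under that assumption. The toy vectors and the `most_similar` and `cosine` helpers are hypothetical and do not come from the project code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(
    title: str, embeddings: Dict[str, Sequence[float]], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k titles whose embeddings are closest to `title` (hypothetical helper)."""
    query = embeddings[title]
    scores = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != title  # never recommend the query movie itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Toy 3-dimensional embeddings, purely illustrative.
    toy = {
        "Movie A": [0.9, 0.1, 0.0],
        "Movie B": [0.8, 0.2, 0.1],
        "Movie C": [0.0, 0.1, 0.9],
    }
    print(most_similar("Movie A", toy, k=2))
```

A brute-force scan like this costs O(n) per query. This is why the package list below includes faiss: approximate nearest-neighbor indexes trade a little accuracy for much faster lookups on large catalogs.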

Documentation

Project Proposal (PDF): Proposal.
Project Report (PDF): Report.
Project Report (HTML): Website.

Setup

Create a conda environment and install all required packages listed in the env_requirements.txt file.

# Create environment
conda create -n movie-similarity -y python=3.7

# Activate environment
conda activate movie-similarity

# Append conda-forge to the list of channels
conda config --append channels conda-forge

# Install dependencies
conda install -y --file env_requirements.txt

# Add environment to Jupyter
python -m ipykernel install --user --name=movie-similarity

Required Packages

numpy and pandas for data cleaning, manipulation and transformation.
scipy for sparse matrices and correlation measures.
unidecode and nltk for text manipulation.
scikit-learn for data normalization and text vectorization.
vaex for manipulation of large DataFrames.
matplotlib and plotnine for data visualization.
lightfm for collaborative filtering with matrix factorization.
faiss for fast Approximate Nearest Neighbors algorithms.
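
The "correlation measures" mentioned for scipy can be illustrated with one example. The sketch below computes a Pearson correlation between two movies using only the users who rated both. It assumes ratings are stored sparsely as `{user_id: rating}` dictionaries. The data layout, the `min_common` threshold and the `pearson_item_similarity` name are hypothetical; the project's notebooks may use scipy sparse matrices and a different measure.

```python
import math
from typing import Dict, Optional


def pearson_item_similarity(
    ratings_a: Dict[int, float],
    ratings_b: Dict[int, float],
    min_common: int = 2,
) -> Optional[float]:
    """Pearson correlation between two movies over the users who rated both.

    Returns None when there are too few co-raters, or when either movie's
    co-rated ratings have zero variance (the correlation is undefined then).
    """
    # Only users present in both rating dictionaries are compared.
    common = sorted(ratings_a.keys() & ratings_b.keys())
    if len(common) < min_common:
        return None
    xs = [ratings_a[u] for u in common]
    ys = [ratings_b[u] for u in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return None
    return cov / math.sqrt(var_x * var_y)
```

Restricting the computation to co-rated users treats a missing rating as unknown rather than as zero.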

Dataset

Take a look at the data/raw folder to get instructions on how to download the dataset.

Notebooks

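The project is organized into Jupyter notebooks. Each notebook is self-contained and well documented:

01-data-preparation.ipynb
02-exploratory-analysis.ipynb
03-user-based-similarity.ipynb
04-content-based-embedding.ipynb
05-collaborative-filtering-embedding.ipynb
06-similarity-match-with-ann.ipynb
07-performance-evaluation.ipynb
08-hybrid-approach.ipynb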
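
One simple way to combine content-based and collaborative-filtering results is a weighted blend of their similarity scores. The sketch below assumes that approach. It rescales each source to [0, 1] with min-max normalization and ranks candidates by `alpha * content + (1 - alpha) * collaborative`. The helper names, the `alpha` weight and the normalization choice are all hypothetical; the project's hybrid notebook may combine the two sources differently.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two sources are on a comparable scale."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {title: 1.0 for title in scores}
    return {title: (s - lo) / (hi - lo) for title, s in scores.items()}


def hybrid_rank(
    content: Dict[str, float],
    collaborative: Dict[str, float],
    alpha: float = 0.5,
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank candidates by alpha * content + (1 - alpha) * collaborative.

    A candidate missing from one source receives 0.0 from that source.
    """
    c, f = min_max(content), min_max(collaborative)
    candidates = c.keys() | f.keys()
    blended = {
        t: alpha * c.get(t, 0.0) + (1 - alpha) * f.get(t, 0.0) for t in candidates
    }
    # Sort by descending score, breaking ties by title for a stable order.
    return sorted(blended.items(), key=lambda pair: (-pair[1], pair[0]))[:k]
```

A larger `alpha` favors content similarity. This can help for movies with few ratings, where collaborative-filtering scores are less reliable.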

Embedding Visualization

You can play with the movie embedding features using the Embedding Projector here. It can take a few seconds to start, but it is worth the wait!

Take a look at the projector folder to see some results.
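
The Embedding Projector can also load your own vectors from two TSV files. The first holds one tab-separated vector per line. The second is an optional metadata file with one label per line, in the same order. A single-column metadata file has no header row. The sketch below writes both files from a `{title: vector}` dictionary. The function name and the `vectors.tsv`/`metadata.tsv` file names are hypothetical, and the projector folder in this repository may be organized differently.

```python
import os
from typing import Dict, Sequence, Tuple


def export_for_projector(
    embeddings: Dict[str, Sequence[float]], out_dir: str
) -> Tuple[str, str]:
    """Write vectors.tsv and metadata.tsv for the Embedding Projector (hypothetical helper).

    vectors.tsv: one embedding per line, values separated by tabs.
    metadata.tsv: one movie title per line, in the same order (single column, no header).
    """
    os.makedirs(out_dir, exist_ok=True)
    vectors_path = os.path.join(out_dir, "vectors.tsv")
    metadata_path = os.path.join(out_dir, "metadata.tsv")
    with open(vectors_path, "w", encoding="utf-8") as vec_f, open(
        metadata_path, "w", encoding="utf-8"
    ) as meta_f:
        for title, vector in embeddings.items():
            vec_f.write("\t".join(f"{x:.6f}" for x in vector) + "\n")
            # Tabs or newlines inside a title would break the one-label-per-line layout.
            clean = title.replace("\t", " ").replace("\n", " ")
            meta_f.write(clean + "\n")
    return vectors_path, metadata_path
```

Both files can then be uploaded through the projector's "Load" dialog.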

Deploy Web Application

The project provides a Streamlit application to play with the movie recommender.

To run it locally:

make docker-build
make docker-run

Congratulations! You now have it running at 127.0.0.1:8501.

Choose a recommendation algorithm and a movie title to get recommendations of similar movies. I hope you enjoy it!
