This project trains an ALS (Alternating Least Squares) recommendation model with Spark's MLlib library on the MovieLens 100k dataset, which is stored on the Hadoop Distributed File System (HDFS). The goal is to use the ratings in that dataset to generate movie recommendations.
This project is inspired by the book Machine Learning with Spark.
- Built a recommendation model using data about user preferences
- Used the trained model to compute recommendations for a given user, as well as to find similar (related) items for a given item
- Applied standard evaluation metrics to measure the model's predictive performance
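
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the RDD-based `pyspark.mllib.recommendation` API, cosine similarity between item factor vectors as the notion of "similar items", and RMSE as the evaluation metric. The function `train_and_score`, its arguments, and the hyperparameter values are hypothetical; the notebook may use different settings and metrics.

```python
import math


def parse_rating(line):
    """Parse one tab-separated line of MovieLens 100k `u.data`:
    user id, item id, rating, timestamp (the timestamp is dropped)."""
    user, item, rating, _timestamp = line.strip().split("\t")
    return int(user), int(item), float(rating)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length factor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


def train_and_score(sc, path, user_id=1, item_id=1, k=10):
    """Hypothetical driver. `sc` is an existing SparkContext and `path`
    points at the ratings file (for example, a location on HDFS)."""
    # Imported here so the helpers above stay usable without Spark installed.
    from pyspark.mllib.recommendation import ALS, Rating

    ratings = sc.textFile(path).map(parse_rating).map(lambda t: Rating(*t)).cache()

    # Illustrative hyperparameters, not the notebook's settings.
    model = ALS.train(ratings, rank=50, iterations=10, lambda_=0.01)

    # Top-k recommendations for one user.
    top_k = model.recommendProducts(user_id, k)

    # Items most similar to `item_id`, ranked by cosine similarity
    # of their latent factor vectors.
    factors = dict(model.productFeatures().collect())
    target = factors[item_id]
    similar = sorted(
        ((other, cosine_similarity(target, vec))
         for other, vec in factors.items() if other != item_id),
        key=lambda pair: pair[1],
        reverse=True,
    )[:k]

    # RMSE on the training ratings; a held-out split would give a
    # more honest estimate of predictive performance.
    actual = ratings.map(lambda r: ((r.user, r.product), r.rating))
    predicted = model.predictAll(ratings.map(lambda r: (r.user, r.product))) \
        .map(lambda r: ((r.user, r.product), r.rating))
    error = rmse(actual.join(predicted).values().collect())

    return top_k, similar, error
```

The helper functions are plain Python, so they can be checked without a cluster; only `train_and_score` needs a running Spark context.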
The notebook Recommendation_system_with_Pyspark.ipynb has a full description of each step of this project.