I played with organizing a collection of recipes into different topics (text clustering).
-
Method: Latent Dirichlet Allocation - LDA (topic modeling algorithm)
-
Output: LDA produced a list of topics (cluster of words) where each word in the cluster having a probability of occurrence for the given topic
A different approach would be to group recipes into different clusters based on some suitable similarity measure.
-
Method: KMeans clustering using TF-IDF (term frequency-inverse document frequency) weights
-
Output: every recipe showing up in one of the clusters.
- extract_recipes.py - randomly extract 100 recipes using spoonacular API (save recipes in a json file)
- text_utils.py - Preprocessing functions for text data
- LDA_text_model.py (class) - LDA preprocesses and algorithm
- load_recipes.py - apply LDA and Kmeans in those recipes