|
185 | 185 | "First, we'll compute a `movie_profile` for each user, which will be a 671-dimensional vector that combines the ratings we received as input with the ratings from the rest of the users. We do this by scaling each column of the matrix by the user's rating of that movie and then adding together all of the columns. For example, if the user rated Inside Out as a 4, Frozen 2 as a 3, and didn't rate any other movies, their profile would be:\n",
|
186 | 186 | "\n",
|
187 | 187 | "```\n",
|
188 |  | - " Inside Out Frozen 2 movie_profile\n",  |
189 |  | - "Parth 0.680 * 4 + 0.589 * 3 = 4.487\n",  |
190 |  | - "Michael 0 * 4 + 0.147 * 3 = 0.441\n",  |
191 |  | - "Joy 0.272 * 4 + 0.294 * 3 = 1.97\n",  |
192 |  | - "Unicornelius 0.680 * 4 + 0.737 * 3 = 4.931\n",  |
| 188 | + " Inside Out Frozen 2\n", |
| 189 | + "Parth 0.680 * 4 + 0.589 * 3 = 4.487\n", |
| 190 | + "Michael 0 * 4 + 0.147 * 3 = 0.441\n", |
| 191 | + "Joy 0.272 * 4 + 0.294 * 3 = 1.97\n", |
| 192 | + "Unicornelius 0.680 * 4 + 0.737 * 3 = 4.931\n", |
193 | 193 | "```\n",
|
194 | 194 | "\n",
|
195 |  | - "Then, we'll normalize the movie profile by dividing it by its norm. In the above example, the norm of the vector is $6.971$, so the new normalized vector is:\n",  |
| 195 | + "Then, we'll normalize that vector by dividing it by its norm. In the above example, the norm of the vector is $6.971$, so the new normalized vector is:\n", |
196 | 196 | "```\n",
|
197 | 197 | " Inside Out Frozen 2 movie_profile\n",
|
198 | 198 | "Parth 0.680 * 4 + 0.589 * 3 = 4.487 -> 0.644 \n",
|
|
203 | 203 | "\n",
|
204 | 204 | "Notice that this vector is the same size as each of the movie vectors (it'll have 671 entries)... That's because we can think of this vector as a vector which represents the *perfect movie* for this user.\n",
|
205 | 205 | "\n",
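"Here's a minimal sketch of these two steps in NumPy. The names are placeholders for this illustration: `M` stands for the 671-by-num_movies matrix whose columns are the (unit-norm) movie vectors, and `my_ratings` holds this user's ratings, with 0 for movies they haven't rated.\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Placeholder data so the sketch runs; the real M comes from the ratings data.\n",
"num_users, num_movies = 671, 500\n",
"rng = np.random.default_rng(0)\n",
"M = rng.random((num_users, num_movies))\n",
"M = M / np.linalg.norm(M, axis=0)   # scale each column (movie) to norm 1\n",
"\n",
"my_ratings = np.zeros(num_movies)\n",
"my_ratings[0] = 4                   # e.g. Inside Out -> 4\n",
"my_ratings[1] = 3                   # e.g. Frozen 2   -> 3\n",
"\n",
"# Scale each column by this user's rating of that movie, then add up the columns:\n",
"movie_profile = M @ my_ratings      # shape (num_users,)\n",
"\n",
"# Normalize the profile so that it also has norm 1:\n",
"movie_profile = movie_profile / np.linalg.norm(movie_profile)\n",
"```\n",
"\n",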
|
206 |  | - "The cosine similarity between two vectors $x = (x_1, x_2, \\dots, x_n)$ and $y = (y_1, y_2, \\dots, y_n)$ (which both have norm 1) is defined as their dot product, or the sum of element-wise products of their entries: $x_1 y_1 + x_2 y_2 + \\cdots + x_n y_n$. You can think of the cosine similarity as an estimation of the \"closeness\" between the two vectors.\n",  |
 | 206 | + "The cosine similarity between two vectors $x = (x_1, x_2, \\dots, x_n)$ and $y = (y_1, y_2, \\dots, y_n)$ (which both have norm 1) is defined as their dot product, or the sum of element-wise products of their entries: $x_1 y_1 + x_2 y_2 + \\cdots + x_n y_n$. Since all of our ratings are nonnegative, this will be a number between 0 and 1, with higher values representing more similar vectors. You can think of the cosine similarity as an estimate of the \"closeness\" of the two vectors.\n",  |
207 | 207 | "\n",
|
208 | 208 | "Find the movies that are closest to our `movie_profile`: compute the cosine similarity between the `movie_profile` and each of the columns in our matrix and return the indices of the top `n` movies, in order from most similar to least similar.\n",
|
209 | 209 | "\n",
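"If you want a sanity check once you've tried it yourself, here's one possible shape the computation could take (continuing with the placeholder `M` and `movie_profile` from the sketch above; `top_n_movies` is just an illustrative name, not a required one):\n",
"```python\n",
"import numpy as np\n",
"\n",
"def top_n_movies(M, movie_profile, n):\n",
"    # The columns of M and movie_profile all have norm 1, so the cosine\n",
"    # similarity with each movie is just this matrix-vector product:\n",
"    similarities = movie_profile @ M        # shape (num_movies,)\n",
"    # Indices of the n largest similarities, most similar first:\n",
"    return np.argsort(similarities)[::-1][:n]\n",
"\n",
"print(top_n_movies(M, movie_profile, 10))\n",
"```\n",
"\n",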
|
|
246 | 246 | "\n",
|
247 | 247 | "Not bad! I haven't seen Ender's Game, so I guess that's on my list.\n",
|
248 | 248 | "\n",
|
249 |  | - "Take a look at [movies.txt](movies.txt) and add in your own ratings!"  |
| 249 | + "Take a look at [movies.txt](movies.txt) and add in your own ratings!\n", |
| 250 | + "\n", |
| 251 | + "### So how does this work?\n", |
| 252 | + "Here's a fairly math-heavy explanation of how this is working. We're taking each movie and mapping it into 671-dimensional space, where each axis represents a different user's rating of that movie. We're assuming that each of those axes are orthoganal to one another (which, in reality, might not be a good assumption).\n", |
| 253 | + "\n", |
| 254 | + "Then, based on the inputted preferences, we're creating a new vector as a linear combination of the movie vectors that the inputted preferences have ranked. Then, we find which movies (vectors) are closest to that vector.\n", |
| 255 | + "\n", |
| 256 | + "This is called *cosine similarity* because of the standard formulation of the dot product. If $x = (x_1, x_2, \\dots, x_n)$ and $y = (y_1, y_2, \\dots, y_n)$, then:\n", |
| 257 | + "$$x_1 y_1 + x_2 y_2 + \\cdots + x_n y_n = x \\cdot y = \\lVert x \\rVert \\lVert y \\rVert \\cos(\\theta)$$\n", |
| 258 | + "\n", |
| 259 | + "Where $\\theta$ is the angle between the two vectors. Since our vectors have norm 1, this simplifies:\n", |
| 260 | + "$$x_1 y_1 + x_2 y_2 + \\cdots + x_n y_n = \\cos(\\theta)$$\n", |
| 261 | + "\n", |
| 262 | + "By the definition of $\\cos$, this will give the length of the perpendicular distance between $x$ and $y$, which is why we use it as a measure of similarity between $x$ and $y$.\n", |
| 263 | + "\n", |
| 264 | + "### OMG this is so cool... I want to do more!\n", |
| 265 | + "Great! Here are some ideas for extensions and, potentially, a final project:\n", |
| 266 | + "1. Implement the matrix completion algorithm using [Carlos Fernandez-Granda's notes on low-rank matrix completion](https://cims.nyu.edu/~cfgranda/pages/OBDA_spring16/material/low_rank_models.pdf).\n", |
| 267 | + "2. Perform this analysis on a more complex data set from [Kaggle](https://www.kaggle.com/) or another dataset website.\n", |
| 268 | + "3. Take into account the fact that different users have similar preferences, and don't treat the axes as orthogonal.\n", |
| 269 | + "4. Perform unsupervised learning on this data set and cluster similar movies together...\n", |
| 270 | + "5. ...using that data, develop a Buzzfeed-style quiz where each question is of the form \"Which movie do you prefer more?\" and, based on the results of that quiz, determine one cluster of movies that the quiz-taker prefers, and recommend all of the movies from that cluster to the user." |
250 | 271 | ]
|
251 | 272 | },
|
252 | 273 | {
|
|