Here is a utility function used to display the transformed dataset. The color of each point refers to the actual digit (of course, this information was not used by the dimensionality reduction algorithm).
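Such a helper might look like the following sketch (the function name and styling here are illustrative, not the article's exact code):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, so this also runs in scripts
import matplotlib.pyplot as plt
import numpy as np

def scatter_digits(Y, labels):
    # Color each 2D map point by its true digit class (0-9).
    # The class labels are only used for display, never for the embedding.
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.scatter(Y[:, 0], Y[:, 1], c=labels, cmap='tab10', s=8)
    ax.axis('off')
    return fig, ax
```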
How do we choose the positions of the map points? We want to conserve the structure of the data.
This measures how close <span class="math-tex" data-type="tex">\\(x_j\\)</span> is from <span class="math-tex" data-type="tex">\\(x_i\\)</span>, considering a **Gaussian distribution** around <span class="math-tex" data-type="tex">\\(x_i\\)</span> with a given variance <span class="math-tex" data-type="tex">\\(\sigma_i^2\\)</span>. This variance is different for every point; it is chosen such that points in dense areas are given a smaller variance than points in sparse areas. The original paper details how this variance is computed exactly.
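For a single point, this conditional similarity can be sketched in a few lines of NumPy (a minimal version, not scikit-learn's optimized implementation):

```python
import numpy as np

def conditional_similarity(X, i, sigma_i):
    # p_{j|i} is proportional to exp(-||x_i - x_j||^2 / (2 sigma_i^2)),
    # normalized so that the similarities from x_i sum to one.
    d2 = np.sum((X - X[i])**2, axis=1)  # squared distances from x_i
    w = np.exp(-d2 / (2 * sigma_i**2))
    w[i] = 0.0  # a point is not considered its own neighbor
    return w / w.sum()

X = np.random.RandomState(0).randn(20, 3)
p = conditional_similarity(X, 0, sigma_i=1.0)
```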
Now, we define the similarity as a symmetrized version of the conditional similarity:
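In the t-SNE paper this symmetrization is <span class="math-tex" data-type="tex">\\(p_{ij} = (p_{j|i} + p_{i|j}) / (2N)\\)</span>, which makes the joint similarities sum to one. A sketch:

```python
import numpy as np

def symmetrize(P_cond):
    # P_cond[i, j] = p_{j|i}: each row is a conditional distribution.
    # p_ij = (p_{j|i} + p_{i|j}) / (2N), so that sum_ij p_ij = 1.
    N = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * N)
```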
We now compute the similarity with a <span class="math-tex" data-type="tex">\\(\sigma_i\\)</span> depending on the data point (found via a binary search, according to the original t-SNE paper). This algorithm is implemented in the `_joint_probabilities` private function in scikit-learn's code.
We can already observe the 10 groups in the data, corresponding to the 10 numbers.
Let's also define a similarity matrix for our map points.
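As discussed later in the article, t-SNE uses a heavy-tailed kernel for the map points. A sketch of such a similarity matrix (normalized over all pairs, diagonal excluded):

```python
import numpy as np

def map_similarities(Y):
    # q_ij is proportional to 1 / (1 + ||y_i - y_j||^2)
    # (a Student t / Cauchy kernel), normalized over all pairs.
    d2 = np.sum((Y[:, None, :] - Y[None, :, :])**2, axis=-1)
    w = 1.0 / (1.0 + d2)
    np.fill_diagonal(w, 0.0)  # no self-similarity
    return w / w.sum()
```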
Let's assume that our map points are all connected with springs. The stiffness of a spring connecting two map points depends on the mismatch between the similarity of the two data points and the similarity of the two map points.
The final mapping is obtained when the equilibrium is reached.
Here is an illustration of a dynamic graph layout based on a similar idea. Nodes are connected via springs and the system evolves according to the laws of physics (example by [Mike Bostock](http://bl.ocks.org/mbostock/4062045)).
Remarkably, this physical analogy stems naturally from the mathematical algorithm. It corresponds to minimizing the [Kullback-Leibler](http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) divergence between the two distributions <span class="math-tex" data-type="tex">\\(\big(p_{ij}\big)\\)</span> and <span class="math-tex" data-type="tex">\\(\big(q_{ij}\big)\\)</span>:
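The Kullback-Leibler divergence itself is straightforward to compute; a minimal sketch (with the convention that terms with <span class="math-tex" data-type="tex">\\(p_{ij} = 0\\)</span> contribute zero):

```python
import numpy as np

def kl_divergence(P, Q, eps=1e-12):
    # KL(P || Q) = sum_ij p_ij * log(p_ij / q_ij)
    # over the entries where p_ij > 0 (0 log 0 is treated as 0).
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))
```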
Here, <span class="math-tex" data-type="tex">\\(u_{ij}\\)</span> is a unit vector going from <span class="math-tex" data-type="tex">\\(y_j\\)</span> to <span class="math-tex" data-type="tex">\\(y_i\\)</span>. This gradient expresses the sum of all spring forces applied to map point <span class="math-tex" data-type="tex">\\(i\\)</span>.
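A direct NumPy sketch of this gradient (using the standard t-SNE form with the constant 4 and the heavy-tailed map-point kernel; not scikit-learn's optimized code):

```python
import numpy as np

def kl_gradient(P, Y):
    # dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) / (1 + ||y_i - y_j||^2)
    d = Y[:, None, :] - Y[None, :, :]        # d[i, j] = y_i - y_j
    d2 = np.sum(d**2, axis=-1)
    w = 1.0 / (1.0 + d2)                     # heavy-tailed (Cauchy) weights
    np.fill_diagonal(w, 0.0)
    Q = w / w.sum()
    F = (P - Q) * w                          # per-pair "spring" coefficients
    return 4.0 * np.einsum('ij,ijk->ik', F, d)
```

Because the pairwise forces are equal and opposite, the gradients of all map points sum to zero, as in any closed spring system.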
We can clearly observe the different phases of the optimization, as described in the original paper.
Let's also create an animation of the similarity matrix of the map points. We'll observe that it's getting closer and closer to the similarity matrix of the data points.
for i, D in enumerate((2, 5, 10)):
    ax.hist(norm(points, axis=1),
            bins=np.linspace(0., 1., 50))
    ax.set_title('D=%d' % D, loc='left')
plt.savefig('images/spheres.png', dpi=100)
</pre>

When reducing the dimensionality of a dataset, if we used the same Gaussian distribution for the data points and the map points, we could get an _imbalance_ among the neighbors of a given point. This imbalance would lead to an excess of attraction forces and a sometimes unappealing mapping. This is actually what happens in the original SNE algorithm, by Hinton and Roweis (2002).
The t-SNE algorithm works around this problem by using a Student t-distribution with one degree of freedom (also known as the Cauchy distribution) for the map points. This distribution has a much heavier tail than the Gaussian distribution, which _compensates_ for the original imbalance. For a given similarity between two data points, the two corresponding map points will need to be much further apart in order for their similarity to match the data similarity. This can be seen in the following plot.
gauss = np.exp(-z**2)
cauchy = 1/(1+z**2)
plt.plot(z, gauss, label='Gaussian distribution')
plt.plot(z, cauchy, label='Cauchy distribution')
plt.legend()
plt.savefig('images/distributions.png', dpi=120)
</pre>

## Conclusion
The t-SNE algorithm provides an effective method to visualize a complex dataset. It successfully uncovers hidden structures in the data, exposing natural clusters and smooth nonlinear variations along the dimensions. It has been implemented in many languages, including Python, and it can be easily used thanks to the scikit-learn library.
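In practice, using scikit-learn's implementation takes only a few lines; for example, on a subset of the digits dataset used throughout this article:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Subsample to keep the example fast; t-SNE scales poorly with N.
X = load_digits().data[:200]

# Embed the 64-dimensional digit images into two dimensions.
Y = TSNE(n_components=2, random_state=0).fit_transform(X)
```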