Implementation of t-SNE in C++ (including the random walk version). Visualization of the t-SNE process with Python.
- `cmake` 3.8.0 or later;
- `python` 3.8.0 or later;
- a C++ compiler that supports C++17;
- `numpy`, `imageio`, `matplotlib`, `torch` and `sklearn` are required for your Python.
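If the Python packages are missing, something like `pip install numpy imageio matplotlib torch scikit-learn` should cover them (package names assumed from the list above; `sklearn` is distributed as `scikit-learn`).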
TIPS: if you only build `tsne` manually, you just need a C++ compiler supporting C++17; use the command `g++ tsne.cpp -o tsne --std=c++17 -O3 -DNDEBUG` (if your compiler is `g++`) to get the executable.
- If you build `tsne` without `cmake`, you can use the `g++ tsne.cpp -o tsne --std=c++17 -O3 -DNDEBUG` command;
- If your platform is UNIX-like, you can use `bash start.sh` to build and run this code automatically; otherwise, you can execute the commands as `start.sh` does;
- Make sure the working directory of every script and of `tsne` is `code/`, so that the data can be found with relative paths;
- The execution of `tsne` takes a long time, so you are supposed to wait patiently, or you can update all the `n`s (the number of samples) in `*.py` to run on less data; the default `n` is `6000`;
- I've added arguments for `start.sh`: `n`, `epoch`, `enableRandomWalk` and `totalSampleNum`, in that order. For example, `bash start.sh 6000 1000` means 6,000 samples and 1,000 iterations without random walk, which is the default (the same as `bash start.sh`); `bash start.sh 6000 1000 1 60000` means 6,000 samples will be shown (if you don't understand this, you are supposed to read the random walk part of t-SNE), 1,000 iterations, random walk enabled, and 60,000 total samples;
- If `enableRandomWalk` is `0` (the default value), you should not set `totalSampleNum` to a value different from `n`.
- There is a typo in the pseudocode of the original paper: in the gradient descent part there should be a minus rather than a plus, i.e. $y_i^{t+1} = y_i^t - \eta \frac{\delta C}{\delta y_i} + \alpha(t) (y_i^t - y_i^{t - 1})$;
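For reference, a minimal sketch of the corrected update step (variable names here are illustrative, not taken from the repository):

```cpp
#include <vector>
using namespace std;

// One gradient-descent step with momentum, following the corrected formula:
// y^{t+1} = y^t - eta * dC/dy + alpha(t) * (y^t - y^{t-1}).
void gradientStep(vector<vector<double>> &y,          // current embedding y^t
                  vector<vector<double>> &yPrev,      // previous embedding y^{t-1}
                  const vector<vector<double>> &grad, // dC/dy_i at step t
                  double eta, double alpha)
{
    for (size_t i = 0; i < y.size(); i++) {
        for (size_t d = 0; d < y[i].size(); d++) {
            double next = y[i][d] - eta * grad[i][d]         // minus, not plus
                          + alpha * (y[i][d] - yPrev[i][d]); // momentum term
            yPrev[i][d] = y[i][d];
            y[i][d] = next;
        }
    }
}
```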
- Some progress information is printed while t-SNE runs. The training process is separated into 3 parts (50 iterations, 200 iterations and `epoch - 250` iterations), so the training progress rates are reported for each part in sequence;
- There are many symmetric calculations in the formulas over $i$ and $j$, so we only need to calculate half of the elements of a matrix, as in the code calculating the joint probability $p_{ij}$ of the input data (the `max(1e-100, ...)` floor keeps the probabilities strictly positive):
for (int i = 0; i < n; i++) {
    // p is symmetric, so fill (i, j) and (j, i) in a single pass over j > i
    for (int j = i + 1; j < n; j++) {
        p[i][j] = p[j][i] = max(1e-100, (output[i][j] + output[j][i]) / (2 * n)) * EXAGGERATION;
    }
}
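The `EXAGGERATION` factor here is presumably the paper's "early exaggeration" trick: scaling all $p_{ij}$ up during the first iterations pushes clusters to form tight, well-separated groups, and the factor is removed later in training.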
- Try not to copy matrices around; for example, the method `operator+=` updates the elements of a matrix in place:
// In-place element-wise addition: no temporary copy of the matrix is created.
template<class T>
void operator+=(vector<T> &a, const vector<T> &b)
{
    for (int i = 0; i < b.size(); i++) { a[i] += b[i]; }
}
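Because the overload is templated, it also works recursively: adding one matrix (`vector<vector<double>>`) to another invokes the same operator on each row, so a single definition covers vectors and matrices alike. A small usage sketch:

```cpp
#include <cassert>
#include <vector>
using namespace std;

// Same overload as above: in-place element-wise addition without temporaries.
template<class T>
void operator+=(vector<T> &a, const vector<T> &b)
{
    for (size_t i = 0; i < b.size(); i++) { a[i] += b[i]; }
}

int main()
{
    vector<vector<double>> y    = {{1, 2}, {3, 4}};
    vector<vector<double>> grad = {{10, 20}, {30, 40}};
    y += grad; // the outer call recurses into the inner vectors
    assert(y[1][1] == 44);
    return 0;
}
```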
- Wherever a $\sigma_i$ appears, it appears as $\frac{1}{2\sigma_i^2}$. Therefore, we just need to calculate $\frac{1}{2\sigma_i^2}$ rather than $\sigma_i$ itself, as in the code calculating the conditional probability $p_{j\vert i}$ of the input data (`(*doubleSigmaSquareInverse)[i]` means exactly $\frac{1}{2\sigma_i^2}$):
double piSum = 0;
for (int i = 0; i < x.size(); i++) {
    piSum = 0;
    for (int j = 0; j < x.size(); j++) {
        if (j == i) { continue; }
        // unnormalized Gaussian affinity exp(-d_{ij}^2 / (2 * sigma_i^2))
        output[i][j] = exp(-disSquare[i][j] * (*doubleSigmaSquareInverse)[i]);
        piSum += output[i][j];
    }
    // normalize so that sum_j p_{j|i} = 1
    for (int j = 0; j < x.size(); j++) { output[i][j] /= piSum; }
}
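t-SNE chooses each $\sigma_i$ so that $p_{j\vert i}$ reaches a user-given perplexity, and a common way to find $\frac{1}{2\sigma_i^2}$ directly is a per-point binary search. A minimal sketch under that assumption (the function name and parameters are hypothetical, not the repository's; `disSquare` is the cached squared-distance matrix from the snippets here):

```cpp
#include <cmath>
#include <vector>
using namespace std;

// Hypothetical sketch: binary-search beta_i = 1/(2*sigma_i^2) for point i so
// that the conditional distribution p_{j|i} reaches a target perplexity.
double findBetaForPoint(const vector<vector<double>> &disSquare, int i,
                        double targetPerplexity, int maxIter = 50)
{
    const double targetEntropy = log(targetPerplexity); // perplexity = exp(entropy)
    const int n = (int)disSquare.size();
    double beta = 1.0, lo = 0.0, hi = 1e18; // hi == 1e18 plays the role of +infinity
    for (int iter = 0; iter < maxIter; iter++) {
        // Shannon entropy of p_{j|i} under the current beta:
        // H = log(sum) + beta * E[d^2], since p_{j|i} = exp(-beta * d^2) / sum.
        double sum = 0, weightedDis = 0;
        for (int j = 0; j < n; j++) {
            if (j == i) { continue; }
            double pj = exp(-disSquare[i][j] * beta);
            sum += pj;
            weightedDis += disSquare[i][j] * pj;
        }
        double entropy = log(sum) + beta * weightedDis / sum;
        if (fabs(entropy - targetEntropy) < 1e-5) { break; }
        if (entropy > targetEntropy) { lo = beta; } // distribution too flat: raise beta
        else { hi = beta; }                         // too peaked: lower beta
        beta = (hi >= 1e18) ? beta * 2 : (lo + hi) / 2;
    }
    return beta; // this value corresponds to (*doubleSigmaSquareInverse)[i]
}
```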
- Many results are used more than once, such as the pairwise squared L2 distances of the input data. Therefore, we can calculate them once and store them for future use, as in the code calculating the squared L2 distances of the input data:
if (disSquare.empty()) {
    // first call: compute the symmetric matrix of pairwise squared distances once
    disSquare.resize(x.size(), vector<double>(x.size()));
    double normSquare = 0;
    for (int i = 0; i < x.size(); i++) {
        for (int j = i + 1; j < x.size(); j++) {
            normSquare = 0;
            assert(x[i].size() == x[j].size());
            for (int k = 0; k < x[i].size(); k++) { normSquare += pow(x[i][k] - x[j][k], 2); }
            disSquare[i][j] = disSquare[j][i] = normSquare;
        }
    }
}
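One trade-off worth noting: caching every pairwise squared distance needs $n^2$ doubles, roughly 288 MB at the default `n = 6000`, so this only works while `n` stays moderate (the random walk version exists precisely because all-pairs computation does not scale).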
- In the random walk, we limit the number of steps per turn to avoid walking around a small circle for a long time:
if (nearestLandMark[start] == UNKNOWN) {
    // lazily compute and cache the nearest landmark as a fallback target
    nearestLandMark[start] = getNearestLandMark(selectedID[start]);
}
if (nearestLandMark[start] == UNREACHABLE) { return UNREACHABLE; }
int now = selectedID[start];
int randomWalkTimes = randomWalkTimesEveryTurn;
// walk until we land on another selected (landmark) point,
// or until the step budget of this turn is used up
while (randomWalkTimes-- &&
       (now == selectedID[start] || selectedIDMap.count(now) == 0)) {
    now = randomWalkGraph[now][distribution[now](generator)].to;
}
if (selectedIDMap.count(now) == 0) {
    // budget exhausted without reaching a landmark: fall back to the nearest one
    return selectedIDMap[nearestLandMark[start]];
}
return selectedIDMap[now];
- Using multiple threads would be an obvious improvement; a sketch follows.
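This is not in the repository yet; as one possible direction, the pairwise-distance computation above is embarrassingly parallel across rows. A minimal sketch with `std::thread` (function name and striping scheme are my own, not the code base's; compile with `-pthread`):

```cpp
#include <thread>
#include <vector>
using namespace std;

// Sketch: compute the squared-distance matrix with a fixed number of worker
// threads, each handling a stripe of rows; x and disSquare as in the snippets
// above.
void computeDisSquareParallel(const vector<vector<double>> &x,
                              vector<vector<double>> &disSquare,
                              int threadNum = 4)
{
    const int n = (int)x.size();
    disSquare.assign(n, vector<double>(n, 0.0));
    vector<thread> workers;
    for (int t = 0; t < threadNum; t++) {
        workers.emplace_back([&, t] {
            // rows are interleaved (t, t + threadNum, ...) so the triangular
            // workload stays balanced across threads
            for (int i = t; i < n; i += threadNum) {
                for (int j = i + 1; j < n; j++) {
                    double s = 0;
                    for (size_t k = 0; k < x[i].size(); k++) {
                        const double d = x[i][k] - x[j][k];
                        s += d * d;
                    }
                    // each (i, j) pair is written by exactly one thread
                    disSquare[i][j] = disSquare[j][i] = s;
                }
            }
        });
    }
    for (auto &w : workers) { w.join(); }
}
```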
Implementing t-SNE with random walk to train on bigger data sets. (This has been implemented!)
The result of one random experiment without random walk (training on 6,000 MNIST images for 1,000 iterations; this figure is a gif file, so you can restart the animation by saving it or opening it in a new tab):
The result of one random experiment with random walk (training on 6,000 MNIST images for 1,000 iterations, but using all 60,000 MNIST images via random walk; this figure is a gif file, so you can restart the animation by saving it or opening it in a new tab):