numpy, pandas, matplotlib and scikit-learn are already included in Anaconda. <br>
<h2>How to run the code</h2>
-Clone this repository and open the notebook in Jupyter Notebook.<br>
-Now one can run each and every cell of the notebook. Further details of what each section of the code contains are given below.
-The first section of the code contains the preliminary work that is needed. We <b>import the libraries</b> that are required, <b>download the stopwords</b> and define the function for creating the <b>tf-idf vector</b>.<br>
-The next section <b>uses the imported Reuters dataset</b>, divides it into training and testing data, and forms the tf-idf vectors from the training data. <br>
-The <b>Visualization</b> section imports the gensim library, tokenises the text of a single document, converts the tokenised vector to a pandas dataframe and then visualises the word embeddings.
-Then we move to the <b>Particle Swarm Optimization</b> section, which contains a function implementing the PSO algorithm.<br>
-The next section is for <b>Spectral Clustering</b>, which imports the necessary libraries, fits the data and calculates the Adjusted Rand Index (ARI).<br>
-The next section covers <b>our own ideas</b>, starting with <b>Principal Component Analysis (PCA) on the affinity matrix with Euclidean distance</b>. Here we apply PCA to the affinity matrix built with Euclidean distance and then calculate the ARI for the model.<br>
-The next section has our other idea, which is to use <b>Principal Component Analysis (PCA) on the affinity matrix with a Gaussian kernel</b>. Here we apply PCA to the affinity matrix built with a Gaussian kernel and then calculate the ARI for the model.<br>
-The last section is the <b>Comparison of Adjusted Rand Index</b> for the various models.
+1) Clone this repository and open the notebook in Jupyter Notebook.<br>
+2) Now one can run each and every cell of the notebook. Further details of what each section of the code contains are given in the steps below.<br>
+3) The first section of the code contains the preliminary work that is needed. We <b>import the libraries</b> that are required, <b>download the stopwords</b> and define the function for creating the <b>tf-idf vector</b>.<br>
+4) The next section <b>uses the imported Reuters dataset</b>, divides it into training and testing data, and forms the tf-idf vectors from the training data. <br>
+5) The <b>Visualization</b> section imports the gensim library, tokenises the text of a single document, converts the tokenised vector to a pandas dataframe and then visualises the word embeddings.<br>
+6) Then we move to the <b>Particle Swarm Optimization</b> section, which contains a function implementing the PSO algorithm.<br>
+7) The next section is for <b>Spectral Clustering</b>, which imports the necessary libraries, fits the data and calculates the Adjusted Rand Index (ARI).<br>
+8) The next section covers <b>our own ideas</b>, starting with <b>Principal Component Analysis (PCA) on the affinity matrix with Euclidean distance</b>. Here we apply PCA to the affinity matrix built with Euclidean distance and then calculate the ARI for the model.<br>
+9) The next section has our other idea, which is to use <b>Principal Component Analysis (PCA) on the affinity matrix with a Gaussian kernel</b>. Here we apply PCA to the affinity matrix built with a Gaussian kernel and then calculate the ARI for the model.<br>
+10) The last section is the <b>Comparison of Adjusted Rand Index</b> for the various models.
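
The clustering and comparison steps described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the six-document toy corpus stands in for the Reuters dataset, the helper `pca_then_kmeans` is a hypothetical name, and clustering the PCA output with k-means is an assumption, since the README does not say which clusterer follows PCA. The `gamma=1.0` kernel width and the use of the raw Euclidean distance matrix as the "affinity" are likewise assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.pairwise import euclidean_distances, rbf_kernel

# Hypothetical toy corpus standing in for the Reuters documents.
docs = [
    "wheat corn grain harvest export",
    "grain wheat crop harvest farm",
    "corn crop farm grain export",
    "oil crude barrel opec price",
    "crude oil opec output barrel",
    "oil price barrel crude output",
]
labels_true = [0, 0, 0, 1, 1, 1]  # two topics: grain and oil

# tf-idf vectors, converted to a dense array for the steps below.
X = TfidfVectorizer().fit_transform(docs).toarray()


def pca_then_kmeans(affinity, n_clusters=2, n_components=2, seed=0):
    """Reduce an n x n affinity matrix with PCA, then cluster with k-means.

    Assumption: the README does not name the clusterer applied after PCA.
    """
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(affinity)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(reduced)


# Baseline: spectral clustering with a Gaussian (RBF) affinity.
spectral_labels = SpectralClustering(
    n_clusters=2, affinity="rbf", gamma=1.0, random_state=0
).fit_predict(X)

# Idea 1: PCA on the pairwise Euclidean distance matrix.
euclid_labels = pca_then_kmeans(euclidean_distances(X))

# Idea 2: PCA on the Gaussian-kernel affinity matrix.
gauss_labels = pca_then_kmeans(rbf_kernel(X, gamma=1.0))

# Comparison of Adjusted Rand Index across the three models.
scores = {
    "spectral": adjusted_rand_score(labels_true, spectral_labels),
    "pca_euclidean": adjusted_rand_score(labels_true, euclid_labels),
    "pca_gaussian": adjusted_rand_score(labels_true, gauss_labels),
}
for name, score in scores.items():
    print(f"{name}: ARI = {score:.3f}")
```

The ARI equals 1 when the predicted clusters match the true labels up to a relabelling and is close to 0 for a random assignment, so the three scores are directly comparable.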