Commit a7d3266

tweak page and attempt image fix
1 parent 91a0d37 commit a7d3266

1 file changed: +8 -2 lines changed

_projects/1_nlp.markdown

Lines changed: 8 additions & 2 deletions
@@ -7,7 +7,7 @@ importance: 1
 ---


-Below are interactive plots visualizing topic modeling on a collection of article abstracts pulled from [Microsoft Academic](https://academic.microsoft.com/home) related to energy storage. The abstracts were obtained with the search term "Energy Storage", keeping the top 10000 results. Duplicate papers were removed (identified by DOI) and only articles in English were retained, resulting in 6959 abstracts.
+Below are interactive plots visualizing topic modeling on a collection of article abstracts pulled from [Microsoft Academic](https://academic.microsoft.com/home) related to energy storage. The abstracts were obtained with the search term "Energy Storage", keeping the top 100000 results. Duplicate papers were removed (identified by DOI) and only articles in English were retained, resulting in approximately 40000 abstracts.



@@ -34,13 +34,19 @@ The features of the plot indicate the following:
 <embed type="text/html" src="es_network.html" style="width:100%" height=950>
 </div>

+
+
 # Topic Visualization with t-SNE

 The plot below goes further and visualizes the topic distributions of each individual paper. To be able to visualize the topics, the number of topics is reduced

 Below is a visualization of the topic modeling of the corpus. First, the texts are represented as points on a 2D surface using t-Distributed Stochastic Neighbor Embedding (t-SNE). The topic distribution for each paper is visualized by representing each paper as a pie chart. Each slice represents a topic, and the fractional size (angle) of each slice represents the probability of that topic. Only the top 3 topics for each paper are included (resulting in an incomplete pie chart) for the sake of graphics processing.

-![](wedge_example.PNG)
+
+
+![](wedge_example.png)
+
+

 The top words for each topic are indicated in the legend (see next visualization to explore the topic words in more detail). The topics in the legend are sorted by the number of papers that have that topic as their most probable topic.

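The page text in this diff describes two computations that are easy to misread: the per-paper pie wedges (top 3 topics only, angle proportional to topic probability) and the legend ordering (topics ranked by how many papers have them as their most probable topic). The commit does not include the plotting code, so the following is only a minimal sketch under assumptions: the function names `top_k_wedges` and `legend_order` are hypothetical, and the input is assumed to be a plain list of per-paper topic probabilities that each sum to 1.

```python
import math
from collections import Counter


def top_k_wedges(topic_probs, k=3):
    """Return (topic, start_angle, end_angle) for the k most probable topics.

    Angles are in radians. Because only k topics are kept, the wedges cover
    less than a full circle whenever probability mass lies outside the top k,
    which is what leaves the pie chart incomplete.
    (Hypothetical helper, not taken from the repository.)
    """
    # Rank topic indices by descending probability; sorted() is stable,
    # so ties keep ascending topic index.
    ranked = sorted(range(len(topic_probs)), key=lambda t: -topic_probs[t])[:k]
    wedges = []
    start = 0.0
    for topic in ranked:
        sweep = 2.0 * math.pi * topic_probs[topic]
        wedges.append((topic, start, start + sweep))
        start += sweep
    return wedges


def legend_order(doc_topic):
    """Order topics by the number of papers whose most probable topic they are.

    (Hypothetical helper, not taken from the repository.)
    """
    n_topics = len(doc_topic[0])
    # Dominant topic per paper; max() returns the first index on ties.
    dominant = Counter(
        max(range(n_topics), key=lambda t: row[t]) for row in doc_topic
    )
    # Topics that are never dominant get a count of 0 and sort last.
    return sorted(range(n_topics), key=lambda t: -dominant[t])


# Toy data: 3 papers, 4 topics (made-up probabilities for illustration only).
papers = [
    [0.5, 0.25, 0.125, 0.125],
    [0.125, 0.625, 0.125, 0.125],
    [0.25, 0.5, 0.125, 0.125],
]
first_paper_wedges = top_k_wedges(papers[0])
topic_ranking = legend_order(papers)
```

In this toy input the first paper keeps topics 0, 1 and 2, and its wedges stop short of a full circle because the fourth topic's probability is dropped. The 2D positions from t-SNE are a separate step and are not sketched here.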
