
Commit 057691c

Add BERTopic docs (huggingface#1193)
* Add BERTopic docs
* Incorporate review comments
* Incorporate overlooked review comments
1 parent 26ce9fc commit 057691c


2 files changed (+96 -0 lines)


docs/hub/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -61,6 +61,8 @@
   title: Adapter Transformers
 - local: allennlp
   title: AllenNLP
+- local: bertopic
+  title: BERTopic
 - local: asteroid
   title: Asteroid
 - local: diffusers

docs/hub/bertopic.md

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
# Using BERTopic at Hugging Face

[BERTopic](https://github.com/MaartenGr/BERTopic) is a topic modeling framework that leverages 🤗 transformers and c-TF-IDF to create dense clusters, allowing for easily interpretable topics whilst keeping important words in the topic descriptions.

BERTopic supports a wide range of topic modeling techniques:
<table>
<tr>
<td><a href="https://maartengr.github.io/BERTopic/getting_started/guided/guided.html">Guided</a></td>
<td><a href="https://maartengr.github.io/BERTopic/getting_started/supervised/supervised.html">Supervised</a></td>
<td><a href="https://maartengr.github.io/BERTopic/getting_started/semisupervised/semisupervised.html">Semi-supervised</a></td>
</tr>
<tr>
<td><a href="https://maartengr.github.io/BERTopic/getting_started/manual/manual.html">Manual</a></td>
<td><a href="https://maartengr.github.io/BERTopic/getting_started/distribution/distribution.html">Multi-topic distributions</a></td>
<td><a href="https://maartengr.github.io/BERTopic/getting_started/hierarchicaltopics/hierarchicaltopics.html">Hierarchical</a></td>
</tr>
<tr>
<td><a href="https://maartengr.github.io/BERTopic/getting_started/topicsperclass/topicsperclass.html">Class-based</a></td>
<td><a href="https://maartengr.github.io/BERTopic/getting_started/topicsovertime/topicsovertime.html">Dynamic</a></td>
<td><a href="https://maartengr.github.io/BERTopic/getting_started/online/online.html">Online/Incremental</a></td>
</tr>
<tr>
<td><a href="https://maartengr.github.io/BERTopic/getting_started/multimodal/multimodal.html">Multimodal</a></td>
<td><a href="https://maartengr.github.io/BERTopic/getting_started/multiaspect/multiaspect.html">Multi-aspect</a></td>
<td><a href="https://maartengr.github.io/BERTopic/getting_started/representation/llm.html">Text Generation/LLM</a></td>
</tr>
<tr>
<td><a href="https://maartengr.github.io/BERTopic/getting_started/zeroshot/zeroshot.html">Zero-shot <b>(new!)</b></a></td>
<td><a href="https://maartengr.github.io/BERTopic/getting_started/merge/merge.html">Merge Models <b>(new!)</b></a></td>
<td><a href="https://maartengr.github.io/BERTopic/getting_started/seed_words/seed_words.html">Seed Words <b>(new!)</b></a></td>
</tr>
</table>
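
The c-TF-IDF step mentioned in the introduction treats all documents of a topic (class) as a single document and weights each term by how frequent it is within that class compared with across all classes. Below is a minimal NumPy sketch, assuming the weighting `W[c, x] = tf_norm[c, x] * log(1 + A / f[x])` as described in the BERTopic documentation, where `tf_norm[c, x]` is the frequency of term `x` in class `c` (L1-normalized within the class), `f[x]` is the frequency of `x` across all classes, and `A` is the average number of words per class. The function name `c_tf_idf` and the whitespace tokenization are assumptions for illustration; BERTopic's own `ClassTfidfTransformer` may differ in details such as tokenization.

```python
import numpy as np


def c_tf_idf(class_docs: list[str]) -> tuple[list[str], np.ndarray]:
    """Return the vocabulary and an (n_classes, n_terms) matrix of c-TF-IDF weights.

    class_docs[c] is assumed to be all documents of class (topic) c joined into one
    non-empty string.
    """
    # Simple lowercase whitespace tokenization (assumption, for illustration only)
    tokenized = [doc.lower().split() for doc in class_docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}

    # tf[c, x]: raw frequency of term x in class c
    tf = np.zeros((len(class_docs), len(vocab)))
    for c, toks in enumerate(tokenized):
        for tok in toks:
            tf[c, index[tok]] += 1

    f = tf.sum(axis=0)         # f[x]: frequency of term x across all classes
    A = tf.sum(axis=1).mean()  # A: average number of words per class
    idf = np.log(1 + A / f)

    # L1-normalize the term frequencies within each class, then apply the idf weight
    tf_norm = tf / tf.sum(axis=1, keepdims=True)
    return vocab, tf_norm * idf


vocab, weights = c_tf_idf(["cat dog cat", "stock market stock price"])
for c, row in enumerate(weights):
    # Print the highest-weighted term for each class
    print(c, vocab[int(row.argmax())])
```

In this toy example, "cat" ranks highest for the first class and "stock" for the second, since each is frequent within its own class and absent from the other.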

## Exploring BERTopic on the Hub

You can find BERTopic models by using the library filter on the left side of the [models page](https://huggingface.co/models?library=bertopic&sort=trending).

BERTopic models hosted on the Hub have a model card with useful information about the models. Thanks to BERTopic's Hugging Face Hub integration, you can load BERTopic models with a few lines of code. You can also deploy these models using [Inference Endpoints](https://huggingface.co/inference-endpoints).

## Installation

To get started, you can follow the [BERTopic installation guide](https://github.com/MaartenGr/BERTopic#installation).
You can also use the following one-line install through pip:

```bash
pip install bertopic
```

## Using Existing Models

All BERTopic models can easily be loaded from the Hub:

```py
from bertopic import BERTopic
topic_model = BERTopic.load("MaartenGr/BERTopic_Wikipedia")
```

Once loaded, you can use BERTopic's features to predict the topics for new instances:

```py
topic, prob = topic_model.transform("This is an incredible movie!")
# `transform` returns one topic ID per input document, so take the first one
topic_model.topic_labels_[topic[0]]
```

This returns the following topic label:

```text
64_rating_rated_cinematography_film
```
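
The label above appears to consist of the topic ID followed by the topic's top four words, joined by underscores. A minimal sketch of building such labels and looking one up for a prediction, assuming that format; `make_topic_label` is a hypothetical helper rather than part of BERTopic's API, and the prediction is assumed to be a list holding one topic ID per document.

```python
def make_topic_label(topic_id: int, top_words: list[str], nr_words: int = 4) -> str:
    """Build a label such as '64_rating_rated_cinematography_film'."""
    # Keep only the first `nr_words` words and prefix them with the topic ID
    return "_".join([str(topic_id), *top_words[:nr_words]])


# Hypothetical mapping from topic ID to label, similar in spirit to `topic_labels_`
labels = {64: make_topic_label(64, ["rating", "rated", "cinematography", "film"])}

# Assumed shape of a prediction for a single document: a list with one topic ID
predictions = [64]
print(labels[predictions[0]])  # 64_rating_rated_cinematography_film
```

Indexing with `predictions[0]` rather than the whole list matters because a list cannot be used as a dictionary key.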

## Sharing Models

When you have created a BERTopic model, you can easily share it with others through the Hugging Face Hub. To do so, use the `push_to_hf_hub` method, which pushes the model directly to the Hugging Face Hub:
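
```python
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# Example corpus: replace with your own list of documents (strings)
docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))["data"]

# Train model
topic_model = BERTopic().fit(docs)

# Push to the Hugging Face Hub (requires being logged in, e.g. with `huggingface-cli login`)
# Use a repo_id under a namespace you have write access to
topic_model.push_to_hf_hub(
    repo_id="MaartenGr/BERTopic_ArXiv",
    save_ctfidf=True
)
```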

Note that the saved model does not include the dimensionality reduction and clustering algorithms. They are removed because they are only needed to train the model and find relevant topics. Inference is instead done through cosine similarity between the topic and document embeddings. This not only speeds up inference but also keeps the saved BERTopic model small.
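
The cosine-similarity inference described above can be sketched with NumPy as follows. This is a minimal illustration, assuming one embedding vector per topic and one per document; `assign_topics` is a hypothetical helper, and it ignores details BERTopic handles itself, such as the outlier topic `-1` and any mapping from row indices to topic IDs.

```python
import numpy as np


def assign_topics(doc_embeddings: np.ndarray, topic_embeddings: np.ndarray) -> np.ndarray:
    """Return, for each document, the row index of the most similar topic embedding."""
    # Scale every row to unit length so that a dot product equals cosine similarity
    docs = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    topics = topic_embeddings / np.linalg.norm(topic_embeddings, axis=1, keepdims=True)
    similarity = docs @ topics.T  # shape: (n_docs, n_topics)
    return similarity.argmax(axis=1)


topic_embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_embeddings = np.array([[0.9, 0.1], [0.2, 3.0]])
print(assign_topics(doc_embeddings, topic_embeddings))  # [0 1]
```

Normalizing first means only the direction of an embedding matters, so a document with a large-magnitude embedding is not biased toward any particular topic.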

## Additional Resources

* [BERTopic repository](https://github.com/MaartenGr/BERTopic)
* [BERTopic docs](https://maartengr.github.io/BERTopic/)
* [BERTopic models in the Hub](https://huggingface.co/models?library=bertopic&sort=trending)
