A PyTorch implementation of a version of the Autoencoding Variational Inference For Topic Models (AVITM) algorithm. Compatible with PyTorch 1.0.0 and Python 3.6 or 3.7, with or without CUDA.
This unofficial implementation follows (or attempts to follow) the algorithm described in "Autoencoding Variational Inference For Topic Models" by Akash Srivastava and Charles Sutton (https://arxiv.org/abs/1703.01488).
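For context, the model described in the paper is a variational autoencoder whose latent variable is a document's topic-proportion vector: an inference network maps a bag-of-words vector to the mean and log-variance of a logistic-normal distribution (a Laplace approximation to the Dirichlet prior), and the ProdLDA decoder reconstructs the document through a softmax over a product of unnormalised topic-word weights. The following is a minimal, illustrative sketch of that idea; the class and parameter names are made up for the example and do not reproduce this library's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProdLDASketch(nn.Module):
    """Illustrative ProdLDA-style model (not the library's actual class):
    the encoder outputs parameters of a logistic-normal approximation to the
    Dirichlet prior over topic proportions, and the decoder combines
    unnormalised topic-word weights as a product of experts."""

    def __init__(self, vocab_size: int, num_topics: int = 50, hidden: int = 100) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, hidden),
            nn.Softplus(),
            nn.Linear(hidden, hidden),
            nn.Softplus(),
        )
        self.mean = nn.Linear(hidden, num_topics)
        self.logvar = nn.Linear(hidden, num_topics)
        self.decoder = nn.Linear(num_topics, vocab_size, bias=False)

    def forward(self, bow: torch.Tensor):
        h = self.encoder(bow)
        mean, logvar = self.mean(h), self.logvar(h)
        # reparameterisation trick: sample topic proportions from the
        # logistic-normal posterior q(theta | bow)
        eps = torch.randn_like(mean)
        theta = F.softmax(mean + eps * torch.exp(0.5 * logvar), dim=1)
        # ProdLDA decoder: softmax over a weighted sum of unnormalised
        # topic-word weights (a product of experts rather than a mixture
        # of multinomials)
        log_recon = F.log_softmax(self.decoder(theta), dim=1)
        return log_recon, mean, logvar
```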
You can find a number of examples in the examples directory; see also Usage below.
The simplest way to use the library is via the sklearn-compatible API, as shown below.
```python
import sklearn.datasets
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

from ptavitm.sklearn_api import ProdLDATransformer

texts = sklearn.datasets.fetch_20newsgroups()['data']

# bag-of-words features feeding the AVITM topic model
pipeline = make_pipeline(
    CountVectorizer(
        stop_words='english',
        max_features=2500,
        max_df=0.9
    ),
    ProdLDATransformer()
)

pipeline.fit(texts)
result = pipeline.transform(texts)
```
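Assuming `ProdLDATransformer.transform` follows the usual sklearn transformer convention and returns one row per document with one column per topic (an assumption for this sketch; check the class for the exact output format), the result can be inspected like this:

```python
# expected shape: (number of documents, number of topics) -- an assumption
# based on the sklearn transformer convention, not a guarantee of the API
print(result.shape)

# index of the highest-weighted topic for the first document
print(result[0].argmax())
```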
Other implementations of AVITM:

- Original TensorFlow implementation: https://github.com/akashgit/autoencoding_vi_for_topic_models
- Another PyTorch implementation: https://github.com/hyqneuron/pytorch-avitm