Closed
Hi Maarten
Firstly, thank you for this amazing library. I'm generating topics on the 20 Newsgroups data for testing, using cuML for UMAP and HDBSCAN. I set calculate_probabilities=True and ran fit_transform() on the data, which worked fine and gave good results. However, when I run transform() on new data, it raises AttributeError: 'tuple' object has no attribute 'shape'. When I set calculate_probabilities=False, transform() works fine.
The libraries I am using are:
bertopic==0.15.0
cuml-cu11==23.4.1
cudf-cu11==23.4.1
CUDA Toolkit 11.8
I am running on an Ubuntu virtual machine with a Tesla T4 GPU.
The code to reproduce this error:
from bertopic import BERTopic
from cuml.cluster import HDBSCAN
from cuml.manifold import UMAP
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_20newsgroups
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
train = docs[:15000]
test = docs[15000:]
umap_model = UMAP(n_components=5, n_neighbors=10, min_dist=0.0)
hdbscan_model = HDBSCAN(min_samples=25, min_cluster_size=50, gen_min_span_tree=True, prediction_data=True)
topic_model = BERTopic(umap_model=umap_model, hdbscan_model=hdbscan_model, calculate_probabilities=True, verbose=True)
topics, probs = topic_model.fit_transform(train)
topics_test, probs_test = topic_model.transform(test)
Running the last line raises the error quoted above: AttributeError: 'tuple' object has no attribute 'shape'.
Could you please guide me in solving this error?
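For reference, here is a minimal sketch of what the error message itself implies. It is only an illustration, not the actual BERTopic or cuML code path: fake_predict is a hypothetical stand-in for any prediction helper that returns a (labels, strengths) pair, and SimpleNamespace objects stand in for arrays. My guess is that somewhere inside transform() a tuple like this is being treated as a single array.

```python
from types import SimpleNamespace


def fake_predict(n_points):
    # Hypothetical stand-in, NOT a BERTopic or cuML function.
    # It returns two array-like objects packed in a tuple.
    labels = SimpleNamespace(shape=(n_points,))
    strengths = SimpleNamespace(shape=(n_points,))
    return labels, strengths


result = fake_predict(3)

# Treating the tuple as if it were a single array reproduces the message.
try:
    result.shape
    message = ""
except AttributeError as exc:
    message = str(exc)

print(message)  # 'tuple' object has no attribute 'shape'

# Unpacking the tuple first gives objects that do have a shape.
labels, strengths = result
print(strengths.shape)  # (3,)
```

If this guess is right, the tuple would need to be unpacked (or a different prediction function used) before the probabilities are read, but I have not confirmed where this happens in the library.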