model.transform() throwing error when using cuml for HDBSCAN with calculate_probabilities=True #1317

Closed
@slice-pranay

Hi Maarten

Firstly, thank you for this amazing library. I'm generating topics on the newsgroups data for testing, using cuML for UMAP and HDBSCAN. I set calculate_probabilities=True and ran fit_transform() on the data, which worked fine and gave good results. When I then run transform() on new data, it raises AttributeError: 'tuple' object has no attribute 'shape'. With calculate_probabilities=False, transform() works fine.

The libraries I am using are:
bertopic==0.15.0
cuml-cu11==23.4.1
cudf-cu11==23.4.1
CUDA toolkit 11.8

I am running on an Ubuntu virtual machine with a Tesla T4 GPU.

Code to reproduce the error:

from bertopic import BERTopic
from cuml.cluster import HDBSCAN
from cuml.manifold import UMAP
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_20newsgroups

# Load the 20 newsgroups documents without headers, footers, and quotes
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']

# First 15,000 documents for fitting, the rest for transform()
train = docs[:15000]
test = docs[15000:]

# GPU-accelerated (cuML) dimensionality reduction and clustering models
umap_model = UMAP(n_components=5, n_neighbors=10, min_dist=0.0)
hdbscan_model = HDBSCAN(min_samples=25, min_cluster_size=50, gen_min_span_tree=True, prediction_data=True)

topic_model = BERTopic(umap_model=umap_model, hdbscan_model=hdbscan_model, calculate_probabilities=True, verbose=True)

# Works as expected
topics, probs = topic_model.fit_transform(train)

# Raises AttributeError: 'tuple' object has no attribute 'shape'
topics_test, probs_test = topic_model.transform(test)

The error I get when I run this:
[Image: Screenshot 2023-06-02 at 5 26 26 PM]

Could you please help me resolve this error?
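
For reference, the message itself is what Python raises whenever code reads `.shape` from a tuple rather than from an array-like object. Below is a minimal sketch of that mismatch, independent of BERTopic and cuML. Everything in it is hypothetical: `FakeArray`, `predict_with_strengths`, and `describe_shape` are stand-ins invented for illustration, and it is only an assumption that something of this kind (a function returning a `(labels, strengths)` pair where a single array is expected) is involved in the failure above.

```python
class FakeArray:
    """Hypothetical stand-in for an array type: it only knows its shape."""

    def __init__(self, rows):
        self.rows = rows

    @property
    def shape(self):
        n_cols = len(self.rows[0]) if self.rows else 0
        return (len(self.rows), n_cols)


def predict_with_strengths(points):
    """Hypothetical predictor that returns a (labels, strengths) tuple."""
    labels = [0 for _ in points]
    strengths = FakeArray([[1.0] for _ in points])
    return labels, strengths


def describe_shape(result):
    """Return result.shape, or the error text if result has no shape."""
    try:
        return result.shape
    except AttributeError as err:
        return str(err)


result = predict_with_strengths([[0.1, 0.2], [0.3, 0.4]])

# Reading .shape from the whole tuple fails with the message from the report.
message = describe_shape(result)

# Unpacking the tuple first yields an object that does have a shape.
labels, strengths = result
shape = describe_shape(strengths)

print(message)
print(shape)
```

If the cause is of this kind, the fix would belong in whichever call hands the tuple on unchanged, not in the user code above.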
