Description
Currently, if I want to run UMAP with batched NN descent, I can call fit_transform() and it works. However, if I call fit() and transform() separately (e.g. to fit on just a subset of the overall dataset and then transform the rest), only fit() supports batched NN descent; transform() falls back to brute-force kNN.
import numpy as np
from cuml.manifold import UMAP

N = 10000
K = 32

rng = np.random.default_rng()
data = rng.random((N, K), dtype="float32")

reducer = UMAP(
    n_components=2,
    n_neighbors=15,
    build_algo="nn_descent",
    build_kwds={"nnd_n_clusters": 4},
)

fitted_umap = reducer.fit(data)
embeddings = fitted_umap.transform(data)
[I] [13:36:50.855150] Transform can only be run with brute force. Using brute force.
How much effort would be required for transform() to support NN descent as well?