
[FEA] Implement UMAP transform with batched nn descent #6215

@btepera

Description


Currently, if I want to run UMAP with batched NN descent, I can call fit_transform() and it works. However, if I want to call fit() and transform() separately (e.g. to fit on just a subset of the overall dataset), only fit() supports batched NN descent; transform() falls back to brute-force kNN.

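```python
import numpy as np
from cuml.manifold import UMAP

N = 10000  # number of samples
K = 32     # number of features

rng = np.random.default_rng()
data = rng.random((N, K), dtype="float32")

reducer = UMAP(
    n_components=2,
    n_neighbors=15,
    build_algo="nn_descent",
    build_kwds={"nnd_n_clusters": 4},  # number of clusters for batched NN descent
)

# fit() builds the kNN graph with batched NN descent...
fitted_umap = reducer.fit(data)
# ...but transform() falls back to brute-force kNN
embeddings = fitted_umap.transform(data)
```

```
[I] [13:36:50.855150] Transform can only be run with brute force. Using brute force.
```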
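For context, as I understand it, transform() has to find, for each new point, its n_neighbors nearest points among the fitted training data. That is a query against an existing set, which is a different operation from the all-neighbors graph that NN descent builds over a single dataset during fit. The NumPy sketch below illustrates that query as an exact brute-force search processed in row batches to bound memory. The helper batched_knn_query and its batch_size parameter are hypothetical illustrations, not part of cuML, and this is not a description of how cuML implements its fallback.

```python
import numpy as np


def batched_knn_query(train, queries, k, batch_size=1024):
    """Hypothetical helper: exact k-nearest-neighbor query of `queries` against
    `train` (Euclidean), processed in row batches so each distance block has
    at most batch_size x len(train) entries. Requires k <= len(train)."""
    train = np.asarray(train, dtype=np.float32)
    queries = np.asarray(queries, dtype=np.float32)
    n_q = queries.shape[0]
    indices = np.empty((n_q, k), dtype=np.int64)
    distances = np.empty((n_q, k), dtype=np.float32)

    # Squared norms of the training rows, computed once and reused per batch.
    train_sq = np.einsum("ij,ij->i", train, train)

    for start in range(0, n_q, batch_size):
        q = queries[start:start + batch_size]
        q_sq = np.einsum("ij,ij->i", q, q)
        # Squared Euclidean distances via ||q||^2 - 2 q.t + ||t||^2.
        d2 = q_sq[:, None] - 2.0 * (q @ train.T) + train_sq[None, :]
        np.maximum(d2, 0.0, out=d2)  # clamp small negatives caused by round-off

        # Select the k smallest per row without a full sort, then order them.
        part = np.argpartition(d2, k - 1, axis=1)[:, :k]
        part_d = np.take_along_axis(d2, part, axis=1)
        order = np.argsort(part_d, axis=1)

        stop = start + q.shape[0]
        indices[start:stop] = np.take_along_axis(part, order, axis=1)
        distances[start:stop] = np.sqrt(np.take_along_axis(part_d, order, axis=1))

    return indices, distances
```

Supporting batched NN descent in transform would presumably mean replacing this exact search with an approximate query that scales the same way fit does; the sketch only shows the shape of the computation (new points against training points, k neighbors each).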

How much effort would it take for NN descent to support transform as well?
