[FEA] Changing COO Index_Type in UMAP to prevent overflow when running with large datasets #6010

Open
@jinsolp

Description

UMAP currently cannot run on large datasets because of an integer overflow.
raft::sparse::COO defaults to int for its Index_Type, which is too narrow for the sizes involved.

Once this is resolved, we need to update UMAPAlgo::FuzzySimplSet::ML::run() to accept a COO whose Index_Type is something other than int.

Details

Specifically, coo_symmetrize (the raft function called from UMAPAlgo::FuzzySimplSet::ML::run()) allocates space for nnz * 2 entries on the device. For a large dataset (e.g. 88M samples with a knn graph degree of 16), nnz itself still fits in an int (88M * 16 = 1,408,000,000), but the doubled value does not: 88M * 16 * 2 = 2,816,000,000 > INT_MAX (2,147,483,647).
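
To make the arithmetic concrete, here is a minimal standalone sketch of the size check. It does not use raft or cuML: `symmetrize_alloc_size` and `fits_in` are hypothetical helpers written only for illustration, and the sketch assumes a platform where int is 32 bits wide.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not a raft/cuML function): number of entries a
// symmetrization step would allocate (nnz * 2), computed in 64-bit
// arithmetic so the computation itself cannot overflow.
constexpr std::int64_t symmetrize_alloc_size(std::int64_t n_samples, std::int64_t knn_degree)
{
  return n_samples * knn_degree * 2;
}

// Hypothetical helper: true if a non-negative `value` is representable in
// the signed integer type Index_Type.
template <typename Index_Type>
constexpr bool fits_in(std::int64_t value)
{
  return value >= 0 &&
         value <= static_cast<std::int64_t>(std::numeric_limits<Index_Type>::max());
}

// Numbers from the example in this issue: 88M samples, knn graph degree 16.
constexpr std::int64_t kSamples = 88'000'000;
constexpr std::int64_t kDegree  = 16;

// Assumes a 32-bit int. nnz alone fits, the doubled allocation does not.
static_assert(fits_in<int>(kSamples * kDegree), "nnz fits in a 32-bit int");
static_assert(!fits_in<int>(symmetrize_alloc_size(kSamples, kDegree)),
              "nnz * 2 overflows a 32-bit int");
static_assert(fits_in<std::int64_t>(symmetrize_alloc_size(kSamples, kDegree)),
              "nnz * 2 fits in a 64-bit index type");
```

The checks show that a 64-bit signed index type would be wide enough for this example. Which Index_Type should actually be used is left open here.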
