Description
UMAP currently cannot run on large datasets because of an integer overflow. raft::sparse::COO defaults to int for its Index_Type, and that type is too narrow for the element counts that large inputs produce.
When this issue is solved, we need to update UMAPAlgo::FuzzySimplSet::ML::run() to take a COO with an Index_Type other than int.
Details
Specifically, coo_symmetrize (a raft function called from UMAPAlgo::FuzzySimplSet::ML::run()) allocates space for nnz * 2 entries on the device. For a large dataset (e.g. 88M samples with a kNN graph degree of 16), this value is larger than the maximum int: nnz = 88M * 16 = 1,408,000,000 still fits, but nnz * 2 = 2,816,000,000 > INT_MAX (2,147,483,647).
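To make the arithmetic concrete, here is a minimal sketch of the size check, assuming the symmetrize step needs nnz * 2 slots computed in the COO's index type. The helper name symmetrized_nnz and the constant kExampleNnz are hypothetical illustrations, not part of raft or cuML. The check runs before multiplying because overflowing a signed int is undefined behaviour in C++.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper (NOT a raft API): returns nnz * 2 expressed in the
// index type Index_t, or std::nullopt if the result would not fit.
// The bound is checked first, so no overflowing multiplication is performed.
template <typename Index_t>
constexpr std::optional<Index_t> symmetrized_nnz(std::int64_t nnz) {
  static_assert(std::numeric_limits<Index_t>::is_integer,
                "Index_t must be an integer type");
  if (nnz < 0 ||
      nnz > static_cast<std::int64_t>(std::numeric_limits<Index_t>::max() / 2)) {
    return std::nullopt;  // nnz * 2 would exceed Index_t's range
  }
  return static_cast<Index_t>(nnz * 2);
}

// Example from the issue: 88M samples, kNN graph degree 16.
constexpr std::int64_t kExampleNnz = 88'000'000LL * 16;  // 1,408,000,000
```

With these numbers, symmetrized_nnz<int>(kExampleNnz) returns std::nullopt because INT_MAX / 2 is 1,073,741,823, while symmetrized_nnz<std::int64_t>(kExampleNnz) yields 2,816,000,000. This is why the index type, not just the per-call arithmetic, has to be widened.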