Description
We have recently had some regressions that cause issues when using 32-bit indexing types where n_rows * n_cols > 2^32-1. KMeans is an example: it used to scale to very large inputs, but after some recent refactors (and because these cases were not explicitly verified through testing) we're getting illegal memory access errors and OOMs from index overflows.
For now, I'm working around the issue by promoting the types to the int64_t-indexed templates here, but a longer-term fix is going to involve some changes to the innards of the algorithm in RAFT. I'd like to start capturing these cases with a separate set of googletest cases that don't necessarily run all the time, but that we can at least run locally and nightly to verify our algorithms still scale to these sizes. It might even be worth creating a separate set of binaries like CLUSTER_SCALE_TESTS.
Also, as usual, I'm creating this issue hoping to start a larger discussion and solicit feedback.