[FEA] Create google tests (and python tests) for "large data" #1273

Open
@cjnolet

Description

We have recently had some regressions causing issues when using 32-bit indexing types where n_rows * n_cols > 2^32-1. K-means is an example: it used to scale to very large inputs, but after some recent refactors (and because these cases were not explicitly verified through testing) we're now getting illegal memory access errors and OOMs from index overflows.
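For context, here is a minimal sketch of the overflow class being described (not code from RAFT; the shape is hypothetical, and the wrap is shown with `uint32_t` so it is well-defined, whereas with signed `int` the same product is undefined behavior):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Hypothetical shape: 3,000,000 x 2,000 = 6e9 elements, which
  // exceeds 2^32 - 1, so a flat index no longer fits in 32 bits.
  int64_t n_rows = 3000000;
  int64_t n_cols = 2000;

  int64_t row = n_rows - 1;
  int64_t col = n_cols - 1;

  // 32-bit index arithmetic: the product wraps modulo 2^32, yielding a
  // bogus offset, the source of the illegal memory accesses.
  uint32_t bad_idx = static_cast<uint32_t>(row) * static_cast<uint32_t>(n_cols) +
                     static_cast<uint32_t>(col);

  // Promoting to a 64-bit index type before multiplying avoids the wrap.
  int64_t good_idx = row * n_cols + col;

  printf("32-bit index: %u\n", bad_idx);
  printf("64-bit index: %lld\n", static_cast<long long>(good_idx));
  return 0;
}
```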

For now, I'm working around the issue by promoting the types to the int64_t-indexed templates here, but a longer-term fix is going to involve some changes to the innards of the algorithm in RAFT. I'd like to start capturing these cases with a separate set of googletest cases that don't necessarily run all the time, but that we can at least run locally and nightly to verify our algorithms still scale to these sizes. It might even be worth creating a separate set of binaries, like CLUSTER_SCALE_TESTS; a rough sketch of how that could look follows.
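As a rough sketch of how such an opt-in scale suite could be wired up with googletest (the fixture name `ClusterScaleTest` and the `RUN_SCALE_TESTS` gate are hypothetical, and the actual kmeans call is omitted since it depends on the RAFT API under discussion):

```cpp
#include <gtest/gtest.h>

#include <cstdint>
#include <cstdlib>

// Hypothetical fixture for large-data tests. Gating on an environment
// variable keeps these out of the default CI run while still letting us
// run them locally or nightly.
class ClusterScaleTest : public ::testing::Test {
 protected:
  void SetUp() override {
    if (std::getenv("RUN_SCALE_TESTS") == nullptr) {
      GTEST_SKIP() << "Set RUN_SCALE_TESTS=1 to run large-data scale tests";
    }
  }
};

TEST_F(ClusterScaleTest, IndexingPastInt32Max) {
  // Shape chosen so n_rows * n_cols > 2^32 - 1.
  int64_t n_rows = 3000000;
  int64_t n_cols = 2000;
  ASSERT_GT(n_rows * n_cols, int64_t{1} << 32);

  // A real test would run the algorithm under test (e.g. kmeans) on a
  // device matrix of this shape and assert it completes without an
  // illegal memory access; that call is omitted here.
}
```

Building these into a dedicated binary (the proposed CLUSTER_SCALE_TESTS) would also make it easy for a nightly job to invoke just the scale suite without touching the regular test targets.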

Also, as usual, I'm creating this issue hoping to start a larger discussion and solicit feedback.
