Skip to content

[BUG] matrix::select_k: inconsistent handling of infinities #1822

Open
@achirkin

Description

@achirkin

matrix::select_k uses a heuristic to select one among several implementations depending on the input arguments.
The problem is that some of the implementation have a special handling of max_bound/min_bound (inifinity) values; they may replace the corresponding indices (payload) with special values, such as zero or numeric_limit<T>::max().
This problem manifests itself when the number of valid (non-infinity) values is less than k. Example: IVF-PQ may return many infinities if the low-precision internal types are used; if the refinement used afterwards, it can return the zeroth vector from the database multiple times.
This can be tested in the newly added tests:

// Some algorithms return invalid indices in special cases.
// This can be considered as TODO for us to fix.
if (static_cast<size_t>(ix_i) >= in_ids.size()) return forgive_i;
if (static_cast<size_t>(ix_j) >= in_ids.size()) return forgive_j;

This listing summarizes what indices (payloads) do different implementation return:
auto forgive_algo(const std::optional<select::Algo>& algo, IdxT ix) const -> bool
{
if (!algo.has_value()) { return false; }
switch (algo.value()) {
// not sure which algo this is.
case select::Algo::kPublicApi: return true;
// warp-sort-based algos currently return zero index for inf distances.
case select::Algo::kWarpAuto:
case select::Algo::kWarpImmediate:
case select::Algo::kWarpFiltered:
case select::Algo::kWarpDistributed:
case select::Algo::kWarpDistributedShm: return ix == 0;
// FAISS version returns a special invalid value:
case select::Algo::kFaissBlockSelect: return ix == std::numeric_limits<IdxT>::max();
// Do not forgive by default
default: return false;
}

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions