-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Special operations on indexes
There are a few additional operations that are not supported for all types of indexes.
These are operations that rely on id-based access on the dataset.
The vector ids for an IndexIVF
(and IndexBinaryIVF
) are stored in the inverted lists.
Therefore there is no way to map back from an id to the entry in the index.
To support removal or updates on IndexIVF
, the DirectMap
field of the IndexIVF
object stores a mapping from id to the location where it is stored in the index.
It can have 3 values:
-
DirectMap.NoMap
: no mapping is stored, reconstruction is not possible (default). -
DirectMap.Array
: the direct map is an array. The indices are assumed to be sequential, which rules outadd_with_ids
-
DirectMap.Hashtable
: the direct map is a hashtable. Indices can be arbitrary andadd_with_ids
works (provided indices are distinct).
Set or change the DirectMap type with index.set_direct_map_type(DirectMap.Array)
.
Example usage here: TestReconsHash.
One vector can be reconstructed with the reconstruct
method.
To call it on several vectors (mainly useful in python to avoid the loop), use reconstruct_batch
.
This function reconstructs all vectors from a range of contiguous ids. ids that are not present in the dataset are untouched on output.
Example usage: test_index_composite.py
This is supported for all types of indexes. Make sure to enable the direct map for the IndexIVF
index types.
The method remove_ids
removes a subset of vectors from an index. It takes an IDSelector
object that is called for every element in the index to decide whether it should be removed. IDSelectorBatch
will do this for a list of indices. The Python interface constructs this from numpy arrays if necessary.
NB that since it does a pass over the whole database, this is efficient only when a significant number of vectors needs to be removed (see exception below).
Example: test_index_composite.py
Supported by IndexFlat
, IndexIVFFlat
, IDMap
.
Note that there is a semantic difference when removing ids from sequential indexes vs. when removing them from an IndexIVF
:
-
for sequential indexes (
IndexFlat
,IndexPQ
,IndexLSH
), the removal operation shifts the ids of vectors above the removed vector id. -
the
IndexIVF
andIndexIDMap2
store the ids of vectors explicitly, so the ids of other vectors are not changed. There are two special cases forIndexIVF
:-
DirectMap
typeArray
does not support removal because it means that all the indices would be shifted, which does not seem very useful. - with a direct map type
Hashtable
and a selectorIDSelectorArray
elements can be removed without scanning the whole index.
-
The method range_search
returns all vectors within a radius around the query point (as opposed to the k nearest ones). Since the result lists for each query are of different sizes, it must be handled specially:
-
in C++ it returns the results in a pre-allocated
RangeSearchResult
structure -
in Python, the results are returned as a triplet of 1D arrays
lims, D, I
, where result for query i is inI[lims[i]:lims[i+1]]
(indices of neighbors),D[lims[i]:lims[i+1]]
(distances).
Supported by (CPU only): IndexFlat
, IndexIVFFlat
, IndexScalarQuantizer
, IndexIVFScalarQuantizer
.
The methods:
-
merge_from
copies another index to this and deallocates it on-the-fly. You can useivflib::merge_into
forIndexIVF
s wrapped in a pre-transform. Please make sure you understand how vector ids are managed (sequential or user-assigned), see This issue for a discussion about the difference between the two. -
copy_subset_to
copies a subset of this codes to another index. Example usage: to build indexes on a GPU and move them to CPU afterwards
The functions are implemented only for IndexIVF
subclasses because they are mainly interesting for large indexes.
Faiss building blocks: clustering, PCA, quantization
Index IO, cloning and hyper parameter tuning
Threads and asynchronous calls
Inverted list objects and scanners
Indexes that do not fit in RAM
Brute force search without an index
Fast accumulation of PQ and AQ codes (FastScan)
Setting search parameters for one query
Binary hashing index benchmark