Skip to content

u1234x1234/pynanoflann

Repository files navigation

Build Status codecov

pynanoflann

Unofficial python wrapper to the nanoflann library [1] with sklearn interface and additional multithreaded capabilities.

nanoflann implementation of k-d tree provides one of the best performance for many k-nearest neighbour problems [2].

It is a good choice for exact k-NN, radius searches in low dimensional spaces.

Install

pip install git+https://github.com/u1234x1234/pynanoflann.git@0.0.8

Basic Usage

import numpy as np
import pynanoflann

index = np.random.uniform(0, 100, (100, 4))
queries = np.random.uniform(0, 100, (10, 4))

nn = pynanoflann.KDTree(n_neighbors=5, metric='L1', radius=100)
nn.fit(index)

# Get k-nearest neighbors
distances, indices = nn.kneighbors(queries)

# Radius search
distances, indices = nn.radius_neighbors(queries)

Save, load

If you need to save the model, there are two options:

  1. Save only the built index. In this case, data points are NOT stored in file. Very efficient, but inconvenient in some cases.
kdtree.fit(X)
kdtree.save_index('index.bin')

prebuilt_kdtree = pynanoflann.KDTree()
# Must use the same data on which the index was built.
prebuilt_kdtree.fit(X, 'index.bin')

Please refer to the detailed example

  1. Pickle the whole model with data points. Less efficient, but convenient.
kdtree.fit(X)
with open('kdtree.pkl', 'wb') as out_file:
    pickle.dump(kdtree, out_file)
with open('kdtree.pkl', 'rb') as in_file:
    unpickled_kdtree = pickle.load(in_file)

Please refer to the detailed example

Multicore parallelization

Implemented on C++ side to reduce python's multiprocessing overhead.

Performance

Generally it much faster than brute force or cython implementation of k-d tree in sklearn

To run benchmark:

python benchmark.py

Links

  1. Blanco, Jose Luis and Rai, Pranjal Kumar, 2014, nanoflann: a C++ header-only fork of FLANN, a library for Nearest Neighbor ({NN}) with KD-trees.
  2. Vermeulen, J.L., Hillebrand, A. and Geraerts, R., 2017. A comparative study of k‐nearest neighbour techniques in crowd simulation.
  3. OpenFLANN Performance comparison between PCL, nanoflann, picoflann