toy projects from CVPR 2020 Tutorial on Image Retrieval in the Wild
this is a summary of CVPR 2020 Tutorial on Image Retrieval in the Wild, I have found Matsui Sensei's talk on Billion scale Approximate Nearest Neighbor Search extremely informative, therefore I tried to summarize the key points in here for my own reference.
Research Question: Given a query 𝒒, find the closest vector from the database(One of the fundamental problems in computer science)
-
Nearest Neighbor Search(NN) 😟
linear scan, 𝑂(𝑁𝐷), slow
-
Approximate Nearest Neighbor Search(ANN) 😊
Faster search Don’t necessarily have to be exact neighbors Trade off: runtime, accuracy, and memory consumption
Naïve implementation
Fast implementation (by Faiss) : Matrix multiplication by BLAS
ps : NN in GPU (faiss gpu ) is 10x faster than NN in CPU faiss cpu
-
Math friendly 😊
-
Popular in the theory area (FOCS, STOC, …) 😊
-
Large memory cost 😟
-
Need several tables to boost the accuracy 😟
-
Need to store the original data on memory 😟
-
Data dependent methods such as PQ are better for real world data 😟
-
Thus, in recent CV papers, LSH has been treated as a classic method 😟 😟
Implementation by Flconn
- Very popular in recent years
- Around 2017, it turned out that the graph traversal based methods work well for million scale data
- Pioneer:
- Navigable Small World Graphs (NSW)
- Hierarchical NSW (HNSW)
- Implementation: nmslib , hnsw , faiss
TBC...
@misc{cvpr20_tutorial_image_retrieval, author = {Yusuke Matsui and Takuma Yamaguchi and Zheng Wang}, title = {CVPR2020 Tutorial on Image Retrieval in the Wild}, howpublished = {\url{https://matsui528.github.io/cvpr2020_tutorial_retrieval/}}, year = {2020} }