index

A similar-image search tool based on feature point matching.

Installation

CPU version

pip install git+https://github.com/lolishinshi/index

GPU version (only needed for the training stage)

(Install Anaconda first.)

conda create -n index
conda activate index
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
pip install -r requirements.gpu.txt

Usage

Importing images

# Use 16 threads
index add -t 16 /mnt/pictures

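Because of the efficiency limits of Python multiprocessing, the recommended thread count is half the number of threads available on the system.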
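The half-of-system-threads rule can be computed rather than guessed. This is a minimal sketch, assuming that `os.cpu_count()` reports the thread count the recommendation refers to; `recommended_threads` is a hypothetical helper and not part of index.

```python
import os


def recommended_threads(logical_cpus=None):
    """Return half the logical CPU count, but never less than 1."""
    if logical_cpus is None:
        # os.cpu_count() may return None when the count cannot be determined.
        logical_cpus = os.cpu_count() or 1
    return max(1, logical_cpus // 2)


# Print the import command with the suggested -t value filled in.
print(f"index add -t {recommended_threads()} /mnt/pictures")
```

On a machine with 32 hardware threads this suggests `-t 16`, which matches the example command above.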

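After the import finishes, two SQLite databases are created in the database directory (index.db by default):

metadata.db - contains the image hashes, paths, and related information
vector.db - contains the image feature points; this database can be deleted once the index has been built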
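Since both files are ordinary SQLite databases, they can be inspected with Python's standard `sqlite3` module. The sketch below only lists table names, because the schema is not documented here. The directory name `index.db` is the default mentioned above; `list_tables` and `inspect_database_dir` are hypothetical helpers.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in an SQLite database file."""
    # Open read-only so that inspection can never modify the database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def inspect_database_dir(db_dir="index.db"):
    """Print the tables found in metadata.db and vector.db, if present."""
    for filename in ("metadata.db", "vector.db"):
        path = Path(db_dir) / filename
        if path.exists():
            print(filename, list_tables(path))
        else:
            print(filename, "not found")
```

Calling `inspect_database_dir()` after an import shows which tables each file holds, without assuming any table or column names.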

Training the index

This example assumes that about 2M images will eventually be added and trains the index for that size.

index train -n 2000000 --gpu

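After training, a BIVF{K}_HNSW32.train file is generated in the database directory. K is the number of buckets used for clustering and is computed from n. For 2M images, the file is BIVF1048576_HNSW32.train.

By default, training uses 50 feature points per bucket, which corresponds to about K/10 images. Too few images degrade training quality, while too many lengthen training time. You can pass -x 100 to train with more images.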
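The training-set arithmetic can be worked through for the 2M example. The sketch rests on two assumptions that the text implies but does not state: that each image contributes about 500 feature points (inferred from "50 points per bucket" equalling "K/10 images"), and that the value passed to `-x` is the number of feature points per bucket. `training_set_size` is a hypothetical helper.

```python
# Assumption: about 500 feature points per image, inferred from
# 50 points per bucket * K buckets = K / 10 images.
POINTS_PER_IMAGE = 500


def training_set_size(k, points_per_bucket=50):
    """Return (feature points, images) needed to train K buckets."""
    points = k * points_per_bucket
    # Ceiling division: a partial image still has to be a whole image.
    images = -(-points // POINTS_PER_IMAGE)
    return points, images


for x in (50, 100):
    points, images = training_set_size(1048576, x)
    print(f"-x {x}: {points} feature points, about {images} images")
```

Under these assumptions, the default setting needs roughly 105,000 imported images for K = 1048576, and `-x 100` doubles that.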

Building the index

Use BIVF1048576_HNSW32 as the template to build an index named image. When the build finishes, an index file named BIVF1048576_HNSW32.index.image is generated.

index build -d BIVF1048576_HNSW32 -n image
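The file names that training and building produce follow a fixed pattern, so a script can check for them before moving on to the next step. This is a sketch of that naming only. It assumes that the index file is written to the same database directory as the .train file, which is not stated here; `train_file` and `index_file` are hypothetical helpers.

```python
from pathlib import Path


def train_file(db_dir, k):
    """Path of the trained template for K buckets."""
    return Path(db_dir) / f"BIVF{k}_HNSW32.train"


def index_file(db_dir, k, name):
    """Path of a built index (assumed to live in the database directory)."""
    return Path(db_dir) / f"BIVF{k}_HNSW32.index.{name}"


template = train_file("index.db", 1048576)
built = index_file("index.db", 1048576, "image")
print(template, "exists" if template.exists() else "missing")
print(built, "exists" if built.exists() else "missing")
```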

Search

Search for a local image directly

index search -n image test.jpg

Serve an HTTP API

# Use mmap to reduce memory usage
index server -n image --mmap

You can view the API documentation under the /docs path.
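A quick way to confirm that the server is up is to request the /docs page. The sketch below uses only the standard library. The base URL `http://127.0.0.1:8000` is an assumption, since the listening address and port are not given here; substitute whatever the server reports on startup. `docs_url` and `docs_status` are hypothetical helpers.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: the server's address and port are not documented here.
ASSUMED_BASE_URL = "http://127.0.0.1:8000"


def docs_url(base_url=ASSUMED_BASE_URL):
    """Build the URL of the API documentation page."""
    return urljoin(base_url, "/docs")


def docs_status(base_url=ASSUMED_BASE_URL, timeout=5.0):
    """Return the HTTP status code of the /docs page (200 when reachable)."""
    with urlopen(docs_url(base_url), timeout=timeout) as response:
        return response.status
```

With the server running, `docs_status()` returning 200 indicates the documentation page is being served; the search endpoints themselves are described on that page.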
