TeamDoubleFiltering

This repository contains the code and Docker environment for participating in the SISAP 2025 Indexing Challenge (Task 1).

Overview

We propose a double filtering method for efficient nearest neighbor search on large-scale vector datasets.
Our method uses two stages of quantized projections (short and long) to reduce the number of candidate vectors while maintaining high accuracy.

Repository Structure

.
├── Dockerfile
├── autoexec_all.sh               # prepare system and run search
├── autoexec_convert_dataset.sh   # convert vectors of pubmed23 dataset to char vactors
├── autoexec_convert_otest.sh     # convert query vectors to char vectors
├── autoexec_eval.sh              # run search for evaluation using 16 hyper-parameters
├── autoexec_eval_4g.sh           # run search for RAM = 4G environment
├── autoexec_gt_otest_answer.sh   # convert ground truth of otest to answer format (csv)
├── autoexec_make_bucket.sh       # make bucket file for sketch
├── autoexec_prepare.sh           # prepare system (vector conversion, bucket construction, ... )
├── search.sh                     # search using hyper parameters
├── search_by_double_filtering.sh # search using double filtering
├── batch/                        # hyperparameters (nc1 and nc2) for search
├── pivot/                        # pivot files (.csv) for sketch and QSMAP
├── src/                          # Source code for indexing and searching

How to Build the Docker Image

Run the following command in the root directory of the repository:

docker build -t doublefiltering .

How to Run the Search

Run the container with the dataset and output directory mounted as volumes:

docker run --rm --memory=16g --memory-swap=16g -v <path to benchmark-dev-pubmed23.h5>:/app/data/benchmark-dev-pubmed23.h5:ro   -v <path to directory for result files>:/app/result doublefiltering /bin/bash /app/autoexec_all.sh

Replace the paths with your actual dataset path and result output directory.

How to Run the Search with RAM = 4GB

Run the container in iterative mode with the dataset and output directory mounted as volumes:

docker run --rm --memory=4g --memory-swap=4g -it -v <path to benchmark-dev-pubmed23.h5>:/app/data/benchmark-dev-pubmed23.h5:ro   -v <path to directory for result files>:/app/result doublefiltering /bin/bash

After entering container, first run autoexec_prepare.sh and then run autoexec_eval_4g.sh. The results will be stored in directory "temp".

Note: The script autoexec_eval_4g.sh is designed for experiments under strict memory constraints (e.g., 4GB RAM), as discussed in the accompanying short paper.
Be sure to run it inside a Docker container launched with the --memory=4g and --memory-swap=4g options, as shown above, to ensure the memory limit is properly enforced.
This evaluation illustrates the effectiveness of our method even under low-resource environments.

Output Files

The search results will be saved in the /app/result/ directory, including:

result.csv, result1.csv, ..., each line formatted as:
```
query-ID, distNN, answer-ID[1], dist[1], answer-ID[2], dist[2], ...
```
where query-ID and answer-ID are 0-based indices.

summary_all.csv: a summary of the overall search run, with rows like:

trial, file, width, q_bit, ftr_on, nc1, nc2, recall@1, recall@30,
filtering, 1st, 2nd, k_NN, ave(ms/q), stdev(ms/q), min(ms/q), max(ms/q)

Explanation of Metrics

nc1, nc2: number of candidates obtained by the 1st and 2nd filtering stages
filtering, 1st, 2nd: filtering costs in seconds
k_NN: cost for final reranking among filtered candidates
ave, stdev, min, max: per-query search time statistics (milliseconds)

Requirements

Docker (tested with Docker 24.0+)
Host machine with at least 16GB RAM

License

This repository is licensed under the MIT License.
See LICENSE for details.

Contact

If you have any questions, please contact us at:

GitHub: shino-kyutech
Email: [shino.kyutech@gmail.com]

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TeamDoubleFiltering

Overview

Repository Structure

How to Build the Docker Image

How to Run the Search

How to Run the Search with RAM = 4GB

Output Files

Explanation of Metrics

Requirements

License

Contact

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 70 Commits
.github/workflows		.github/workflows
.vscode		.vscode
batch		batch
pivot		pivot
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
autoexec_all.sh		autoexec_all.sh
autoexec_convert_dataset.sh		autoexec_convert_dataset.sh
autoexec_convert_otest.sh		autoexec_convert_otest.sh
autoexec_eval.sh		autoexec_eval.sh
autoexec_eval_4g.sh		autoexec_eval_4g.sh
autoexec_gt_otest_to_answer.sh		autoexec_gt_otest_to_answer.sh
autoexec_make_bucket.sh		autoexec_make_bucket.sh
autoexec_prepare.sh		autoexec_prepare.sh
search.sh		search.sh
search_by_double_filtering.sh		search_by_double_filtering.sh
sisap25-performance.jl		sisap25-performance.jl
sisap25-task1.sh		sisap25-task1.sh

License

sisap-challenges/sisap25-TeamDoubleFiltering

Folders and files

Latest commit

History

Repository files navigation

TeamDoubleFiltering

Overview

Repository Structure

How to Build the Docker Image

How to Run the Search

How to Run the Search with RAM = 4GB

Output Files

Explanation of Metrics

Requirements

License

Contact

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages