DaiSy is a library for scalable data series exact similarity search.

MChatzakis/DaiSy

DaiSy

A Library for Scalable Data Series Similarity Search

Francesca Del Gaudio, Manos Chatzakis, Gayathiri Ravendirane, Botao Peng, Themis Palpanas

Exact similarity search over large collections of data series is a fundamental operation in modern applications, yet existing solutions are often fragmented, specialized, or tailored to specific execution environments. We present DaiSy (Data series similarity Search library), a unified library for exact data series similarity search that integrates multiple state-of-the-art algorithms within a single, coherent framework. DaiSy is the first library to support exact similarity search across diverse execution environments, including implementations for disk-based, in-memory, GPU-accelerated, and distributed scalable similarity search. The library provides interfaces in both C++ and Python, enabling researchers and practitioners to easily integrate its functionality into a variety of tasks.

ALPHA VERSION: Currently, DaiSy is experimental. The library is still under active development. We welcome suggestions and bug reports.

Supported State-of-the-Art algorithms

We currently support several algorithms for exact similarity search, each optimized for specific use cases and environments. The following table summarizes the key features of each algorithm:

| Algorithm | Description |
|---|---|
| Bruteforce | Naive parallel similarity search implementation |
| Lower Bound Bruteforce | Optimized bruteforce with lower bounding for the distance calculations |
| MESSI | In-memory parallel similarity search |
| PARIS | Disk-based parallel similarity search |
| SING | GPU-accelerated in-memory parallel similarity search |
| Odyssey | Distributed and parallel in-memory similarity search |
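
To illustrate the difference between the first two entries, here is a minimal, single-threaded sketch of exact 1-nearest-neighbor search under Euclidean distance, once as a plain scan and once with lower-bound pruning. It is not DaiSy's implementation or API: the function names are hypothetical, and the choice of a PAA (Piecewise Aggregate Approximation) summary as the lower bound is an assumption made for illustration. The sketch also assumes that the series length is divisible by the number of segments.

```python
import math
import random


def euclidean(a, b):
    """Exact Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa(series, segments):
    """PAA summary: the mean of each equal-width segment (hypothetical helper)."""
    width = len(series) // segments
    return [sum(series[i * width:(i + 1) * width]) / width for i in range(segments)]


def paa_lower_bound(paa_a, paa_b, width):
    """Distance between PAA summaries; never exceeds the true Euclidean distance."""
    return math.sqrt(width * sum((x - y) ** 2 for x, y in zip(paa_a, paa_b)))


def brute_force_1nn(query, collection):
    """Scan every series and return (index, distance) of the nearest one."""
    best_idx, best_dist = -1, math.inf
    for idx, series in enumerate(collection):
        dist = euclidean(query, series)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist


def lower_bound_1nn(query, collection, segments=8):
    """Same answer as the plain scan, but skips series whose lower bound
    already reaches the best distance found so far."""
    width = len(query) // segments
    query_paa = paa(query, segments)
    # Summaries would normally be computed once, ahead of query time.
    summaries = [paa(series, segments) for series in collection]
    best_idx, best_dist, pruned = -1, math.inf, 0
    for idx, series in enumerate(collection):
        if paa_lower_bound(query_paa, summaries[idx], width) >= best_dist:
            pruned += 1  # cannot beat the current best, so skip the full distance
            continue
        dist = euclidean(query, series)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist, pruned


if __name__ == "__main__":
    random.seed(7)
    data = [[random.gauss(0, 1) for _ in range(64)] for _ in range(1000)]
    query = [random.gauss(0, 1) for _ in range(64)]
    print(brute_force_1nn(query, data))
    print(lower_bound_1nn(query, data))
```

Because the lower bound never overestimates the true distance, pruning cannot discard the true nearest neighbor, so both functions return the same result. How many series are pruned depends on how tight the summary is for the data at hand.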

Quickstart

Dependencies

  • Operating System: Linux, macOS, or Windows
  • C++ Compiler: C++14 or higher (GCC 6+, Clang 3.4+, MSVC 2015+)
  • CMake: Version 3.15 or higher

Optionally,

  • Python: 3.10-3.12
  • MPI: Required for Odyssey distributed computing algorithm
  • CUDA: Required for SING GPU acceleration algorithm

Installation

To download DaiSy, use:

git clone https://github.com/MChatzakis/daisy.git

cd daisy
git submodule update --init --recursive

Depending on the available hardware, you can set the following build flags at configuration time to enable or disable features.

| Flag | Description | Default | Dependencies |
|---|---|---|---|
| BUILD_PYTHON | Enable Python bindings | OFF | Python 3.10+ |
| BUILD_BENCHMARK | Build benchmarking tools | OFF | GoogleBenchmark |
| BUILD_TESTS | Build test suite | OFF | GoogleTest |
| BUILD_DEMO | Build demonstration applications | ON | Core library |
| BUILD_ODYSSEY | Enable MPI for distributed computing | OFF | OpenMPI/MPICH |
| BUILD_SING | Enable CUDA for GPU acceleration | OFF | CUDA Toolkit |
| DEBUG_MSG | Enable debug output | OFF | None |

To compile:

mkdir build && cd build

cmake ..
make

DaiSy with Python

If you intend to use only the Python interface, you can install the library directly from PyPI using pip:

pip install daisy-exact-search

If you want to use Odyssey, install the package with its MPI extra:

pip install daisy-exact-search[mpi]

Compatibility issues

Please note that we are aware of compatibility issues on ARM processors (e.g., Apple M-series processors). Because pthread barriers and the SIMD instructions we rely on are unavailable on ARM, compilation currently fails on ARM machines. We are working on possible solutions; for the time being, we recommend using DaiSy on non-ARM machines.
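
Before installing, it can help to check whether the current machine is affected. The following is a small sketch using only the Python standard library; it is not part of DaiSy, the function names are hypothetical, and the set of ARM identifiers is an assumption based on the values `platform.machine()` commonly reports. It also checks the Python version against the 3.10-3.12 range listed under Dependencies.

```python
import platform
import sys

MIN_PYTHON = (3, 10)  # lower bound from the Dependencies section
MAX_PYTHON = (3, 12)  # upper bound from the Dependencies section (inclusive)
ARM_NAMES = {"arm64", "aarch64"}  # assumption: common ARM identifiers


def is_arm(machine):
    """Return True if the machine string looks like an ARM architecture."""
    name = machine.lower()
    return name in ARM_NAMES or name.startswith("arm")


def environment_warnings(machine=None, version=None):
    """Collect human-readable warnings for the given (or current) environment."""
    machine = platform.machine() if machine is None else machine
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    warnings = []
    if is_arm(machine):
        warnings.append(f"ARM architecture detected ({machine}): compilation may fail.")
    if not (MIN_PYTHON <= version <= MAX_PYTHON):
        warnings.append(
            f"Python {version[0]}.{version[1]} is outside the 3.10-3.12 range."
        )
    return warnings


for warning in environment_warnings():
    print(warning)
```

Called without arguments, `environment_warnings()` inspects the running interpreter; passing explicit values makes it easy to check a target machine from elsewhere.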

Others

We provide several usage examples in both C++ and Python under demos/, demonstrating how to use the library for various similarity search tasks. Troubleshooting guides and extra resources are available in the docs/ directory, which also explains how to contribute to the project and how to implement new algorithms.

About

Work supported by ΥΠΑΙΘΑ & NextGenerationEU project HARSH (ΥΠ3ΤΑ-0560901).

The logo of DaiSy was designed by Eva Chamilaki.