iscc-search

Warning

This project is in early development and not ready for production use.

The API and features are subject to significant changes. Use at your own risk.

High-performance similarity search engine for variable-length binary ISCC codes, built on fast approximate nearest neighbor search.

GitHub repository: https://github.com/iscc/iscc-search/
Documentation: https://search.iscc.codes/

Features

Fast approximate nearest neighbor search (ANNS) for variable-length binary vectors
Custom NPHD (Normalized Prefix Hamming Distance) metric optimized for ISCC codes
Support for 64-256 bit vectors (8-32 bytes)
Built on usearch with JIT-compiled Numba metrics
Cross-platform support (Linux, macOS, Windows)
Python 3.10-3.13 support

What is ISCC?

The International Standard Content Code (ISCC) is a similarity-preserving content identifier for digital media. ISCC codes are variable-length binary vectors that enable efficient similarity search across different media types. This library provides a specialized vector database for storing and querying ISCC codes at scale.

Installation

pip install iscc-search

For development installation:

git clone https://github.com/iscc/iscc-search.git
cd iscc-search
uv sync

Quick Start

from iscc_search import NphdIndex
import numpy as np

# Create index for up to 256-bit vectors
index = NphdIndex(max_dim=256)

# Add some binary vectors with integer keys
vectors = [
    np.array([18, 52, 86, 120], dtype=np.uint8),  # 32-bit vector
    np.array([171, 205, 239], dtype=np.uint8),  # 24-bit vector
    np.array([17, 34, 51, 68, 85], dtype=np.uint8),  # 40-bit vector
]
keys = [1, 2, 3]
index.add(keys, vectors)

# Search for similar vectors
query = np.array([18, 52, 86, 121], dtype=np.uint8)
matches = index.search(query, k=2)

print(f"Found {len(matches.keys)} matches")
print(f"Keys: {matches.keys}")
print(f"Distances: {matches.distances}")

API Overview

NphdIndex

The main index class for ANNS with variable-length binary vectors.

NphdIndex(max_dim=256, **kwargs)

max_dim: Maximum vector dimension in bits (default: 256)
**kwargs: Additional arguments passed to usearch Index

Methods

add(keys, vectors): Add vectors with integer keys
search(query, k): Search for k nearest neighbors
get(keys): Retrieve vectors by keys
remove(keys): Remove vectors by keys

Development

This project uses uv for package management and poethepoet for task automation.

Prerequisites

Python 3.10 or higher
uv package manager

Available Commands

uv run poe format-code      # Format Python code with ruff
uv run poe format-markdown  # Format markdown files
uv run poe format           # Format all files
uv run poe test             # Run tests with coverage (requires 100%)
uv run poe precommit        # Run pre-commit hooks
uv run poe all              # Format and test

Running Tests

# Run all tests with coverage
uv run poe test

# Run specific test
uv run pytest tests/test_nphd.py::test_pad_vectors

# Run tests in watch mode
uv run pytest --watch

Technical Details

NPHD Metric

The Normalized Prefix Hamming Distance (NPHD) is a valid metric specifically designed for variable-length prefix-compatible codes like ISCC. It normalizes the Hamming distance by the length of the common prefix, enabling meaningful similarity comparisons between vectors of different lengths.

Unlike standard Hamming distance, NPHD:

Correctly handles variable-length comparisons
Normalizes over common prefix length
Satisfies all metric axioms (non-negativity, identity, symmetry, triangle inequality)
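
The description above can be illustrated with a minimal pure-NumPy sketch. It assumes that the common prefix is the byte length of the shorter vector and that the distance is the number of differing bits in that prefix divided by the prefix length in bits. The name nphd_sketch is hypothetical and is not part of the iscc_search API; the library's own JIT-compiled metric may differ in detail.

```python
import numpy as np


def nphd_sketch(a: np.ndarray, b: np.ndarray) -> float:
    """Hamming distance over the common prefix, divided by its bit length."""
    n = min(len(a), len(b))  # common prefix length in bytes (assumption)
    if n == 0:
        return 0.0  # assumption: an empty prefix counts as distance 0
    # XOR marks differing bits; unpackbits expands each byte into 8 bits.
    differing_bits = int(np.unpackbits(np.bitwise_xor(a[:n], b[:n])).sum())
    return differing_bits / (n * 8)


# Same vectors and query as in the Quick Start section.
vectors = {
    1: np.array([18, 52, 86, 120], dtype=np.uint8),
    2: np.array([171, 205, 239], dtype=np.uint8),
    3: np.array([17, 34, 51, 68, 85], dtype=np.uint8),
}
query = np.array([18, 52, 86, 121], dtype=np.uint8)

# Exact (brute-force) ranking, useful as a reference for approximate results.
distances = {key: nphd_sketch(query, vec) for key, vec in vectors.items()}
ranked = sorted(distances, key=distances.get)
print(ranked)        # [1, 3, 2]
print(distances[1])  # 0.03125 (1 differing bit out of 32)
```

A brute-force ranking like this only scales to small collections, but it gives a ground truth against which approximate search results can be compared.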

Binary Vector Format

Vectors are stored as packed binary arrays (np.uint8) with an internal length prefix:

Each vector is prefixed with a length byte
Vectors are padded to uniform size for efficient indexing
pad_vectors() and unpad_vectors() handle conversions automatically
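
The layout described above might look roughly like the following sketch. It assumes that the length byte stores the vector length in bytes, that padding uses zero bytes, and that the padded size is max_dim / 8 + 1 bytes. The helpers pad_sketch and unpad_sketch are hypothetical stand-ins for illustration; the actual signatures and behavior of pad_vectors() and unpad_vectors() are not shown here and may differ.

```python
import numpy as np


def pad_sketch(vec: np.ndarray, max_bytes: int = 32) -> np.ndarray:
    """Prefix vec with its byte length and zero-pad to max_bytes + 1 bytes."""
    if len(vec) > max_bytes:
        raise ValueError(f"vector has {len(vec)} bytes, maximum is {max_bytes}")
    padded = np.zeros(max_bytes + 1, dtype=np.uint8)
    padded[0] = len(vec)             # length byte (assumed to count bytes)
    padded[1 : 1 + len(vec)] = vec   # payload, followed by zero padding
    return padded


def unpad_sketch(padded: np.ndarray) -> np.ndarray:
    """Recover the original vector using the length byte."""
    n = int(padded[0])
    return padded[1 : 1 + n].copy()


original = np.array([171, 205, 239], dtype=np.uint8)  # 24-bit vector
padded = pad_sketch(original)
restored = unpad_sketch(padded)
```

Storing the length inside the vector lets a fixed-size index hold codes of different lengths while still allowing a metric to find the common prefix at query time.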

Custom usearch Build

This project uses custom usearch 2.21.0 wheels with platform-specific builds hosted at iscc.github.io to ensure consistent behavior across platforms.

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please ensure:

All tests pass (uv run poe test)
Code is formatted (uv run poe format)
Coverage remains at 100%
Changes are documented

See CONTRIBUTING.md for details.

Repository initiated with fpgmaas/cookiecutter-uv.

