Warning
This project is in early development and not ready for production use.
The API and features are subject to significant changes. Use at your own risk.
High-performance ISCC similarity search engine for variable-length binary ISCC codes with fast approximate nearest neighbor search.
- GitHub repository: https://github.com/iscc/iscc-search/
- Documentation: https://search.iscc.codes/
- Fast approximate nearest neighbor search (ANNS) for variable-length binary vectors
- Custom NPHD (Normalized Prefix Hamming Distance) metric optimized for ISCC codes
- Support for 64-256 bit vectors (8-32 bytes)
- Built on usearch with JIT-compiled Numba metrics
- Cross-platform support (Linux, macOS, Windows)
- Python 3.10-3.13 support
The International Standard Content Code (ISCC) is a similarity-preserving content identifier for digital media. ISCC codes are variable-length binary vectors that enable efficient similarity search across different media types. This library provides a specialized vector database for storing and querying ISCC codes at scale.
pip install iscc-search
For development installation:
git clone https://github.com/iscc/iscc-search.git
cd iscc-search
uv sync
from iscc_search import NphdIndex
import numpy as np
# Create index for up to 256-bit vectors
index = NphdIndex(max_dim=256)
# Add some binary vectors with integer keys
vectors = [
    np.array([18, 52, 86, 120], dtype=np.uint8),  # 32-bit vector
    np.array([171, 205, 239], dtype=np.uint8),  # 24-bit vector
    np.array([17, 34, 51, 68, 85], dtype=np.uint8),  # 40-bit vector
]
keys = [1, 2, 3]
index.add(keys, vectors)
# Search for similar vectors
query = np.array([18, 52, 86, 121], dtype=np.uint8)
matches = index.search(query, k=2)
print(f"Found {len(matches.keys)} matches")
print(f"Keys: {matches.keys}")
print(f"Distances: {matches.distances}")
The main index class for ANNS with variable-length binary vectors.
NphdIndex(max_dim=256, **kwargs)
- max_dim: Maximum vector dimension in bits (default: 256)
- **kwargs: Additional arguments passed to usearch Index
- add(keys, vectors): Add vectors with integer keys
- search(query, k): Search for k nearest neighbors
- get(keys): Retrieve vectors by keys
- remove(keys): Remove vectors by keys
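To make the idea of searching for the k nearest neighbors concrete, here is a minimal sketch of an exact linear scan over the vectors from the example above. It is not how NphdIndex works internally (the index uses approximate search via usearch); the names `nphd_sketch` and `brute_force_search` are hypothetical, and the distance assumes byte-aligned vectors compared over their common prefix.

```python
def nphd_sketch(a: bytes, b: bytes) -> float:
    """Hamming distance over the common byte prefix, divided by its length in bits."""
    prefix_len = min(len(a), len(b))
    if prefix_len == 0:
        raise ValueError("vectors must not be empty")
    # zip() stops at the shorter input, so only the common prefix is compared
    differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing_bits / (prefix_len * 8)


def brute_force_search(stored: dict, query: bytes, k: int) -> list:
    """Return the k (key, distance) pairs closest to the query, nearest first."""
    scored = [(key, nphd_sketch(vec, query)) for key, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Same keys and vectors as in the usage example
stored = {
    1: bytes([18, 52, 86, 120]),
    2: bytes([171, 205, 239]),
    3: bytes([17, 34, 51, 68, 85]),
}
query = bytes([18, 52, 86, 121])
results = brute_force_search(stored, query, k=2)
for key, distance in results:
    print(key, distance)
```

A linear scan like this costs one distance computation per stored vector, which is the cost an approximate index is designed to avoid on large collections.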
This project uses uv for package management and poethepoet for task automation.
- Python 3.10 or higher
- uv package manager
uv run poe format-code # Format Python code with ruff
uv run poe format-markdown # Format markdown files
uv run poe format # Format all files
uv run poe test # Run tests with coverage (requires 100%)
uv run poe precommit # Run pre-commit hooks
uv run poe all # Format and test
# Run all tests with coverage
uv run poe test
# Run specific test
uv run pytest tests/test_nphd.py::test_pad_vectors
# Run tests in watch mode
uv run pytest --watch
The Normalized Prefix Hamming Distance (NPHD) is a valid metric specifically designed for variable-length prefix-compatible codes like ISCC. It normalizes the Hamming distance by the length of the common prefix, enabling meaningful similarity comparisons between vectors of different lengths.
Unlike standard Hamming distance, NPHD:
- Correctly handles variable-length comparisons
- Normalizes over common prefix length
- Satisfies all metric axioms (non-negativity, identity, symmetry, triangle inequality)
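The definition can be illustrated with a short pure-Python sketch. This is an assumption-laden illustration, not the library's JIT-compiled Numba metric: the function name `nphd_sketch` is hypothetical, vectors are assumed to be byte-aligned, and the common prefix is taken to be the length of the shorter vector.

```python
def nphd_sketch(a: bytes, b: bytes) -> float:
    """Hamming distance over the common byte prefix, divided by its length in bits."""
    prefix_len = min(len(a), len(b))
    if prefix_len == 0:
        raise ValueError("vectors must not be empty")
    # XOR marks differing bits; zip() stops at the shorter input,
    # so bytes beyond the common prefix are ignored
    differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing_bits / (prefix_len * 8)


a = bytes([18, 52, 86, 120])  # 32-bit vector
b = bytes([18, 52, 86, 121])  # 32-bit vector, differs from a in one bit
c = bytes([171, 205, 239])    # 24-bit vector

print(nphd_sketch(a, b))  # 1 differing bit over a 32-bit prefix
print(nphd_sketch(a, c))  # compared over the 24-bit common prefix only
```

Dividing by the prefix length keeps the result in the range 0 to 1 regardless of how long the compared vectors are, which is what makes distances between different length pairs comparable.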
Vectors are stored as packed binary arrays (np.uint8) with an internal length prefix:
- Each vector is prefixed with a length byte
- Vectors are padded to uniform size for efficient indexing
- pad_vectors() and unpad_vectors() handle conversions automatically
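As a rough illustration of this layout, the following sketch pads a single vector and recovers it again. It does not reproduce the library's pad_vectors() or unpad_vectors(); the helpers `pad_one` and `unpad_one` are hypothetical, and two details are assumptions: that the length byte stores the payload size in bytes, and that padding uses zero bytes.

```python
MAX_BYTES = 32  # 256 bits, the largest supported vector size


def pad_one(vec: bytes, max_bytes: int = MAX_BYTES) -> bytes:
    """Prefix the vector with its byte length and zero-pad to a fixed size."""
    if not 1 <= len(vec) <= max_bytes:
        raise ValueError(f"vector must be 1 to {max_bytes} bytes long")
    padding = bytes(max_bytes - len(vec))  # zero bytes (assumed fill value)
    return bytes([len(vec)]) + vec + padding


def unpad_one(padded: bytes) -> bytes:
    """Read the length byte and return only the original payload."""
    length = padded[0]
    return padded[1 : 1 + length]


original = bytes([171, 205, 239])  # 24-bit vector
padded = pad_one(original)
restored = unpad_one(padded)
print(len(padded), restored == original)
```

With a fixed width of one length byte plus 32 payload bytes, every stored row has the same size, which is what lets a fixed-dimension index hold vectors of different lengths.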
This project uses custom usearch 2.21.0 wheels with platform-specific builds hosted at iscc.github.io to ensure consistent behavior across platforms.
MIT License - see LICENSE file for details.
Contributions are welcome! Please ensure:
- All tests pass (uv run poe test)
- Code is formatted (uv run poe format)
- Coverage remains at 100%
- Changes are documented
See CONTRIBUTING.md for details.
Repository initiated with fpgmaas/cookiecutter-uv.