A rich research framework for hyperdimensional computing on large boolean vectors, supporting program transformation and multiple computation backends (plain Python, C++, NumPy, PyTorch). Many metrics and utility functions aim to aid the intuitive understanding of this new paradigm, and multiple levels of functionality are available, from data marshalling and the basic (XOR, MAJ, PERMUTE)-algebra to cryptography support. All vector operations are implemented in C(++) and make use of bit-packing and SIMD. Subprograms can be optimized and compiled to these operations in Python or C; parallelization and pipelining are planned.
If your application is a relatively direct pipeline, take a look at HDCC. If you want a more stable library, or want to work with a base field other than the booleans, use torch-hd.
The fundamental research includes finding algebras with interesting properties on top of large boolean vectors. To this end, the library has laws used for testing and an expansive set of operators, including:
- Multiple types of fast random vector generation
- Random and indexed select between vectors
- Ability to slightly modify a vector, for example by flipping a fraction of its bits
- Permutation, roll, and swapping with multiple interfaces
- Hashing and encoding
- Majority with multiple implementations
- Sample, a cheap alternative to Majority
- AND, OR, XOR, and NOT operators
- Composite operations like SELECT (or MUX) and FLIP-FRAC (flipping a fraction of the bits)
- Hamming, Jaccard, cosine, bit-error-rate, Tversky, and mutual-information metrics
- A system for relatedness, unrelatedness, and standard deviations apart
- zscore and pvalue
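The core of the operator list above can be sketched in a few lines of plain Python. This is an illustration of the ideas only, not the library's API: the function names (`rand`, `majority`, `permute`, `hamming`, `std_apart`), the choice of 8192 bits, the use of Python integers as bit vectors, and the sign convention of `std_apart` are all assumptions made for this sketch.

```python
import random

N = 8192                 # bits per hypervector (illustrative choice)
MASK = (1 << N) - 1

def rand(rng):
    """Uniformly random N-bit vector, stored as a Python int."""
    return rng.getrandbits(N)

def majority(vs):
    """Bitwise majority vote over an odd number of vectors."""
    assert len(vs) % 2 == 1
    out = 0
    for i in range(N):
        if 2 * sum((v >> i) & 1 for v in vs) > len(vs):
            out |= 1 << i
    return out

def permute(x, k=1):
    """Cyclic roll by k positions, one simple family of permutations."""
    k %= N
    return ((x << k) | (x >> (N - k))) & MASK

def hamming(x, y):
    """Number of positions in which x and y differ."""
    return bin(x ^ y).count("1")

def std_apart(x, y):
    """Hamming distance expressed in standard deviations away from N/2,
    the expected distance between two unrelated random vectors."""
    return (hamming(x, y) - N / 2) / (N ** 0.5 / 2)

rng = random.Random(0)
a, b, c = rand(rng), rand(rng), rand(rng)
m = majority([a, b, c])

xor_is_self_inverse = ((a ^ b) ^ b) == a
roll_is_invertible = permute(permute(a, 3), -3) == a
unrelated = std_apart(a, b)   # close to 0 for independent vectors
related = std_apart(m, a)     # strongly negative: m is much closer to a than chance
```

The majority of three random vectors agrees with each input on about three quarters of the bits, which is why `related` lands tens of standard deviations below zero while `unrelated` stays within a few.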
Additionally, the library provides:
- A symbolic implementation with simplification, analysis, plotting and pretty printing
- A native C++ implementation
- Law and unification backed expression simplification
- Compilation to operation sequences (circuits)
- Efficient bit-packed representation (saves 8x memory compared to the traditional NumPy and PyTorch bool!)
- Three redundant implementations on NumPy for performance and correctness
- A (performant) plain Python implementation
- A minimal abstraction for permutations with caching and composition
- Very basic embeddings for other datatypes (more to come)
- Graph visualization of distances in hyperdimensional space (see example)
- Boolean expression and network synthesis (e.g. Cellular Automata and perfectly random functions)
- Visualization and storage via pbm (e.g. Cellular Automata)
- A normal form and conversions between different implementations and storage methods
- Linear and adiabatic variants and example
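The 8x memory saving of a bit-packed representation mentioned above follows from storing eight bits per byte instead of one boolean per byte. A minimal sketch using only the standard library (the layout with the least significant bit first and the helper name `unpack` are assumptions for illustration, not the library's storage format):

```python
import random

N = 8192
rng = random.Random(0)
bits = [rng.getrandbits(1) for _ in range(N)]

# Unpacked: one byte per bit, comparable to a boolean array.
unpacked = bytes(bits)

# Packed: eight bits per byte, least significant bit first.
buffer = bytearray(N // 8)
for i, bit in enumerate(bits):
    if bit:
        buffer[i // 8] |= 1 << (i % 8)
packed = bytes(buffer)

def unpack(data, n):
    """Recover the list of n bits from the packed bytes."""
    return [(data[i // 8] >> (i % 8)) & 1 for i in range(n)]

ratio = len(unpacked) // len(packed)
round_trip_ok = unpack(packed, N) == bits
```

Beyond memory, packing lets bitwise operators process many vector positions per machine instruction, which is where most of the speed of a packed implementation comes from.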
Make sure you have a recent Python version; 3.10 is recommended.
pip install bhv
If you only want to work with plain Python, you're good to go with `from bhv.vanilla import VanillaBHV as BHV`.
For the native option, you need a modern C++ compiler; then use `from bhv.native import NativePackedBHV as BHV`. The setup process should attempt to install this by default.
For interop with (the Python interface of) NumPy and PyTorch, you'll need `pip install numpy` or `pip install torch`, with respectively `from bhv.np import NumPyPacked64BHV as BHV` or `from bhv.np import TorchBoolBHV as BHV`.
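If you want a script to run regardless of which backends are installed, one option is to try the imports in order and fall back to the next one. The sketch below uses only the module and class names mentioned above; the helper `load_backend` and the preference order are hypothetical and not part of the library.

```python
from importlib import import_module

# (module, class) pairs from the installation notes; the order is an assumption.
CANDIDATES = [
    ("bhv.native", "NativePackedBHV"),
    ("bhv.np", "NumPyPacked64BHV"),
    ("bhv.vanilla", "VanillaBHV"),
]

def load_backend(candidates=CANDIDATES):
    """Return (class_name, class) for the first backend that imports,
    or (None, None) if none of them is available."""
    for module_name, class_name in candidates:
        try:
            module = import_module(module_name)
        except ImportError:
            continue  # backend or its dependency is not installed
        cls = getattr(module, class_name, None)
        if cls is not None:
            return class_name, cls
    return None, None

backend_name, BHV = load_backend()
```

After this, `backend_name` tells you which implementation was picked, and `BHV` is `None` when the package is not installed at all.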
Below are some resources to get started with the library. If you're looking for a broader intro, please take a look at hd-computing.com.
Basic uses (in the context of neo-GOFAI) are given in my presentation with an installation-free notebook.
The fundamental angle is to start with Kanerva's initial paper together with the library. For that, multiple resources are provided:
- A notebook going over the very basics
- The grandmother example
- A Google Colab "reasoning by analogy" hosted notebook
- A guide to picking metrics
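To give a flavour of reasoning by analogy with hypervectors, here is a small standalone sketch of a record-based analogy, often phrased as "what is the dollar of Mexico?". It is not taken from the linked notebooks and does not use the library: the helper names, the 8192-bit dimension, and the integer representation are assumptions for illustration.

```python
import random

N = 8192
rng = random.Random(42)

def rand():
    return rng.getrandbits(N)

def majority3(a, b, c):
    """Bitwise majority of exactly three vectors."""
    return (a & b) | (a & c) | (b & c)

def hamming(x, y):
    return bin(x ^ y).count("1")

# Role vectors and filler vectors are all independent random vectors.
NAME, CAPITAL, CURRENCY = rand(), rand(), rand()
items = {name: rand() for name in
         ["USA", "Washington", "dollar", "Mexico", "MexicoCity", "peso"]}

# A record binds each role to its filler with XOR and bundles with majority.
usa = majority3(NAME ^ items["USA"],
                CAPITAL ^ items["Washington"],
                CURRENCY ^ items["dollar"])
mexico = majority3(NAME ^ items["Mexico"],
                   CAPITAL ^ items["MexicoCity"],
                   CURRENCY ^ items["peso"])

# XOR-ing the two records gives a noisy mapping between their fillers.
mapping = usa ^ mexico
guess = items["dollar"] ^ mapping

# Clean up the noisy guess by nearest-neighbour search over known items.
answer = min(items, key=lambda name: hamming(guess, items[name]))
```

The guess differs from the `peso` vector on roughly three eighths of the bits, while every other item sits near one half, so the nearest-neighbour lookup separates them by many standard deviations.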
As for a Machine Learning angle, you may enjoy:
- A minimal implementation of the winnow algorithm on a minimal problem
- A minimal implementation of classification based on the majority operator on a minimal problem
- A Google Colab hosted digit classification via plain majority notebook
- Graph classification notebook re-implementing GraphHD
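The idea behind classification with the majority operator can be sketched without the library: bundle the training examples of each class into one prototype and assign a new example to the nearest prototype. Everything below (the names, the 1024-bit dimension, the 30% noise level, the synthetic data) is an assumption made for illustration and is not the code of the linked examples.

```python
import random

N = 1024
rng = random.Random(1)

def rand():
    return rng.getrandbits(N)

def noisy(x, p):
    """Flip each bit of x independently with probability p."""
    flips = sum(1 << i for i in range(N) if rng.random() < p)
    return x ^ flips

def majority(vs):
    """Bitwise majority vote over an odd number of vectors."""
    out = 0
    for i in range(N):
        if 2 * sum((v >> i) & 1 for v in vs) > len(vs):
            out |= 1 << i
    return out

def hamming(x, y):
    return bin(x ^ y).count("1")

# Two hidden class patterns and nine noisy training samples of each.
patterns = {"A": rand(), "B": rand()}
train = {label: [noisy(p, 0.3) for _ in range(9)]
         for label, p in patterns.items()}

# Training: one majority-bundled prototype per class.
prototypes = {label: majority(samples) for label, samples in train.items()}

def classify(x):
    """Label of the prototype closest to x in Hamming distance."""
    return min(prototypes, key=lambda label: hamming(x, prototypes[label]))

prediction_a = classify(noisy(patterns["A"], 0.3))
prediction_b = classify(noisy(patterns["B"], 0.3))
```

Bundling averages out the noise: each prototype is closer to its hidden pattern than any single training sample is, which is what makes the nearest-prototype rule reliable here.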
If you like to dive into the code directly, I suggest the following entrypoints:
- Finite State Machine example
- The base class AbstractBHV
- The most idiomatic implementation NumPyBoolBHV
Example exploratory usages of the library:
- Trying to improve upon linear hypervector search and its implementation
- Exact logic synthesis, the benchmarking code
- Cellular automata, the code (which produces pretty pictures), a talk on them in relation to VSA, and comments on the relation to the library
- Feistel cipher, the tests (for statistical properties), and the implementation
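For readers unfamiliar with the last item, a Feistel network turns any keyed round function into an invertible transformation using only XOR and swapping of halves. The sketch below is a generic illustration on 8192-bit blocks, not the library's implementation: the SHAKE-256 round function, the key format, and the function names are assumptions, and it is not meant as production cryptography.

```python
import hashlib

HALF_BYTES = 512  # each half is 4096 bits, so a block is 8192 bits

def round_fn(half, key):
    """Keyed pseudo-random function, built here from SHAKE-256."""
    return hashlib.shake_256(key + half).digest(HALF_BYTES)

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def feistel_encrypt(block, round_keys):
    left, right = block[:HALF_BYTES], block[HALF_BYTES:]
    for key in round_keys:
        # (L, R) -> (R, L xor F(R, key))
        left, right = right, xor_bytes(left, round_fn(right, key))
    return left + right

def feistel_decrypt(block, round_keys):
    left, right = block[:HALF_BYTES], block[HALF_BYTES:]
    for key in reversed(round_keys):
        # Undo one round: (L', R') -> (R' xor F(L', key), L')
        left, right = xor_bytes(right, round_fn(left, key)), left
    return left + right

keys = [b"k1", b"k2", b"k3", b"k4"]
plaintext = bytes(range(256)) * 4          # 1024 bytes = 8192 bits
ciphertext = feistel_encrypt(plaintext, keys)
recovered = feistel_decrypt(ciphertext, keys)
```

Note that the round function never needs to be inverted; decryption only replays the rounds in reverse order.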
This repository is under (highly) active development and is a work in progress. Do expect changes to the naming, and even features to be swapped for more elegant alternatives.
The codebase also works with PyPy. Use the vanilla Python implementation. The numeric operations are slower than on CPython, but the symbolic ones are way faster.
If you have any feedback, raise an informal issue, or email me at contact@adamv.be.
If the library is not as fast as possible, that's a bug; please report it.