This project is still under heavy development and might not be production-ready
We encourage experimenting with it and reporting any issues you run into via
Github issues.
Halide is a programming language designed to make it easier to write high-performance image and array processing code on modern machines. Rather than being a standalone programming language, Halide is embedded in C++. This means you write C++ code that builds an in-memory representation of a Halide pipeline using Halide's C++ API. You can then compile this representation to an object file, or JIT-compile it and run it in the same process. |
This package provides Haskell bindings that allow to write Halide embedded in Haskell without C++ 😋.
As a simple example, here's how you could implement array addition with halide-haskell:
{-# LANGUAGE AllowAmbiguousTypes, DataKinds, OverloadedStrings, ViewPatterns #-}
import Language.Halide
-- The algorithm
mkArrayPlus = compile $ \(buffer "a" -> a) (buffer "b" -> b) -> do
-- Create an index variable
i <- mkVar "i"
-- Define the resulting function. We call it "out".
-- In pseudocode it's equivalent to the following: out[i] = a[i] + b[i]
out <- define "out" i $ a ! i + b ! i
-- Perform a fancy optimization and use SIMD: we split the loop over i into
-- an inner and an outer loop and then vectorize the inner loop
inner <- mkVar "inner"
split TailAuto i (i, inner) 4 out >>= vectorize inner
-- Example usage of our Halide pipeline
main :: IO ()
main = do
let a, b :: [Float]
a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]
-- Compile the code
arrayPlus <- mkArrayPlus
-- We tell Halide to treat our list as a one-dimensional buffer
withHalideBuffer @1 @Float a $ \a' ->
withHalideBuffer b $ \b' ->
-- allocate a temporary buffer for the output
allocaCpuBuffer [length a] $ \out' -> do
-- execute the kernel -- it is a normal function call!
arrayPlus a' b' out'
-- print the result
print =<< peekToList out'
For more examples, have a look at the tutorials.
The library is avaiable on Hackage, so you can just add halide-haskell
to the
build-depends
field in your Cabal file. However, halide-haskell depends on Halide,
so you need to make sure it's available.
No need to do anything 😃, Nix will make Halide avaiable automatically.
If you just want to hack on the library, after git clone
ing, type
nix build
and to run an example, try
nix run
nix run .#ghc928-intel-ocl.halide-haskell # for Intel OpenCL support
nix run .#ghc928-cuda.halide-haskell # for CUDA support
nix run .#ghc945.halide-haskell # to build with GHC 9.4.5 instead
nix run .#ghc945.halide-tutorial01 # to run the first tutorial
This set up is not tested in CI... but
- Download a pre-built Halide for your system from their releases page.
- Pass
--extra-include-dirs
and--extra-lib-dirs
options to Cabal on the command line or add them to yourcabal.project.local
file.
The availability of Deep Learning frameworks such as PyTorch or JAX has revolutionized array processing, independently of whether one works on Machine Learning tasks or other numerical algorithms. The ecosystem in Haskell has been catching up as well, and there are now multiple good array libraries (hmatrix, massiv, Accelerate, arrayfire-haskell, Hasktorch, are all high-quality libraries). To accommodate multiple domains, such libraries have to support hundreds if not thousands of operations (e.g. there are more than 3.5 thousand of so called “native” functions in PyTorch), and this count does not include specializations for different device and data types).
To overcome this difficulty, we propose to build a common extension mechanism for Haskell array libraries. The mechanism is based on embedding the Halide language into Haskell that allows to just-in-time (JIT) compile computational kernels for various hardware.
One might wonder, why write another package instead of relying on Accelerate for the JIT compilation of the kernels. Accelerate is a Haskell eDSL (embedded Domain Specific Language) for collective operations on dense multi-dimensional arrays. It relies on LLVM to JIT compile the computational kernels for the target architecture including multicore CPUs and GPUs. Users have to rely on Accelerate to generate high-performance kernels and have no way to force some low-level optimizations. For example, Trevor L. McDonell et al. explain that the reason why hand-written CUDA implementation of the N-body problem outperforms Accelerate is the use of on-chip shared memory. Another example would be the matrix-matrix product where achieving maximal performance requires writing no fewer than six nested loops instead of the naive three (ACM Trans. Math. Softw. 34, 3, Article 12 (May 2008)). Accelerate has no way of knowing that such optimizations have to be applied and cannot perform them automatically, and this is precisely the gap that we are trying to fill by embedding Halide into Haskell.
Halide is a C++ eDSL for high-performance image and array processing. Its core idea is to decouple the algorithm (i.e. what is computed) from the schedule (i.e. where and when it is computed). The eDSL allows to quickly prototype and test the algorithm and then move on to the optimization. Optimizations such as fusion, tiling, parallelism and vectorization can be freely explored without the risk of breaking the original algorithm definition. Schedulers can also be generated automatically by advanced optimization algorithms
Halide provides a lower level interface than Accelerate and thus does not aim to replace it. Instead, Halide can be used to extend Accelerate, and later on one might even think about using Halide as a backend for Accelerate.
Currently, the best way to get started is to use Nix:
nix develop
This will drop you into a shell with all the necessary tools to build the code such that you can do
cabal build
and
cabal test