pytblis: Python bindings for TBLIS

Are your einsums too slow?

Need FP64 tensor contractions and can't buy a datacenter GPU because you already maxed out your home equity line of credit?

Set your CPU on fire with TBLIS!

Usage

pytblis.einsum and pytblis.tensordot are drop-in replacements for numpy.einsum and numpy.tensordot.
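As a minimal sketch of the drop-in usage, the snippet below calls `einsum` and `tensordot` with the same arguments NumPy accepts and compares against `numpy.einsum`. The array shapes and the fallback to NumPy when pytblis is not installed are assumptions made for illustration only.

```python
import numpy as np

try:
    import pytblis
    einsum, tensordot = pytblis.einsum, pytblis.tensordot
except ImportError:
    # Fallback (illustration only) so the sketch still runs without pytblis.
    einsum, tensordot = np.einsum, np.tensordot

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5, 6))  # indices i, j, k
B = rng.standard_normal((6, 5, 7))  # indices k, j, l

# Contract over the shared indices j and k.
C_einsum = einsum("ijk,kjl->il", A, B)
C_tdot = tensordot(A, B, axes=([2, 1], [0, 1]))

# NumPy reference for comparison.
C_ref = np.einsum("ijk,kjl->il", A, B)
print(np.allclose(C_einsum, C_ref), np.allclose(C_tdot, C_ref))
```

Because the call signatures match, switching an existing code base over should mostly be a matter of changing the import.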

In addition, low level wrappers are provided for tblis_tensor_add, tblis_tensor_mult, tblis_tensor_reduce, tblis_tensor_shift, and tblis_tensor_dot. These are named pytblis.add, pytblis.mult, et cetera.

Finally, there are mid-level convenience wrappers for tblis_tensor_mult and tblis_tensor_add:

def contract(
    subscripts: str,
    a: ArrayLike,
    b: ArrayLike,
    alpha: scalar = 1.0,
    beta: scalar = 0.0,
    out: Optional[ArrayLike] = None,
    conja: bool = False,
    conjb: bool = False,
) -> ArrayLike

and

def transpose_add(
    subscripts: str,
    a: ArrayLike,
    alpha: scalar = 1.0,
    beta: scalar = 0.0,
    out: Optional[ArrayLike] = None,
    conja: bool = False,
    conjout: bool = False,
) -> ArrayLike

These are used as follows:

C = pytblis.contract("ij,jk->ik", A, B, alpha=1.0, beta=0.5, out=C, conja=True, conjb=False)

computes

$$C \gets \overline{A} B + \frac{1}{2} C.$$
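To make the roles of alpha, beta, conja and conjb concrete, here is a pure-NumPy reference for the update above. `contract_reference` is a hypothetical helper written for this README, not part of pytblis, and it returns a new array rather than writing into `out`.

```python
import numpy as np

def contract_reference(subscripts, a, b, alpha=1.0, beta=0.0,
                       out=None, conja=False, conjb=False):
    """NumPy reference for alpha * op(a) * op(b) + beta * out (hypothetical helper)."""
    a = np.conj(a) if conja else a
    b = np.conj(b) if conjb else b
    result = alpha * np.einsum(subscripts, a, b)
    if out is not None and beta != 0:
        # Accumulate the scaled previous contents of the output tensor.
        result = result + beta * out
    return result

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
C = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))

C_new = contract_reference("ij,jk->ik", A, B, alpha=1.0, beta=0.5,
                           out=C, conja=True, conjb=False)
expected = np.conj(A) @ B + 0.5 * C
print(np.allclose(C_new, expected))
```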

B = pytblis.transpose_add("iklj->ijkl", A, alpha=-1.0, beta=1.0, out=B)

computes

$$B_{ijkl} \gets B_{ijkl} - A_{iklj}.$$
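The index bookkeeping in the `"iklj->ijkl"` example can be checked with plain NumPy. The sketch below is a reference for the arithmetic only (the shapes are arbitrary assumptions) and does not call pytblis.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3, 4, 5))  # stored as A[i, k, l, j]
B = rng.standard_normal((2, 5, 3, 4))  # stored as B[i, j, k, l]

alpha, beta = -1.0, 1.0

# Permute A from "iklj" to "ijkl", then scale and accumulate.
B_new = beta * B + alpha * np.einsum("iklj->ijkl", A)

# The same permutation written with transpose: output axes (i, j, k, l)
# are taken from A's axes (0, 3, 1, 2).
B_alt = B - A.transpose(0, 3, 1, 2)
print(np.allclose(B_new, B_alt))
```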

Some additional documentation (work in progress) is available at pytblis.readthedocs.io.

Limitations

Supported datatypes: np.float32, np.float64, np.complex64, np.complex128. Mixing arrays of different precisions isn't yet supported.
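One way to work around the precision restriction is to promote the inputs to a common dtype before calling pytblis. The helper below is a hypothetical sketch using `numpy.result_type`; it is not part of pytblis. Note that it also promotes real arrays to complex when the inputs are mixed, which may be more than you need.

```python
import numpy as np

SUPPORTED = (np.float32, np.float64, np.complex64, np.complex128)

def to_common_dtype(*arrays):
    """Cast all inputs to one supported dtype (hypothetical helper)."""
    arrays = [np.asarray(x) for x in arrays]
    dtype = np.result_type(*arrays)
    if dtype.type not in SUPPORTED:
        raise TypeError(f"unsupported dtype after promotion: {dtype}")
    # copy=False avoids a copy when an array already has the target dtype.
    return [x.astype(dtype, copy=False) for x in arrays]

a32 = np.ones((2, 3), dtype=np.float32)
b64 = np.ones((3, 4), dtype=np.float64)
a, b = to_common_dtype(a32, b64)
print(a.dtype, b.dtype)
```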

New features

Mixed-complex/real contractions

New in v0.0.11: pytblis.contract fully supports contractions between complex and/or real tensors of the same floating point precision, provided that alpha and beta are both real. Internally, the real and imaginary parts are contracted separately with TBLIS. As of v0.0.14, this feature is enabled by default. It can be turned off in pytblis.contract and pytblis.einsum by passing complex_real_contractions=False.
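The idea of contracting real and imaginary parts separately can be illustrated with NumPy for a complex A and a real B. This is a sketch of the arithmetic only; the actual pytblis implementation may be organized differently.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))  # complex
B = rng.standard_normal((4, 5))                                      # real

# Two real contractions, one per component of A.
C_real = np.einsum("ij,jk->ik", A.real, B)
C_imag = np.einsum("ij,jk->ik", A.imag, B)
C_split = C_real + 1j * C_imag

# Reference: promote everything to complex and contract once.
C_ref = np.einsum("ij,jk->ik", A, B)
print(np.allclose(C_split, C_ref))
```

With real alpha and beta, scaling does not mix the real and imaginary components, which is consistent with the restriction stated above.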

Installation

I will try to get this package added to conda-forge. In the meantime, conda packages may be downloaded from my personal channel.

conda install pytblis -c conda-forge -c chillenb

The pre-built macOS wheels on PyPI use pthreads for multithreading. The Linux wheels now use OpenMP, which is much more efficient. macOS users who want OpenMP should install from source or use the conda packages.

Alternatively, install from PyPI (not as performant):

pip install pytblis

About OpenBLAS

Don't use OpenBLAS configured with pthreads. It causes oversubscription when used with other multithreaded libraries, in particular anything that uses OpenMP. Instead, use MKL (libblas=*=*mkl) or the OpenMP variant of OpenBLAS (libopenblas=*=*openmp*).

Installation from source

The easy way:

pip install --no-binary pytblis pytblis

The default compile options will give good performance. OpenMP is the default thread model when building from source. You can pass additional options to CMake via CMAKE_ARGS, change the thread model, compile for other CPU microarchitectures, etc.

The hard way:

  1. Install TBLIS.
  2. Run CMAKE_ARGS="-DTBLIS_ROOT=wherever_tblis_is_installed" pip install .

See dev_install.sh for an example. This script installs TBLIS in ./local_tblis_prefix and then links pytblis against it.

Research

If you use TBLIS in your academic work, it's a good idea to cite:

TBLIS is not my work, and its developers are not responsible for flaws in these Python bindings.

Acknowledgements

The implementation of einsum and the tests are modified versions of those from opt_einsum.

pytblis was developed in the Zhu Group, Department of Chemistry, Yale University.
