A Powerful, Superfast Multidimensional Tensor Library for C/C++
Documentation • Quick Start • Contributing
- High Performance: Optimized for speed with SIMD and multi-threading support
- Accelerator Support: CUDA for NVIDIA GPUs, plus XPU, NPU, and TPU backends
- Simple API: Clean, intuitive C and C++ interfaces
- Comprehensive: All essential tensor operations for scientific computing
- Production Ready: Thoroughly tested and documented
- Cross-Platform: Works on Windows, Linux, and macOS
- Zero Dependencies: Core library has no external dependencies
- Memory Efficient: Smart memory management with minimal overhead
Download the latest release for your platform from GitHub Releases:
- Linux: tensr-linux-x64.tar.gz
- Windows: tensr-windows-x64.zip
- macOS: tensr-macos-x64.tar.gz
Extract and copy to your system:
Linux/macOS:
```bash
tar -xzf tensr-linux-x64.tar.gz
sudo cp -r lib/* /usr/local/lib/
sudo cp -r include/* /usr/local/include/
```

Windows:
Extract the zip file and add the lib and include directories to your project paths.
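Once the headers and library are on your search paths, linking is a single flag; since the xmake snippet below uses `add_links("tensr")`, the installed library is presumably `libtensr`, so something like `gcc main.c -ltensr -o main` should work on Linux/macOS (adjust for your toolchain on Windows).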
For xmake users, download tensr-xmake-{version}.tar.gz from releases:
```bash
tar -xzf tensr-xmake-0.0.0.tar.gz
cd tensr-xmake-0.0.0
xmake install
```

Or add to your xmake.lua:
```lua
add_includedirs("/path/to/tensr/include")
add_linkdirs("/path/to/tensr/lib")
add_links("tensr")
```

To build from source:

```bash
git clone https://github.com/muhammad-fiaz/tensr.git
cd tensr
xmake build
xmake install
```

C quick start:

```c
#include <tensr/tensr.h>
#include <tensr/tensr_array.h>
int main() {
    /* Create arrays from data (like np.array) */
    float data_a[] = {1, 2, 3};
    float data_b[] = {4, 5, 6};
    size_t shape[] = {3};
    Tensor* a = tensr_from_array(shape, 1, TENSR_FLOAT32, TENSR_CPU, data_a);
    Tensor* b = tensr_from_array(shape, 1, TENSR_FLOAT32, TENSR_CPU, data_b);

    /* Element-wise operations */
    Tensor* sum = tensr_add(a, b);        /* a + b */
    Tensor* product = tensr_mul(a, b);    /* a * b */
    Tensor* squared = tensr_pow(a, 2.0);  /* a ** 2 */

    /* Print results */
    tensr_print(sum);
    tensr_print(product);
    tensr_print(squared);

    /* Cleanup */
    tensr_free(a);
    tensr_free(b);
    tensr_free(sum);
    tensr_free(product);
    tensr_free(squared);
    return 0;
}
```

C++ quick start:

```cpp
#include <tensr/tensr.hpp>
int main() {
    /* Create tensors */
    auto t1 = tensr::Tensor::ones({3, 3});
    auto t2 = tensr::Tensor::rand({3, 3});

    /* Perform operations */
    auto result = t1 + t2;
    auto sum = result.sum();

    /* Print result */
    result.print();
    return 0;
}
```

Tensor creation:
- `zeros()` - Create a tensor filled with zeros
- `ones()` - Create a tensor filled with ones
- `full()` - Create a tensor filled with a given value
- `arange()` - Create a tensor of values over a range with a fixed step
- `linspace()` - Create a tensor of a fixed count of evenly spaced values
- `eye()` - Create an identity matrix
- `rand()` - Create a tensor of uniform random values
- `randn()` - Create a tensor of values drawn from a normal distribution
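As a rough illustration, here is a C sketch using a couple of these constructors, assuming each maps to a `tensr_`-prefixed function taking the same shape/ndim/dtype/device arguments as `tensr_zeros` elsewhere in this README; `tensr_ones` and `tensr_eye` (including the `tensr_eye` signature) are naming assumptions, not confirmed API:

```c
#include <tensr/tensr.h>

int main() {
    size_t shape[] = {3, 3};

    /* tensr_zeros appears elsewhere in this README; tensr_ones and
       tensr_eye below assume the same naming convention and are
       hypothetical. */
    Tensor* z = tensr_zeros(shape, 2, TENSR_FLOAT32, TENSR_CPU);
    Tensor* o = tensr_ones(shape, 2, TENSR_FLOAT32, TENSR_CPU);
    Tensor* i = tensr_eye(3, TENSR_FLOAT32, TENSR_CPU);  /* hypothetical signature */

    tensr_print(z);
    tensr_print(o);
    tensr_print(i);

    tensr_free(z);
    tensr_free(o);
    tensr_free(i);
    return 0;
}
```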
Element-wise and mathematical operations:
- `add()`, `sub()`, `mul()`, `div()` - Element-wise arithmetic
- `pow()`, `sqrt()`, `exp()`, `log()` - Power, root, exponential, and logarithm
- `sin()`, `cos()`, `tan()` - Trigonometric functions
- `abs()`, `neg()` - Unary operations
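The quick start above already shows `tensr_add`, `tensr_mul`, and `tensr_pow`; a sketch of the unary math functions might look like the following, assuming they follow the same `tensr_` convention and return a new tensor (`tensr_sqrt` and `tensr_exp` are assumptions, not confirmed API):

```c
#include <tensr/tensr.h>

int main() {
    float data[] = {1, 4, 9};
    size_t shape[] = {3};
    Tensor* a = tensr_from_array(shape, 1, TENSR_FLOAT32, TENSR_CPU, data);

    /* Hypothetical unary math functions, assuming the tensr_ naming
       convention used by tensr_add/tensr_pow in the quick start. */
    Tensor* r = tensr_sqrt(a);  /* element-wise square root */
    Tensor* e = tensr_exp(a);   /* element-wise exponential */

    tensr_print(r);
    tensr_print(e);

    tensr_free(a);
    tensr_free(r);
    tensr_free(e);
    return 0;
}
```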
Linear algebra:
- `dot()` - Dot product
- `matmul()` - Matrix multiplication
- `inv()` - Matrix inverse
- `det()` - Determinant
- `svd()` - Singular value decomposition
- `eig()` - Eigenvalues and eigenvectors
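For a sense of how this might look through the C API, here is a sketch assuming `tensr_matmul` and `tensr_inv` exist with the obvious tensor-in, tensor-out shapes; these names are assumptions following the `tensr_` convention, not confirmed API:

```c
#include <tensr/tensr.h>

int main() {
    float data[] = {1, 2, 3, 4};
    size_t shape[] = {2, 2};
    Tensor* m = tensr_from_array(shape, 2, TENSR_FLOAT32, TENSR_CPU, data);

    /* Hypothetical linear-algebra calls following the tensr_ convention. */
    Tensor* prod = tensr_matmul(m, m);  /* 2x2 matrix product */
    Tensor* minv = tensr_inv(m);        /* matrix inverse (det = -2, so invertible) */

    tensr_print(prod);
    tensr_print(minv);

    tensr_free(m);
    tensr_free(prod);
    tensr_free(minv);
    return 0;
}
```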
Reductions:
- `sum()` - Sum of elements
- `mean()` - Mean of elements
- `max()`, `min()` - Maximum and minimum
- `argmax()`, `argmin()` - Indices of the maximum/minimum
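A sketch of reductions through the C API, assuming `tensr_sum` and `tensr_mean` reduce over all elements and return a scalar tensor; both names and that behavior are assumptions, not confirmed API:

```c
#include <tensr/tensr.h>

int main() {
    float data[] = {3, 1, 2};
    size_t shape[] = {3};
    Tensor* a = tensr_from_array(shape, 1, TENSR_FLOAT32, TENSR_CPU, data);

    /* Hypothetical reductions following the tensr_ convention, assumed
       to reduce over all elements and return a scalar tensor. */
    Tensor* total = tensr_sum(a);
    Tensor* avg   = tensr_mean(a);

    tensr_print(total);  /* expected: 6 */
    tensr_print(avg);    /* expected: 2 */

    tensr_free(a);
    tensr_free(total);
    tensr_free(avg);
    return 0;
}
```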
Shape manipulation:
- `reshape()` - Change the tensor's shape
- `transpose()` - Transpose dimensions
- `squeeze()` - Remove size-1 dimensions
- `expand_dims()` - Add a dimension
- `concat()`, `stack()` - Combine tensors
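A sketch of shape operations in C, assuming `tensr_reshape` takes a new shape plus its rank and `tensr_transpose` swaps the two axes of a matrix; both signatures are assumptions, not confirmed API:

```c
#include <tensr/tensr.h>

int main() {
    float data[] = {1, 2, 3, 4, 5, 6};
    size_t shape[] = {2, 3};
    Tensor* a = tensr_from_array(shape, 2, TENSR_FLOAT32, TENSR_CPU, data);

    /* Hypothetical shape operations following the tensr_ convention. */
    size_t new_shape[] = {3, 2};
    Tensor* r = tensr_reshape(a, new_shape, 2);  /* 2x3 -> 3x2, row-major order kept */
    Tensor* t = tensr_transpose(a);              /* 2x3 -> 3x2, axes swapped */

    tensr_print(r);
    tensr_print(t);

    tensr_free(a);
    tensr_free(r);
    tensr_free(t);
    return 0;
}
```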
Tensr supports multiple accelerator backends:
```c
/* CUDA GPU */
Tensor* t = tensr_zeros(shape, 2, TENSR_FLOAT32, TENSR_CUDA);

/* Transfer between devices */
tensr_to_device(t, TENSR_CUDA, 0);
```

Supported devices:
- CPU: Standard CPU execution
- CUDA: NVIDIA GPU acceleration
- XPU: Intel XPU support
- NPU: Neural Processing Unit
- TPU: Tensor Processing Unit
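Putting the pieces together, a CPU-to-GPU round trip might look like the sketch below. `tensr_zeros` and `tensr_to_device` are shown above; the in-place transfer semantics, the `tensr_sum` call, and the assumption that operations run on the tensor's current device are all unconfirmed:

```c
#include <tensr/tensr.h>

int main() {
    size_t shape[] = {1024, 1024};

    /* Allocate on the CPU, move to the first CUDA device for the heavy
       work, then move the (small) result back. Transfer is assumed to
       be in-place, matching how tensr_to_device is called above. */
    Tensor* t = tensr_zeros(shape, 2, TENSR_FLOAT32, TENSR_CPU);

    tensr_to_device(t, TENSR_CUDA, 0);  /* CPU -> GPU 0 */
    Tensor* s = tensr_sum(t);           /* hypothetical reduction, assumed to run on GPU */
    tensr_to_device(s, TENSR_CPU, 0);   /* GPU 0 -> CPU */

    tensr_print(s);

    tensr_free(t);
    tensr_free(s);
    return 0;
}
```

Note that the best-practices list below says to synchronize after GPU operations; the exact call for that is not shown in this README, so check the full documentation before relying on a pattern like this.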
Tensr is designed for maximum performance:
- SIMD Optimizations: Vectorized operations for CPU
- GPU Acceleration: CUDA kernels for parallel computing
- Memory Efficiency: Minimal allocations and smart caching
- Multi-threading: Parallel execution for large tensors
Run the test suite:
```bash
xmake build tests
xmake run tests
```

All tests must pass before release.
Full documentation is available at https://muhammad-fiaz.github.io/Tensr/
- Use appropriate data types for your use case
- Free tensors when done to avoid memory leaks
- Check return values for NULL (see the sketch after this list)
- Use GPU for large-scale computations
- Profile your code for bottlenecks
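As a sketch of the NULL-checking practice, assuming every constructor and operation returns NULL on failure (consistent with the advice above, though the error-reporting contract is not spelled out in this README):

```c
#include <stdio.h>
#include <tensr/tensr.h>

int main() {
    float data[] = {1, 2, 3};
    size_t shape[] = {3};

    /* Assumed contract: tensr_* calls return NULL on failure. */
    Tensor* a = tensr_from_array(shape, 1, TENSR_FLOAT32, TENSR_CPU, data);
    if (a == NULL) {
        fprintf(stderr, "tensor allocation failed\n");
        return 1;
    }

    Tensor* sq = tensr_pow(a, 2.0);
    if (sq == NULL) {
        fprintf(stderr, "tensr_pow failed\n");
        tensr_free(a);  /* release what was already allocated */
        return 1;
    }

    tensr_print(sq);
    tensr_free(sq);
    tensr_free(a);
    return 0;
}
```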
- Don't mix tensors from different devices without transfer
- Don't modify tensor data directly
- Don't forget to synchronize after GPU operations
- Don't use debug builds in production
- Don't ignore compiler warnings
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Muhammad Fiaz
- GitHub: @muhammad-fiaz
- Email: contact@muhammadfiaz.com
Special thanks to all contributors and the open-source community.
- Email: contact@muhammadfiaz.com
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Found a bug? Please open an issue on GitHub.