---
title: Installation Guide
parent: Getting Started
nav_order: 1
---
Complete setup instructions for TensorCraft-HPC across different use cases.
## System Requirements

| Component | Minimum | Recommended |
|---|---|---|
| RAM | 4 GB | 16 GB |
| Disk | 2 GB free | 10 GB free |
| GPU | NVIDIA (Compute Capability 7.0+) | NVIDIA Hopper (SM 90) |
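To find the SM value for your own card, recent NVIDIA drivers let `nvidia-smi` report the compute capability directly. The helper below is a convenience sketch, not part of TensorCraft-HPC: it converts the dotted value into the integer form used for `CMAKE_CUDA_ARCHITECTURES` later in this guide.

```bash
# Sketch: map a compute-capability string (e.g. "7.5", as printed by
# `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`) to the
# integer SM value that CMAKE_CUDA_ARCHITECTURES expects.
cap_to_sm() {
  printf '%s\n' "$1" | tr -d '.'
}

cap_to_sm "7.5"   # prints 75
cap_to_sm "9.0"   # prints 90
```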
## Required Tools

| Tool | Version | Purpose |
|---|---|---|
| CUDA Toolkit | 12.0+ | GPU kernel compilation and runtime |
| CMake | 3.20+ | Build system |
| C++ Compiler | C++17-capable | Host code compilation |
| NVIDIA Driver | 520+ | GPU runtime support |
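To check that tools you already have meet these minimums, dotted versions can be compared with `sort -V` (GNU coreutils). This helper is a sketch for convenience, not something shipped with the repository:

```bash
# Sketch: succeed when version $1 >= version $2 (dotted numeric versions,
# compared with GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

version_ge "3.27.4" "3.20" && echo "CMake 3.27.4 is new enough"
version_ge "11.8" "12.0" || echo "CUDA 11.8 is too old"
```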
## Optional Tools

| Tool | Version | Purpose |
|---|---|---|
| Python | 3.8+ | Python bindings |
| Ninja | 1.10+ | Faster build generation |
| GoogleTest | 1.10+ | Unit testing (auto-fetched) |
| pybind11 | 2.10+ | Python bindings (auto-fetched) |
## Linux (Ubuntu/Debian)

```bash
# Update package lists
sudo apt update

# Install CUDA Toolkit (if not already installed)
# Option A: Official NVIDIA .deb repository
# See: https://developer.nvidia.com/cuda-downloads
# Option B: Direct installation
sudo apt install -y cuda-toolkit-12-8

# Install build tools
sudo apt install -y cmake build-essential

# Install Python (optional, for bindings)
sudo apt install -y python3 python3-pip python3-dev
```

### Verify the Installation

```bash
# Check CUDA
nvcc --version
# Should show: Cuda compilation tools, release 12.x

# Check CMake
cmake --version
# Should show: cmake version 3.20 or higher

# Check Python (if installed)
python3 --version
# Should show: Python 3.8 or higher
```

### Build and Test

```bash
# Clone repository
git clone https://github.com/LessUp/modern-ai-kernels.git
cd modern-ai-kernels

# Configure and build
cmake --preset dev
cmake --build --preset dev --parallel $(nproc)

# Run tests
ctest --preset dev --output-on-failure
```

## macOS

{: .warning }
TensorCraft-HPC is a CUDA-based library. On macOS, you can only run CPU-only validation and documentation builds.
```bash
# Clone repository
git clone https://github.com/LessUp/modern-ai-kernels.git
cd modern-ai-kernels

# Install dependencies
brew install cmake

# CPU-only validation
cmake --preset cpu-smoke
cmake --build build/cpu-smoke --parallel $(sysctl -n hw.ncpu)
```

## Windows

{: .warning }
Windows requires Visual Studio and the CUDA Toolkit to be installed. WSL2 is recommended.

### WSL2 (recommended)

```powershell
# Install WSL2 with Ubuntu
wsl --install
```

Inside WSL2, follow the Ubuntu instructions above.

### Native Windows

- Install Visual Studio 2022
- Install CUDA Toolkit 12.x
- Open a Developer Command Prompt

```bat
cmake -B build -G "Visual Studio 17 2022" ^
  -DCMAKE_CUDA_ARCHITECTURES=75 ^
  -DTC_BUILD_TESTS=ON
cmake --build build --config Release --parallel
```

## Build Presets

TensorCraft-HPC provides several CMake presets for different use cases:
### `dev`

Use when: daily development with CUDA support.

```bash
cmake --preset dev
cmake --build --preset dev --parallel $(nproc)
ctest --preset dev --output-on-failure
```

Includes:
- ✅ All GPU kernels
- ✅ Unit tests
- ✅ Debug symbols
- ❌ Benchmarks (to save build time)
### `python-dev`

Use when: focusing on Python bindings.

```bash
cmake --preset python-dev
cmake --build --preset python-dev --parallel $(nproc)
python3 -m pip install -e .
python3 -c "import tensorcraft_ops as tc; print(tc.__version__)"
```

Includes:
- ✅ Python bindings
- ✅ Core GPU kernels needed by the Python API
- ❌ Full test suite
- ❌ Benchmarks
### `release`

Use when: complete build with benchmarks.

```bash
cmake --preset release
cmake --build --preset release --parallel $(nproc)
ctest --test-dir build/release --output-on-failure
./build/release/benchmarks/gemm_benchmark
```

Includes:
- ✅ Everything in `dev`
- ✅ Performance benchmarks
- ✅ Optimized build (RelWithDebInfo)
### `debug`

Use when: debugging issues.

```bash
cmake --preset debug
cmake --build --preset debug --parallel $(nproc)
```

Includes:
- ✅ Full debug symbols
- ✅ No optimizations
- ✅ Runtime checks enabled
### `cpu-smoke`

Use when: no CUDA available; testing the build infrastructure.

```bash
cmake --preset cpu-smoke
cmake --build build/cpu-smoke --parallel $(nproc)
cmake --install build/cpu-smoke --prefix /tmp/tensorcraft-install
```

Includes:
- ✅ Build system validation
- ✅ Installation flow
- ❌ GPU features (disabled)
- ❌ Tests, benchmarks, Python bindings
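The build commands in this guide use `$(nproc)` on Linux and `$(sysctl -n hw.ncpu)` on macOS for the `--parallel` job count. If you script builds across both platforms, a small wrapper (a sketch, not part of the repository) avoids the divergence:

```bash
# Sketch: portable CPU-core count for `cmake --build ... --parallel`.
ncores() {
  if command -v nproc >/dev/null 2>&1; then
    nproc              # Linux (coreutils)
  else
    sysctl -n hw.ncpu  # macOS / BSD
  fi
}

ncores
# Then, for example: cmake --build --preset dev --parallel "$(ncores)"
```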
## Manual Configuration

For advanced users who need custom configurations:

```bash
cmake -B build/manual -G Ninja \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DCMAKE_CUDA_ARCHITECTURES=75 \
  -DTC_BUILD_TESTS=ON \
  -DTC_BUILD_BENCHMARKS=ON \
  -DTC_BUILD_PYTHON=ON \
  -DTC_PYTHON_EXECUTABLE=$(which python3)
cmake --build build/manual --parallel $(nproc)
ctest --test-dir build/manual --output-on-failure
```

### Build Options

| Option | Default | Description |
|---|---|---|
| `CMAKE_CUDA_ARCHITECTURES` | `75` | Target GPU architectures |
| `TC_BUILD_TESTS` | `ON` | Build unit tests |
| `TC_BUILD_BENCHMARKS` | `ON` (release only) | Build performance benchmarks |
| `TC_BUILD_PYTHON` | `AUTO` | Build Python bindings if Python is found |
| `TC_PYTHON_EXECUTABLE` | auto-detected | Python executable path |
| `CUDA_TOOLKIT_ROOT_DIR` | `/usr/local/cuda` | CUDA installation path |
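`CMAKE_CUDA_ARCHITECTURES` takes a semicolon-separated list of SM integers. A quick format check like the hypothetical helper below (a sketch, not part of the repo) can catch typos such as commas before you spend time configuring:

```bash
# Sketch: verify a CMAKE_CUDA_ARCHITECTURES value is a semicolon-separated
# list of integers, e.g. "75" or "75;80;90".
valid_arch_list() {
  printf '%s' "$1" | grep -Eq '^[0-9]+(;[0-9]+)*$'
}

valid_arch_list "75;80;90" && echo "ok"
valid_arch_list "75,80" || echo "use semicolons, not commas"
```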
## Python Bindings

```bash
# From repository root
python3 -m pip install -e .
```

```python
import tensorcraft_ops as tc

# Check version
print(f"TensorCraft version: {tc.__version__}")

# Create tensors
a = tc.tensor([[1.0, 2.0], [3.0, 4.0]])
b = tc.tensor([[5.0, 6.0], [7.0, 8.0]])

# Matrix multiplication
c = tc.matmul(a, b)
print(f"Result: {c.numpy()}")
```

### Python API Overview

| API | Description | Example |
|---|---|---|
| `tensor(data)` | Create tensor from list | `tc.tensor([[1, 2], [3, 4]])` |
| `matmul(a, b)` | Matrix multiplication | `tc.matmul(a, b)` |
| `softmax(x)` | Softmax operation | `tc.softmax(x, dim=-1)` |
| `layer_norm(x)` | Layer normalization | `tc.layer_norm(x)` |
## GPU Architecture Configuration

TensorCraft-HPC defaults to CUDA architecture 75 (Turing) for broad compatibility. Configure it for your specific GPU:

| GPU Series | Architecture | SM Value |
|---|---|---|
| V100 | Volta | 70 |
| RTX 2000 | Turing | 75 |
| RTX 3000 / A100 | Ampere | 80 |
| RTX 4000 | Ada Lovelace | 89 |
| H100 | Hopper | 90 |

```bash
# Single architecture (faster builds)
cmake --preset dev -DCMAKE_CUDA_ARCHITECTURES=80

# Multiple architectures
cmake --preset dev -DCMAKE_CUDA_ARCHITECTURES="75;80;90"

# All supported architectures (slower builds, universal binaries)
cmake --preset dev -DCMAKE_CUDA_ARCHITECTURES="70;75;80;89;90"
```

## Troubleshooting

| Issue | Solution |
|---|---|
| `nvcc` not found | Install the CUDA Toolkit or check your `PATH` |
| CUDA architecture mismatch | Set `CMAKE_CUDA_ARCHITECTURES` |
| CMake version too old | Upgrade CMake to 3.20+ |
| Python import fails | Run `python3 -m pip install -e .` from the repo root |
| Tests fail on GPU | Check the GPU driver; run `nvidia-smi` |
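When filing an issue, it helps to report which of the relevant tools are visible on your `PATH`. This loop is a sketch: it prints each tool's location or flags it as missing, without aborting on the first failure.

```bash
# Sketch: list the locations of the tools this guide depends on; missing
# tools are reported rather than treated as errors.
check_tools() {
  for tool in nvcc cmake python3 nvidia-smi; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: $(command -v "$tool")"
    else
      echo "$tool: not found"
    fi
  done
}

check_tools
```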
For detailed troubleshooting, see Troubleshooting Guide.
## Next Steps

After successful installation:
- Run Examples → Examples Section
- Learn Architecture → Architecture Guide
- Optimize Performance → Optimization Guide
- Reference API → API Documentation