title	Installation Guide
parent	Getting Started
nav_order	1

Installation Guide

Complete setup instructions for TensorCraft-HPC across different use cases.

System Requirements

Hardware Requirements

Component	Minimum	Recommended
RAM	4 GB	16 GB
Disk	2 GB free	10 GB free
GPU	NVIDIA (Compute 70+)	NVIDIA Hopper (SM 90)

Software Requirements

Required

Tool	Version	Purpose
CUDA Toolkit	12.0+	GPU kernel compilation and runtime
CMake	3.20+	Build system
C++ Compiler	C++17-capable	Host code compilation
NVIDIA Driver	520+	GPU runtime support

Optional

Tool	Version	Purpose
Python	3.8+	Python bindings
Ninja	1.10+	Faster build generation
GoogleTest	1.10+	Unit testing (auto-fetched)
pybind11	2.10+	Python bindings (auto-fetched)

Installation by Platform

Ubuntu / Debian Linux

1. Install Prerequisites

# Update package lists
sudo apt update

# Install CUDA Toolkit (if not already installed)
# Option A: Official NVIDIA .deb repository
# See: https://developer.nvidia.com/cuda-downloads

# Option B: Direct installation
sudo apt install -y cuda-toolkit-12-8

# Install build tools
sudo apt install -y cmake build-essential

# Install Python (optional, for bindings)
sudo apt install -y python3 python3-pip python3-dev

2. Verify Installation

# Check CUDA
nvcc --version
# Should show: Cuda compilation tools, release 12.x

# Check CMake
cmake --version
# Should show: cmake version 3.20 or higher

# Check Python (if installed)
python3 --version
# Should show: Python 3.8 or higher

3. Build TensorCraft-HPC

# Clone repository
git clone https://github.com/LessUp/modern-ai-kernels.git
cd modern-ai-kernels

# Configure and build
cmake --preset dev
cmake --build --preset dev --parallel $(nproc)

# Run tests
ctest --preset dev --output-on-failure

macOS (CPU-Only)

{: .warning } TensorCraft-HPC is a CUDA-based library. On macOS, you can only use CPU-only validation and documentation builds.

# Clone repository
git clone https://github.com/LessUp/modern-ai-kernels.git
cd modern-ai-kernels

# Install dependencies
brew install cmake

# CPU-only validation
cmake --preset cpu-smoke
cmake --build build/cpu-smoke --parallel $(sysctl -n hw.ncpu)

Windows

{: .warning } Windows requires Visual Studio and CUDA Toolkit installed. WSL2 is recommended.

Using WSL2 (Recommended)

# Install WSL2 with Ubuntu
wsl --install

# Inside WSL2, follow Ubuntu instructions above

Native Windows (Visual Studio)

Install Visual Studio 2022
Install CUDA Toolkit 12.x
Open Developer Command Prompt

cmake -B build -G "Visual Studio 17 2022" ^
  -DCMAKE_CUDA_ARCHITECTURES=75 ^
  -DTC_BUILD_TESTS=ON

cmake --build build --config Release --parallel

Build Presets Explained

TensorCraft-HPC provides several CMake presets for different use cases:

`dev` - Development (Recommended)

Use when: Daily development with CUDA support

cmake --preset dev
cmake --build --preset dev --parallel $(nproc)
ctest --preset dev --output-on-failure

Includes:

✅ All GPU kernels
✅ Unit tests
✅ Debug symbols
❌ Benchmarks (to save build time)

`python-dev` - Python Development

Use when: Focusing on Python bindings

cmake --preset python-dev
cmake --build --preset python-dev --parallel $(nproc)
python3 -m pip install -e .
python3 -c "import tensorcraft_ops as tc; print(tc.__version__)"

Includes:

✅ Python bindings
✅ Core GPU kernels needed by Python API
❌ Full test suite
❌ Benchmarks

`release` - Full Release

Use when: Complete build with benchmarks

cmake --preset release
cmake --build --preset release --parallel $(nproc)
ctest --test-dir build/release --output-on-failure
./build/release/benchmarks/gemm_benchmark

Includes:

✅ Everything in dev
✅ Performance benchmarks
✅ Optimized build (RelWithDebInfo)

`debug` - Debug Build

Use when: Debugging issues

cmake --preset debug
cmake --build --preset debug --parallel $(nproc)

Includes:

✅ Full debug symbols
✅ No optimizations
✅ Runtime checks enabled

`cpu-smoke` - CPU-Only Validation

Use when: No CUDA available, testing build infrastructure

cmake --preset cpu-smoke
cmake --install build/cpu-smoke --prefix /tmp/tensorcraft-install

Includes:

✅ Build system validation
✅ Installation flow
❌ GPU features (disabled)
❌ Tests, benchmarks, Python bindings

Manual Configuration

For advanced users who need custom configurations:

cmake -B build/manual -G Ninja \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DCMAKE_CUDA_ARCHITECTURES=75 \
  -DTC_BUILD_TESTS=ON \
  -DTC_BUILD_BENCHMARKS=ON \
  -DTC_BUILD_PYTHON=ON \
  -DTC_PYTHON_EXECUTABLE=$(which python3)

cmake --build build/manual --parallel $(nproc)
ctest --test-dir build/manual --output-on-failure

Key CMake Options

Option	Default	Description
`CMAKE_CUDA_ARCHITECTURES`	75	Target GPU architectures
`TC_BUILD_TESTS`	ON	Build unit tests
`TC_BUILD_BENCHMARKS`	ON (release only)	Build performance benchmarks
`TC_BUILD_PYTHON`	AUTO	Build Python bindings if Python found
`TC_PYTHON_EXECUTABLE`	auto-detected	Python executable path
`CUDA_TOOLKIT_ROOT_DIR`	/usr/local/cuda	CUDA installation path

Python Bindings

Installation

# From repository root
python3 -m pip install -e .

Verification

import tensorcraft_ops as tc

# Check version
print(f"TensorCraft version: {tc.__version__}")

# Create tensors
a = tc.tensor([[1.0, 2.0], [3.0, 4.0]])
b = tc.tensor([[5.0, 6.0], [7.0, 8.0]])

# Matrix multiplication
c = tc.matmul(a, b)
print(f"Result: {c.numpy()}")

Available Python APIs

API	Description	Example
`tensor(data)`	Create tensor from list	`tc.tensor([[1,2],[3,4]])`
`matmul(a, b)`	Matrix multiplication	`tc.matmul(a, b)`
`softmax(x)`	Softmax operation	`tc.softmax(x, dim=-1)`
`layer_norm(x)`	Layer normalization	`tc.layer_norm(x)`

CUDA Architecture Configuration

TensorCraft-HPC defaults to CUDA architecture 75 (Turing) for broad compatibility. Configure for your specific GPU:

Find Your GPU Architecture

GPU Series	Architecture	SM Value
V100	Volta	70
RTX 2000	Turing	75
RTX 3000 / A100	Ampere	80
RTX 4000	Ada Lovelace	89
H100	Hopper	90

Configure for Your GPU

# Single architecture (faster builds)
cmake --preset dev -DCMAKE_CUDA_ARCHITECTURES=80

# Multiple architectures
cmake --preset dev -DCMAKE_CUDA_ARCHITECTURES="75;80;90"

# All supported architectures (slower builds, universal binaries)
cmake --preset dev -DCMAKE_CUDA_ARCHITECTURES="70;75;80;89;90"

Troubleshooting Quick Reference

Issue	Solution
`nvcc not found`	Install CUDA Toolkit or check PATH
`CUDA architecture mismatch`	Set `CMAKE_CUDA_ARCHITECTURES`
`CMake version too old`	Upgrade CMake to 3.20+
`Python import fails`	Run `python3 -m pip install -e .` from repo root
`Tests fail on GPU`	Check GPU driver, run `nvidia-smi`

For detailed troubleshooting, see Troubleshooting Guide.

Next Steps

After successful installation:

Run Examples → Examples Section
Learn Architecture → Architecture Guide
Optimize Performance → Optimization Guide
Reference API → API Documentation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Installation Guide

System Requirements

Hardware Requirements

Software Requirements

Required

Optional

Installation by Platform

Ubuntu / Debian Linux

1. Install Prerequisites

2. Verify Installation

3. Build TensorCraft-HPC

macOS (CPU-Only)

Windows

Using WSL2 (Recommended)

Native Windows (Visual Studio)

Build Presets Explained

`dev` - Development (Recommended)

`python-dev` - Python Development

`release` - Full Release

`debug` - Debug Build

`cpu-smoke` - CPU-Only Validation

Manual Configuration

Key CMake Options

Python Bindings

Installation

Verification

Available Python APIs

CUDA Architecture Configuration

Find Your GPU Architecture

Configure for Your GPU

Troubleshooting Quick Reference

Next Steps

Need Help?

FilesExpand file tree

installation.md

Latest commit

History

installation.md

File metadata and controls

Installation Guide

System Requirements

Hardware Requirements

Software Requirements

Required

Optional

Installation by Platform

Ubuntu / Debian Linux

1. Install Prerequisites

2. Verify Installation

3. Build TensorCraft-HPC

macOS (CPU-Only)

Windows

Using WSL2 (Recommended)

Native Windows (Visual Studio)

Build Presets Explained

dev - Development (Recommended)

python-dev - Python Development

release - Full Release

debug - Debug Build

cpu-smoke - CPU-Only Validation

Manual Configuration

Key CMake Options

Python Bindings

Installation

Verification

Available Python APIs

CUDA Architecture Configuration

Find Your GPU Architecture

Configure for Your GPU

Troubleshooting Quick Reference

Next Steps

Need Help?

`dev` - Development (Recommended)

`python-dev` - Python Development

`release` - Full Release

`debug` - Debug Build

`cpu-smoke` - CPU-Only Validation