A C benchmark suite and Python visualizer to measure CPU cache performance via sequential vs. random access, memory strides, AoS vs. SoA, and pointer chasing. Detects system cache levels to graph latency impacts on Linux, macOS, and Windows.

Siddigz/CPU-Cache-Benchmark

CPU Cache Benchmark Visualizer

A comprehensive CPU cache performance benchmarking suite that measures cache behavior across different memory access patterns. The benchmarks are implemented in C for precise memory control, with Python visualization tools for analysis.

Features

  • 4 Core Benchmarks:

    • Sequential vs Random Access - Demonstrates spatial locality and why sequential memory accesses are faster
    • Stride Access - Shows how cache lines affect performance when skipping memory locations
    • Array of Structs vs Struct of Arrays - Demonstrates how data layout influences cache efficiency
    • Pointer Chasing - Highlights cache misses when following linked structures
  • Smart Visualization

    • Automatic Hardware Detection - Automatically detects your CPU model, L1/L2/L3 cache sizes, and OS
    • Statistical Analysis - Calculates mean, median, and standard deviation for performance metrics
    • Annotated Plots - Visualizes cache boundaries directly on performance graphs
  • Flexible & Portable

    • Cross-Platform - Works on Windows, Linux, and macOS
    • Parameterized Tests - Vary array sizes, strides, iterations, and access patterns
    • Multiple Output Formats - CSV and JSON output for flexible analysis

Example Results

The following plots were generated on AMD Ryzen 7 6800HS with Radeon Graphics | Debian GNU/Linux 13 (trixie) x86_64:

Data Layout (AoS vs SoA)

Plot: AoS vs SoA. Compares the performance impact of different data layouts for structured data.

Sequential vs Random Access

Plot: Sequential vs Random. Shows the large performance gap between sequential access (cache-friendly) and random access (cache-thrashing).

Stride Access

Plot: Stride Access. Illustrates how performance degrades as the stride increases and spatial locality is lost.

Pointer Chasing

Plot: Pointer Chasing. Demonstrates the latency cost of pointer dereferences, with clear performance steps at the L1, L2, and L3 cache boundaries.

Quick Start

1. Build

# Windows (no make)
gcc -O2 -Wall -std=c11 -o benchmark.exe benchmark.c

# Linux/macOS
make
# or: gcc -O2 -Wall -std=c11 -o benchmark benchmark.c -lrt
# (-lrt is only needed on Linux with glibc older than 2.17; omit it on macOS, which has no librt)

2. Run Benchmarks

Run the comprehensive suite to generate data across all sizes and patterns:

./benchmark --comprehensive --output results.csv

3. Visualize

Generate plots from your results:

# Create and activate virtual environment (recommended)
# Windows
python -m venv venv
venv\Scripts\activate

# Linux/macOS
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Generate plots
# Windows
python visualize.py results.csv --output plots

# Linux/macOS
python3 visualize.py results.csv --output plots

This will create plots_sequential.png, plots_pointer_chasing.png, etc.

Detailed Usage

CLI Options (benchmark)

Usage: ./benchmark [OPTIONS]

Options:
  --benchmark <name>    Benchmark to run: sequential, stride, aos-soa, pointer-chasing, all (default: all)
  --size <bytes>        Array size in bytes (default: 1048576 = 1MB)
  --stride <n>          Stride value for stride benchmark (default: 1)
  --iterations <n>      Number of iterations (default: 1000000)
  --format <csv|json>   Output format (default: csv)
  --output <file>       Write output to file (default: stdout)
  --comprehensive       Run comprehensive test suite with multiple sizes/strides
  --help                Show this help message

Visualization Options (visualize.py)

Usage: python visualize.py input_file [OPTIONS]

Arguments:
  input_file            Input CSV or JSON file with benchmark results

Options:
  --benchmark <name>    Which benchmark to visualize (default: all)
  --output <file>       Output file prefix for plots (default: display interactively)

Benchmark Explanations

1. Sequential vs Random Access

What it measures: Spatial locality and cache line utilization.

  • Sequential: Accesses array elements in order (0, 1, 2, 3...). Maximizes cache line reuse.
  • Random: Accesses elements in shuffled order. Causes frequent cache misses.
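The two patterns can be sketched with a shared summing loop that differs only in the index order it is given. This is a minimal illustration, not the code in benchmark.c: the helper names (fill_sequential, shuffle, sum_in_order) and the use of rand() are assumptions made for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sequential pattern: idx = 0, 1, 2, ..., n-1. (Hypothetical helper.) */
static void fill_sequential(size_t *idx, size_t n) {
    for (size_t i = 0; i < n; i++) idx[i] = i;
}

/* Random pattern: Fisher-Yates shuffle of an existing index array.
 * rand() keeps the sketch short; where RAND_MAX is small (32767 on some
 * platforms) a wider PRNG would be needed for large arrays. */
static void shuffle(size_t *idx, size_t n, unsigned seed) {
    srand(seed);
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;      /* j in [0, i-1] */
        size_t tmp = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = tmp;
    }
}

/* Same amount of work for both patterns; only the order of the
 * addresses touched in data[] changes. */
static uint64_t sum_in_order(const uint32_t *data, const size_t *idx, size_t n) {
    assert(data != NULL && idx != NULL);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) sum += data[idx[i]];
    return sum;
}
```

Timing sum_in_order once with the sequential indices and once with the shuffled ones isolates the effect of access order, since both runs read exactly the same elements.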

2. Stride Access

What it measures: How cache efficiency drops as memory access "skips" over data.

  • Accesses elements with a fixed stride (1, 2, 4, 8, 16...).
  • Larger strides mean fewer useful data items are loaded per cache line fetch.
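A rough sketch of the strided loop, plus the arithmetic behind "fewer useful data items per cache line", might look like the following. The function names are hypothetical, and the 64-byte line size used in the usage note is an assumption that holds on most current x86-64 parts but not everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Touch every stride-th element. The sum is returned so the loads
 * have an observable result. (Hypothetical helper.) */
static uint64_t sum_strided(const uint32_t *data, size_t n, size_t stride) {
    assert(stride > 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i += stride) sum += data[i];
    return sum;
}

/* Elements actually used out of each fetched cache line, for an aligned
 * array and power-of-two strides (1, 2, 4, 8, 16, ...). Once the stride
 * reaches a full line, every access pulls in a new line for one element. */
static size_t elems_used_per_line(size_t line_bytes, size_t elem_bytes, size_t stride) {
    assert(stride > 0 && elem_bytes > 0);
    size_t per_line = line_bytes / elem_bytes;
    return stride >= per_line ? 1 : per_line / stride;
}
```

With 4-byte elements and an assumed 64-byte line, elems_used_per_line(64, 4, 1) is 16 while elems_used_per_line(64, 4, 16) is 1, so at stride 16 each line fetch serves a single load.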

3. AoS vs SoA (Data Layout)

What it measures: The impact of data structure layout on cache efficiency.

  • AoS (Array of Structs): struct {int x, y, z;} points[N]; - Good for accessing all fields of one object.
  • SoA (Struct of Arrays): int x[N], y[N], z[N]; - Better for SIMD and accessing single fields across many objects.
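To make the difference concrete, here is a small sketch that sums only the x field under each layout. The type names, function names, and N_POINTS are illustrative assumptions, not taken from benchmark.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { N_POINTS = 1024 };   /* illustrative size, not from the project */

/* AoS: the fields of one point sit next to each other in memory. */
struct point { int x, y, z; };
struct points_aos { struct point p[N_POINTS]; };

/* SoA: all x values are contiguous, then all y, then all z. */
struct points_soa { int x[N_POINTS], y[N_POINTS], z[N_POINTS]; };

/* Reads x only, but each cache line fetched also carries y and z,
 * so roughly two thirds of the fetched bytes go unused. */
static int64_t sum_x_aos(const struct points_aos *a) {
    assert(a != NULL);
    int64_t sum = 0;
    for (size_t i = 0; i < N_POINTS; i++) sum += a->p[i].x;
    return sum;
}

/* Reads x only, and every byte fetched is an x value. */
static int64_t sum_x_soa(const struct points_soa *s) {
    assert(s != NULL);
    int64_t sum = 0;
    for (size_t i = 0; i < N_POINTS; i++) sum += s->x[i];
    return sum;
}
```

Both functions return the same result; the SoA version simply walks a dense array, which is also the shape compilers find easiest to vectorize.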

4. Pointer Chasing

What it measures: Pure latency of memory accesses (pointer walking).

  • Creates a linked list with nodes scattered randomly in memory.
  • Walking the list requires waiting for each memory fetch to complete before knowing the address of the next node.
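  • Because each load depends on the result of the previous one, the CPU cannot overlap the fetches. The time per step therefore approximates the raw access latency of whichever level (L1, L2, L3, or DRAM) holds the working set.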
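One common way to build such a chain is sketched below. It is an assumption about the general technique, not a description of benchmark.c: it stores the "next" links as indices in an array rather than as pointers in a linked list, and it uses Sattolo's algorithm so the links form a single cycle that visits every slot. The names build_chain and chase are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill next[] with a random single-cycle permutation (Sattolo's algorithm).
 * Following next[] from any slot visits all n slots before returning to
 * the start, so the walk cannot get stuck in a short, cache-resident loop. */
static void build_chain(size_t *next, size_t n, unsigned seed) {
    assert(n > 0);
    for (size_t i = 0; i < n; i++) next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;      /* j in [0, i-1], never i itself */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Each load's address comes from the previous load, so the steps
 * are strictly serialized. */
static size_t chase(const size_t *next, size_t start, size_t steps) {
    size_t cur = start;
    for (size_t s = 0; s < steps; s++) cur = next[cur];
    return cur;
}
```

Dividing the elapsed time of a long chase by the number of steps gives an estimate of per-access latency for a working set of n slots.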

Technical Details

  • Timing: Uses high-resolution platform APIs (QueryPerformanceCounter on Windows, clock_gettime on POSIX).
  • Compilation: Built with -O2 so the generated code is realistic. Variables are marked volatile where necessary so the compiler cannot optimize the measured memory accesses away.
  • Hardware Detection: The Python script uses the platform and subprocess modules, querying PowerShell on Windows, sysctl on macOS, and /sys on Linux, to detect the CPU model and cache sizes.
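The timing and volatile points can be illustrated with a POSIX-only sketch. It assumes CLOCK_MONOTONIC as the clock source and uses hypothetical names (now_ns, sink, ns_per_access); the Windows path via QueryPerformanceCounter is not shown, and the project's actual timing code may differ.

```c
#define _POSIX_C_SOURCE 199309L   /* exposes clock_gettime under -std=c11 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds (POSIX only). */
static uint64_t now_ns(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;                     /* silence unused warning under NDEBUG */
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Storing the result in a volatile object forces the compiler to keep
 * the computation that produced it, even at -O2. */
static volatile uint64_t sink;

/* Average nanoseconds per element read over `iterations` passes. */
static double ns_per_access(const uint32_t *data, size_t n, size_t iterations) {
    assert(n > 0 && iterations > 0);
    uint64_t sum = 0;
    uint64_t t0 = now_ns();
    for (size_t it = 0; it < iterations; it++)
        for (size_t i = 0; i < n; i++) sum += data[i];
    uint64_t t1 = now_ns();
    sink = sum;                   /* keep the loads observable */
    return (double)(t1 - t0) / ((double)n * (double)iterations);
}
```

Writing the accumulated sum to the volatile sink after the timed region keeps the store itself out of the measurement while still preventing the loop from being eliminated as dead code.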

License

This project is open source and provided for educational and research purposes.
