High-performance Sobel edge detection using CUDA with CPU vs GPU benchmarking, roofline analysis, and Nsight profiling.
Updated Jan 17, 2026 (Python)
CLI wrapper and Claude Code / Codex skill for NVIDIA Nsight Graphics 2026.1+ — capture GPU frames, export GPU Trace, and drill into bottlenecks via compact JSON.
CUDA Samples and Nsight Guided Profiling Samples
🎬 A modular lab for comparing GPU training efficiency in FP32 vs FP16, using Tensor Core acceleration.
Quantum workload planning and profiler-backed architecture analysis for exact tensor-network execution.
A reproducible GPU benchmarking lab that compares FP16 vs FP32 training on MNIST using PyTorch, CuPy, and Nsight profiling tools. This project blends performance engineering with cinematic storytelling—featuring NVTX-tagged training loops, fused CuPy kernels, and a profiler-driven README that narrates the GPU’s inner workings frame by frame.