A General-purpose Task-parallel Programming System using Modern C++
Updated Apr 2, 2025 - C++
CUDA Core Compute Libraries
Thin, unified, C++-flavored wrappers for the CUDA APIs
TinyChatEngine: On-Device LLM Inference Library
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, automatic CUDA graph generation and updating.
An implementation of HIP that works on CPUs, across OSes.
YOLOv9 TensorRT deployment acceleration, with two implementations: C++ and Python 🔥🔥🔥
Simple GPU RANSAC fitting of multiple lines on 2D/3D point clouds
Reconstruct mesh from point cloud data generated by 3D scanner
A simple ray-tracing program implemented with CUDA.
CUDA solutions for the lab assignments in the UIUC-ECE408 Applied Parallel Programming course.
A hand-built TensorRT v8.2 network for YOLOv5-v5.0 that speeds up YOLOv5-v5.0 inference
Converts an RGB image to greyscale using parallel programming.
CUDA Programming Starter Kit for VSCode and CLion
Parallel LiDAR Point Cloud Preprocessing for Autonomous Driving Applications
Accelerated optical video stabilizer using CUDA, OpenCL, and AVX-512
A simple image filter example for those who study GPU/CUDA programming
High-performance CUDA C++ implementation of Graph Convolutional Networks
A parallel, GPU-accelerated code for real-space all-electron linear-scaling density functional theory