This project implements key high-frequency trading (HFT) components in CUDA to demonstrate GPU acceleration for financial applications.
- Order Book - A limit order book implementation with a matching engine
- Parallel Sort - High-performance sorting algorithms for market data
- Zero-Copy Market Data Processing - Low-latency market data feed processing using zero-copy memory techniques
Each component contains its own detailed README explaining the implementation details, algorithmic approaches, and performance characteristics.
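For readers new to limit order books, the matching step can be sketched on the CPU as follows. This is a minimal illustration of price-time priority matching, not the code in order-book/: the names `Order`, `Trade`, and `SketchOrderBook` are hypothetical, prices are assumed to be integer ticks, and the GPU implementation may organize its data quite differently.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <vector>

// Hypothetical types for illustration only; not taken from order_book.cuh.
struct Order {
    std::uint64_t id;
    bool is_buy;
    std::int64_t price;      // limit price in integer ticks (assumption)
    std::uint32_t quantity;  // remaining quantity
};

struct Trade {
    std::uint64_t buy_id;
    std::uint64_t sell_id;
    std::int64_t price;      // executes at the resting order's price
    std::uint32_t quantity;
};

class SketchOrderBook {
public:
    // Match an incoming limit order against the opposite side of the book,
    // then rest whatever quantity is left at its limit price.
    std::vector<Trade> submit(Order incoming) {
        std::vector<Trade> trades;
        if (incoming.is_buy) {
            // A buy crosses when the best ask is at or below its limit.
            match(incoming, asks_, trades,
                  [](std::int64_t limit, std::int64_t best) { return best <= limit; });
            if (incoming.quantity > 0) bids_[incoming.price].push_back(incoming);
        } else {
            // A sell crosses when the best bid is at or above its limit.
            match(incoming, bids_, trades,
                  [](std::int64_t limit, std::int64_t best) { return best >= limit; });
            if (incoming.quantity > 0) asks_[incoming.price].push_back(incoming);
        }
        return trades;
    }

private:
    using Level = std::deque<Order>;  // FIFO queue gives time priority within a price
    std::map<std::int64_t, Level, std::greater<std::int64_t>> bids_;  // highest price first
    std::map<std::int64_t, Level> asks_;                              // lowest price first

    template <typename Book, typename Crosses>
    static void match(Order& incoming, Book& book, std::vector<Trade>& trades,
                      Crosses crosses) {
        while (incoming.quantity > 0 && !book.empty()) {
            auto best = book.begin();
            if (!crosses(incoming.price, best->first)) break;
            Level& level = best->second;
            Order& resting = level.front();
            const std::uint32_t fill = std::min(incoming.quantity, resting.quantity);
            trades.push_back(incoming.is_buy
                                 ? Trade{incoming.id, resting.id, best->first, fill}
                                 : Trade{resting.id, incoming.id, best->first, fill});
            incoming.quantity -= fill;
            resting.quantity -= fill;
            if (resting.quantity == 0) level.pop_front();  // resting order fully filled
            if (level.empty()) book.erase(best);           // price level exhausted
        }
    }
};
```

In this sketch, calling `submit` with a buy order priced at or above the best ask returns one `Trade` per resting order it consumes, cheapest price first and oldest order first within a price. An order that does not cross returns an empty vector and rests in the book.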
- NVIDIA GPU with Compute Capability 5.0 or higher
- Recommended: 4GB+ VRAM
- CUDA Toolkit 11.0+
  - Download from NVIDIA CUDA Downloads
  - Follow the installation instructions for your platform
  - Verify the installation with nvcc --version
- CMake 3.18+
  - Download from CMake Website
  - Add to system PATH during installation
- Visual Studio 2022 (Windows) or GCC 7+ (Linux)
  - For Windows: Install Visual Studio with the "Desktop development with C++" workload
  - For Linux (Ubuntu/Debian): sudo apt-get install build-essential
- Visual Studio Code (Optional but recommended)
  - Install the "C/C++" and "CMake Tools" extensions
- Open the project folder in VS Code.
- VS Code should detect the CMake configuration automatically. If it does not, press Ctrl+Shift+P and select CMake: Configure.
- Configure the project using CMake.
- Select your compiler kit when prompted.
- After successful configuration, CMake creates the build files.
- Build the project using F7 or the CMake Build button.
- The executables are created in the build/Debug directory.
# Create build directory
mkdir build && cd build
# Configure with CMake
cmake ..
# Build
cmake --build .
# From build directory
./Debug/order_book.exe
# On a GPU with limited resources, pass a smaller order count
./Debug/order_book.exe [num_orders]
# From build directory
./Debug/parallel_sort.exe
# On a GPU with limited resources, pass a smaller data size
./Debug/parallel_sort.exe [data_size]
# From build directory
./Debug/zero_copy_processor.exe
# On a GPU with limited resources, pass a smaller data size
./Debug/zero_copy_processor.exe [data_size]
The benchmarks compare the CPU and GPU implementations:
- Order Book:
  - CPU Implementation: ~380,000 orders/sec
  - GPU Implementation: Improved throughput for large order sets
- Sorting Algorithms:
  - CPU Sort: Baseline performance
  - Thrust Sort: 2-3x faster than CPU
  - Bitonic Sort: Variable performance based on data size
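One plausible reason bitonic sort performance varies with data size is that the classic bitonic network works on power-of-two lengths. The sketch below is a sequential C++ rendering of that network, not the kernel in parallel-sort/. It assumes inputs are padded up to the next power of two, which may not match what this project does, and the names `next_pow2` and `bitonic_sort_sketch` are hypothetical. On a GPU, each iteration of the innermost loop would typically map to one thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Round n up to the next power of two (returns 1 for n == 0).
std::size_t next_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Sequential rendering of the bitonic sorting network (ascending order).
void bitonic_sort_sketch(std::vector<float>& data) {
    const std::size_t n = data.size();
    const std::size_t padded = next_pow2(n);
    // Pad with +infinity so the padding ends up at the back after sorting.
    data.resize(padded, std::numeric_limits<float>::infinity());

    for (std::size_t k = 2; k <= padded; k <<= 1) {      // length of the bitonic runs being merged
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {   // distance between compared elements
            for (std::size_t i = 0; i < padded; ++i) {   // one GPU thread per i in a kernel
                const std::size_t partner = i ^ j;
                if (partner <= i) continue;              // handle each pair exactly once
                const bool ascending = (i & k) == 0;     // sort direction of this run
                const bool out_of_order = ascending ? data[i] > data[partner]
                                                    : data[i] < data[partner];
                if (out_of_order) std::swap(data[i], data[partner]);
            }
        }
    }
    data.resize(n);  // drop the padding again
}
```

For a padded length of 2^m, this network makes m(m+1)/2 passes over the data, each with 2^(m-1) compare-exchange operations. Under the padding assumption, an input just above a power of two costs almost as much as one twice its size.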
- CUDA Not Found
  - Ensure CUDA is installed and in your PATH
  - Check environment variables: CUDA_PATH should be set
  - Restart your system after installation
- Compilation Errors
  - Verify that the CUDA Toolkit and VS/GCC versions are compatible
  - Update graphics drivers to the latest version
- CMake Configuration Issues
  - If you see warnings about FindCUDA being removed, you can safely ignore them
  - Make sure your CMake version is 3.18 or higher
├── CMakeLists.txt # Project configuration
├── order-book/ # Order book implementation
│ ├── main.cu # Entry point
│ ├── order_book.cu # Implementation
│ └── order_book.cuh # Header
├── parallel-sort/ # Sorting implementations
│ ├── main.cu # Entry point
│ ├── parallel_sort.cu # Implementation
│ └── parallel_sort.cuh # Header
└── screenshots/ # Documentation images
This project is licensed under the MIT License - see the LICENSE file for details.