Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
main.cu		main.cu
parallel_sort.cu		parallel_sort.cu
parallel_sort.cuh		parallel_sort.cuh

README.md

Market Data Parallel Sort

This component demonstrates high-performance sorting of financial market data using different parallel algorithms and compares them with sequential CPU sorting.

Implementation Overview

The implementation provides three different sorting approaches:

CPU Sort: Standard sequential sort using STL
Thrust Sort: High-level GPU sorting using the Thrust library
Bitonic Sort: Custom low-level GPU sorting implementation

Technical Details

Market Data Structure

The core data being sorted is time-series market data with these properties:

Nanosecond-precision timestamps (critical for HFT)
Price and volume information
Trade and quote flags

Sorting Algorithms

CPU Sort

Uses standard std::sort with custom comparator
Provides baseline performance for comparison
O(n log n) complexity with optimized branch prediction

Thrust Sort

High-level abstraction using Thrust parallel algorithms
Automatically selects optimal GPU sorting implementation
Handles memory transfers implicitly
Balances programmer productivity and performance

Bitonic Sort

Custom implementation of the bitonic sort network
Highly parallelizable O(log² n) algorithm
Each comparison-swap operation runs on a separate CUDA thread
Explicitly handles power-of-2 padding requirements

Performance Analysis

In benchmark tests with 1 million data points:

CPU Sort: ~560ms
Thrust Sort: ~20ms (28× speedup)
Bitonic Sort: ~97ms (5.8× speedup)

Why GPU Sorting Outperforms CPU

Massive Parallelism: Thousands of comparisons happen simultaneously
Memory Bandwidth: GPUs have significantly higher memory bandwidth
Specialized Hardware: CUDA cores are optimized for data-parallel operations

Thrust vs. Bitonic Performance

Thrust generally outperforms our custom bitonic implementation due to:
- Advanced optimization techniques
- Radix sort for appropriate data types
- Memory transfer optimizations
Bitonic Sort demonstrates the core principles of GPU parallelism but with:
- More explicit control
- Educational value for understanding parallel sorting

Use Cases in HFT

Fast sorting of market data is critical for:

Time-Series Analysis: Processing market data in chronological order
Event Reconstruction: Recreating the exact sequence of market events
Backtesting: Historical simulations of trading strategies
Market Surveillance: Detecting irregular trading patterns

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

parallel-sort

parallel-sort

README.md

Market Data Parallel Sort

Implementation Overview

Technical Details

Market Data Structure

Sorting Algorithms

CPU Sort

Thrust Sort

Bitonic Sort

Performance Analysis

Why GPU Sorting Outperforms CPU

Thrust vs. Bitonic Performance

Use Cases in HFT

Files

parallel-sort

Directory actions

More options

Directory actions

More options

Latest commit

History

parallel-sort

Folders and files

parent directory

README.md

Market Data Parallel Sort

Implementation Overview

Technical Details

Market Data Structure

Sorting Algorithms

CPU Sort

Thrust Sort

Bitonic Sort

Performance Analysis

Why GPU Sorting Outperforms CPU

Thrust vs. Bitonic Performance

Use Cases in HFT