Skip to content

nbathreya/CUDA-Signal-Processor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GPU-Accelerated Signal Processing

Implementation of signal detection algorithms on CUDA. Demonstrates 10-50x speedup over CPU for large-scale time-series analysis.

Python PyTorch License

Algorithms

CUSUM Detection - Cumulative sum for change-point detection
Threshold Detection - Adaptive baseline with parallel computation
Streaming Processing - Memory-efficient chunked processing

Performance

Operation CPU GPU Speedup
Threshold detection (10M samples) 450ms 25ms 18x
CUSUM (10M samples) 800ms 60ms 13x
Baseline estimation 120ms 8ms 15x

Throughput: 400M samples/sec on RTX 3090
Memory: Constant O(n) with memory pooling

Installation

pip install cupy-cuda11x numpy

# Verify CUDA
python -c "import cupy; print(cupy.cuda.runtime.getDeviceCount())"

Usage

from cuda_signal_processor import CUDASignalProcessor, DetectionConfig

config = DetectionConfig(threshold_sigma=3.0)
processor = CUDASignalProcessor(config)

# Benchmark
results = processor.benchmark(signal_length=10_000_000)
print(f"Speedup: {results['speedup']:.1f}x")

# Process signal
signal = np.random.randn(10_000_000)
events, elapsed = processor.threshold_detect_gpu(signal)

Architecture

CPU                          GPU
┌──────────┐                ┌──────────┐
│ Signal   │──transfer─────►│ Device   │
│ (NumPy)  │                │ Memory   │
└──────────┘                └──────────┘
                                  │
                            ┌─────▼──────┐
                            │ Parallel   │
                            │ Kernels    │
                            └─────┬──────┘
                                  │
┌──────────┐                ┌─────▼──────┐
│ Results  │◄───transfer────│ Results    │
│ (NumPy)  │                │ (GPU)      │
└──────────┘                └────────────┘

References

  • Page, E.S. (1954). "Continuous Inspection Schemes"
  • Basseville & Nikiforov (1993). "Detection of Abrupt Changes"

Benchmarks

python cuda_signal_processor.py

Expected output:

=== GPU Accelerated Signal Processing ===
Signal length: 10,000,000 samples
GPU time: 24.56 ms
CPU time: 445.32 ms
Speedup: 18.1x
Throughput: 407.1 M samples/sec
GPU memory: 152.3 MB

Future

  • Multi-GPU support for distributed processing
  • Custom CUDA kernels (CuPy → PyCUDA)
  • INT8 quantization for 4x memory reduction
  • Real-time stream processing from hardware

License: MIT | Python: 3.8+ | CUDA: 11.0+

About

GPU-Accelerated Signal Processing

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages