Welcome to the Universal-Architecture-RNG-Lib wiki!
# Universal High-Performance RNG Library
"I'm out to beat C++'s own engineers at their own game and improve upon the std library implementation!"
A mission to create the fastest, most adaptive random number generator across all possible hardware architectures.
The Universal RNG Library represents an ambitious quest to surpass the performance of C++ standard library random number generators through intelligent runtime adaptation and cutting-edge SIMD optimization. This isn't just another RNG - it's a hardware-aware, performance-obsessed random number generation system that automatically selects the optimal implementation for your specific CPU and GPU capabilities.
Our priorities are crystal clear:
- 🏎️ SPEED - Outperform std::random_device and default C++ RNGs
- 🎲 RANDOMNESS QUALITY - Maintain scientific-grade statistical properties
- 💾 MEMORY EFFICIENCY - Minimal footprint with maximum throughput
The library automatically detects and selects:
- ✅ AVX-512 (8-way parallelism, 64-bit lanes) - Latest Intel/AMD CPUs
- ✅ AVX2 (4-way parallelism, 64-bit lanes) - Modern CPUs
- ✅ SSE2 (2-way parallelism, 64-bit lanes) - Legacy compatibility
- ✅ ARM NEON (2-way parallelism, 64-bit lanes) - ARM processors
- ✅ OpenCL GPU (1024+ parallel streams) - Massive throughput
- ✅ Scalar fallback - Universal compatibility

- 🎯 Xoroshiro128++ - A fast, modern generator by David Blackman and Sebastiano Vigna (not part of the C++ standard library, which ships engines such as the Mersenne Twister), optimized beyond recognition
- ⚡ WyRand - High-quality randomness with exceptional speed
- Smart pointers (std::unique_ptr, std::shared_ptr) - RAII memory management
- Template-based dispatch optimization
- Zero-overhead abstractions
```bash
git clone https://github.com/YOUR_USERNAME/universal-rng.git
cd universal-rng

# Automatic detection and optimization
./simple_compiler.bat

# Force specific SIMD (if you know your hardware)
./simple_compiler.bat avx512f avx512dq avx512bw avx512vl
```

```cpp
#include <cstdint>
#include <vector>

#include "universal_rng.h"

int main() {
    // Create RNG with automatic optimal implementation
    // (see the API Reference for the constructor arguments)
    universal_rng_t* rng = universal_rng_new(42, 0, 1);
    if (rng == nullptr) return 1;  // defensive check in case creation fails

    // Single values - perfect for gaming
    uint64_t random_id = universal_rng_next_u64(rng);
    double probability = universal_rng_next_double(rng);

    // Batch generation - perfect for scientific computing
    std::vector<uint64_t> batch(1000000);
    universal_rng_generate_batch(rng, batch.data(), batch.size());

    universal_rng_free(rng);
    return 0;
}
```

Example output:

```
Creating RNG...
CPU feature detection:
SSE2: Yes
AVX2: Yes
AVX512: Yes
Using AVX512 implementation
Batch generation: 4.2x speedup over scalar!
```
| Implementation | Single-Value Speed (vs. Scalar) | Batch Speedup (vs. Scalar) | Hardware Target |
|---|---|---|---|
| Scalar | 1.0x (baseline) | 1.0x | Any CPU |
| SSE2 | 1.2x | 2.1x | Intel/AMD 2001+ |
| AVX2 | 1.8x | 4.2x | Intel Haswell+ |
| AVX-512 | 2.3x | 8.1x | Intel Skylake-X+ |
| OpenCL GPU | 0.8x | 100x+ | Dedicated GPU |
Benchmarked on various hardware configurations - your results may vary
- Ultra-fast random number generation for procedural content
- Automatic hardware optimization without code changes
- Batch generation for particle systems and terrain generation
- High-quality randomness for Monte Carlo simulations
- Massive batch generation for statistical analysis
- Cross-platform consistency with optimal performance
- Replace standard library RNGs with faster alternatives
- Hardware-aware optimization in performance-critical applications
- Future-proof code that adapts to new instruction sets
| Section | Description |
|---|---|
| [🏗️ Architecture Overview](Architecture-Overview) | Deep dive into the runtime dispatch system |
| [⚡ SIMD Implementations](SIMD-Implementations) | Technical details of each optimization level |
| [🎯 Performance Analysis](Performance-Analysis) | Benchmark results and optimization stories |
| [🔧 Build System Guide](Build-System-Guide) | Compilation options and platform support |
| [📈 Development History](Development-History) | The epic journey from concept to reality |
| [🚀 Future Roadmap](Future-Roadmap) | Cryptographic security and multi-language plans |
| [🧪 API Reference](API-Reference) | Complete function documentation |
| [❓ FAQ & Troubleshooting](FAQ-Troubleshooting) | Common issues and solutions |
- 🎯 Algorithms: 2 (Xoroshiro128++, WyRand)
- 🔧 Implementation Variants: 6 (Scalar, SSE2, AVX, AVX2, AVX-512, NEON)
- 🖥️ Platforms: Windows, Linux, macOS
- 📊 Bit Widths: 16, 32, 64, 128, 256, 512, 1024-bit
- ⚡ Max Parallelism: 1024+ streams (OpenCL)
- 🎮 Performance Gain: Up to 8.1x batch speedup over scalar (AVX-512)
What started as a challenge to "beat C++'s own engineers at their own game" became an epic journey through:
- 🔍 CPU Feature Detection - Runtime intelligence across platforms
- ⚡ SIMD Optimization - From SSE2 to cutting-edge AVX-512
- 🎯 Modern C++ Refactoring - Smart pointers and RAII mastery
- 🖥️ GPU Acceleration - OpenCL integration for massive parallelism
- 📊 Comprehensive Benchmarking - Proving performance claims with data
The result? On SIMD-capable hardware, a library that doesn't just match standard implementations - it obliterates them while maintaining scientific-grade randomness quality.
```bash
git clone https://github.com/YOUR_USERNAME/universal-rng.git
cd universal-rng
./simple_compiler.bat
./enhanced_bitwidth_benchmark.exe
```

Welcome to the future of random number generation! 🚀✨
Some data is currently lost off the bottom of this page - a search party needs to be sent in to rescue it!
PLEASE BEAR IN MIND ABOVE ALL ELSE: in the current state of development, the C++ standard library's Mersenne Twister still outperforms these generators for single-number generation on computers without SIMD acceleration. These libraries require at least AVX2 to beat the standard library's generation methods for single-number generation tasks.