Daily Perf Improver - Add comprehensive matrix operation benchmarks #20
Summary
This PR adds comprehensive benchmarking coverage for matrix operations as part of Phase 1 (Quick Wins) of the performance improvement plan, establishing baseline performance metrics for all core matrix operations.
Performance Goal
Goal Selected: Add comprehensive matrix operation benchmarks (Phase 1, Priority: HIGH)
Rationale: The research plan identified that while vector operations had benchmarks, matrix operations had no benchmarking coverage. This PR fills that gap by adding 14 benchmarks covering element-wise, scalar, matrix-multiplication, matrix-vector, structure, access-pattern, and broadcast operations.
Changes Made
New Benchmarks Added
All benchmarks test three matrix sizes (10x10, 50x50, 100x100) and use `MemoryDiagnoser` to track allocations (a sketch of the class shape follows the list).

Element-wise Operations:
1. `ElementWiseAdd` - SIMD-accelerated element-wise addition
2. `ElementWiseSubtract` - SIMD-accelerated element-wise subtraction
3. `ElementWiseMultiply` - SIMD-accelerated Hadamard product
4. `ElementWiseDivide` - SIMD-accelerated element-wise division

Scalar Operations:
5. `ScalarAdd` - Add a scalar to all matrix elements
6. `ScalarMultiply` - Multiply all matrix elements by a scalar

Matrix Multiplication:
7. `MatrixMultiply` - Standard matrix-matrix multiplication (matmul)

Matrix-Vector Operations:
8. `MatrixVectorMultiply` - Matrix × vector (SIMD-optimized)
9. `VectorMatrixMultiply` - Row vector × matrix (SIMD-optimized)

Structure Operations:
10. `Transpose` - Block-based transpose (16x16 blocks)

Access Patterns:
11. `GetRow` - Extract a single row (contiguous memory)
12. `GetCol` - Extract a single column (strided access)

Broadcast Operations:
13. `AddRowVector` - Add a row vector to all matrix rows (SIMD)
14. `AddColVector` - Add a column vector to all matrix columns (SIMD)
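The new class follows the standard BenchmarkDotNet pattern. The sketch below shows its rough shape only; it uses a plain `float[,]` so it stays self-contained, whereas the real benchmarks in `Matrix.fs` call FsMath's own matrix operations:

```fsharp
open BenchmarkDotNet.Attributes

[<MemoryDiagnoser>]                      // tracks allocations for each benchmark
type MatrixBenchmarksSketch() =

    // BenchmarkDotNet re-runs the whole suite for each size: 10x10, 50x50, 100x100
    [<Params(10, 50, 100)>]
    member val Size = 0 with get, set

    member val A = Array2D.zeroCreate<float> 0 0 with get, set
    member val B = Array2D.zeroCreate<float> 0 0 with get, set

    [<GlobalSetup>]
    member this.Setup() =
        let rng = System.Random(42)
        this.A <- Array2D.init this.Size this.Size (fun _ _ -> rng.NextDouble())
        this.B <- Array2D.init this.Size this.Size (fun _ _ -> rng.NextDouble())

    [<Benchmark>]
    member this.ElementWiseAdd() =
        // the real benchmark calls FsMath's SIMD-accelerated matrix addition;
        // a plain nested loop stands in here to keep the sketch runnable
        Array2D.init this.Size this.Size (fun i j -> this.A.[i, j] + this.B.[i, j])
```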
Files Modified

- `benchmarks/FsMath.Benchmarks/Matrix.fs` - New benchmark class
- `benchmarks/FsMath.Benchmarks/FsMath.Benchmarks.fsproj` - Added Matrix.fs to compilation
- `benchmarks/FsMath.Benchmarks/Program.fs` - Registered the MatrixBenchmarks class (see the sketch below)
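Registration in `Program.fs` presumably goes through BenchmarkDotNet's `BenchmarkSwitcher`; a minimal version looks like the following, where `VectorBenchmarks` is a placeholder name for the pre-existing vector suite:

```fsharp
open BenchmarkDotNet.Running
open FsMath.Benchmarks   // namespace assumed

[<EntryPoint>]
let main argv =
    // register the new MatrixBenchmarks class alongside the existing suites
    let switcher =
        BenchmarkSwitcher.FromTypes([| typeof<MatrixBenchmarks>; typeof<VectorBenchmarks> |])
    switcher.Run(argv) |> ignore
    0
```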
Approach

Benchmarks were run with BenchmarkDotNet's `--job short` option (fewer warmup and measurement iterations) to keep total run time down.

Performance Measurements
Test Environment
Results Summary by Operation Type
Element-wise Operations (10x10)
All element-wise operations show excellent SIMD performance with ~70ns latency:
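FsMath's actual kernels are not reproduced in this description, but the kind of vectorized loop behind these operations typically looks like the following sketch, written against `System.Numerics.Vector` rather than FsMath's internal code:

```fsharp
open System.Numerics

// Illustrative sketch only: processes Vector<float>.Count elements per iteration,
// then falls back to a scalar loop for the remaining tail.
let elementWiseAdd (a: float[]) (b: float[]) (result: float[]) =
    let width = Vector<float>.Count
    let mutable i = 0
    while i <= a.Length - width do
        let va = Vector<float>(a, i)
        let vb = Vector<float>(b, i)
        (va + vb).CopyTo(result, i)
        i <- i + width
    // scalar tail for elements that don't fill a full SIMD register
    while i < a.Length do
        result.[i] <- a.[i] + b.[i]
        i <- i + 1
```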
Scalar Operations (10x10)
Scalar operations are slightly faster than their element-wise counterparts, likely because they read one input array instead of two:
Matrix Multiplication Scaling
Matrix multiplication shows the expected O(n³) scaling:
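As a quick sanity check on the cubic scaling: a naive matmul performs n³ multiply-adds, so going from 50x50 to 100x100 should cost roughly (100/50)³ = 8x as much time, and 10x10 to 100x100 roughly (100/10)³ = 1000x. The measured ratios can be read directly against these factors.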
Matrix-Vector Operations (100x100)
Access Pattern Comparison (100x100)
Detailed Results Table
Key Observations
Performance Bottlenecks Identified
From these benchmarks, we can identify Phase 2 optimization opportunities:
Replicating the Performance Measurements
To replicate these benchmarks:
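The exact command line is not preserved in this description; given the project layout and the `--job short` option noted above, an invocation along these lines should reproduce the run (the filter pattern is an assumption):

```
cd benchmarks/FsMath.Benchmarks
dotnet run -c Release -- --filter "*MatrixBenchmarks*" --job short
```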
Results will be saved to `BenchmarkDotNet.Artifacts/results/` in multiple formats (GitHub MD, HTML, CSV).

Testing
✅ All benchmarks compile successfully
✅ All 14 matrix benchmarks × 3 sizes = 42 benchmarks discovered
✅ All benchmarks execute without errors
✅ Existing tests still pass (132 tests)
✅ No performance report files included in commit
Next Steps
This PR establishes comprehensive baseline measurements for matrix operations. Based on these measurements, future work from the performance plan includes:
Phase 1 (remaining):
Phase 2 (algorithmic improvements):
Phase 3 (advanced optimizations):
Related Issues/Discussions
Commands Used
🤖 Generated with Claude Code