Releases: minio/md5-simd
v1.1.2
Use scalar functions when there is less traffic (#24)

Switch to scalar assembly when fewer than 3 lanes are filled. This brings us very close to `crypto/md5` in cases where only a single lane is populated. When 2 lanes are filled, we use 2 goroutines running the scalar code; with 3 or more lanes, we switch to SIMD.

Before, with a single writer:

```
BenchmarkAvx2SingleWriter/32KB-32   14686      80893 ns/op   405.08 MB/s     976 B/op     8 allocs/op
BenchmarkAvx2SingleWriter/64KB-32    7498     162843 ns/op   402.45 MB/s    1840 B/op    15 allocs/op
BenchmarkAvx2SingleWriter/128KB-32   3636     327558 ns/op   400.15 MB/s    3568 B/op    29 allocs/op
BenchmarkAvx2SingleWriter/256KB-32   1845     650406 ns/op   403.05 MB/s    7024 B/op    57 allocs/op
BenchmarkAvx2SingleWriter/512KB-32    922    1295010 ns/op   404.85 MB/s   13937 B/op   113 allocs/op
BenchmarkAvx2SingleWriter/1MB-32      463    2598272 ns/op   403.57 MB/s   27765 B/op   225 allocs/op
BenchmarkAvx2SingleWriter/2MB-32      231    5164500 ns/op   406.07 MB/s   55411 B/op   449 allocs/op
BenchmarkAvx2SingleWriter/4MB-32      100   10170000 ns/op   412.42 MB/s  110709 B/op   897 allocs/op
BenchmarkAvx2SingleWriter/8MB-32       56   20357161 ns/op   412.07 MB/s  221305 B/op  1793 allocs/op
```

After:

```
BenchmarkAvx2SingleWriter/32KB-32   26785     44353 ns/op  738.80 MB/s  112 B/op  1 allocs/op
BenchmarkAvx2SingleWriter/64KB-32   13682     87853 ns/op  745.98 MB/s  112 B/op  1 allocs/op
BenchmarkAvx2SingleWriter/128KB-32   7058    175829 ns/op  745.45 MB/s  112 B/op  1 allocs/op
BenchmarkAvx2SingleWriter/256KB-32   3428    346558 ns/op  756.42 MB/s  112 B/op  1 allocs/op
BenchmarkAvx2SingleWriter/512KB-32   1713    686515 ns/op  763.69 MB/s  112 B/op  1 allocs/op
BenchmarkAvx2SingleWriter/1MB-32      874   1366132 ns/op  767.55 MB/s  112 B/op  1 allocs/op
BenchmarkAvx2SingleWriter/2MB-32      439   2740318 ns/op  765.30 MB/s  112 B/op  1 allocs/op
BenchmarkAvx2SingleWriter/4MB-32      220   5431817 ns/op  772.17 MB/s  113 B/op  1 allocs/op
BenchmarkAvx2SingleWriter/8MB-32      100  10840002 ns/op  773.86 MB/s  116 B/op  1 allocs/op
```

Compare to pure `crypto/md5`:

```
BenchmarkCryptoMd5/32KB-32   30612    39004 ns/op  840.11 MB/s  0 B/op  0 allocs/op
BenchmarkCryptoMd5/64KB-32   15285    77985 ns/op  840.37 MB/s  0 B/op  0 allocs/op
BenchmarkCryptoMd5/128KB-32   7498   156175 ns/op  839.26 MB/s  0 B/op  0 allocs/op
BenchmarkCryptoMd5/256KB-32   3870   310336 ns/op  844.71 MB/s  0 B/op  0 allocs/op
BenchmarkCryptoMd5/512KB-32   1874   623266 ns/op  841.19 MB/s  0 B/op  0 allocs/op
BenchmarkCryptoMd5/1MB-32      960  1243750 ns/op  843.08 MB/s  0 B/op  0 allocs/op
BenchmarkCryptoMd5/2MB-32      480  2489588 ns/op  842.37 MB/s  0 B/op  0 allocs/op
```

After optimizing the assembly:

```
BenchmarkAvx2SingleWriter
BenchmarkAvx2SingleWriter/32KB-32   28570     41941 ns/op  781.29 MB/s   0 B/op  0 allocs/op
BenchmarkAvx2SingleWriter/64KB-32   14388     83055 ns/op  789.06 MB/s   0 B/op  0 allocs/op
BenchmarkAvx2SingleWriter/128KB-32   7500    167734 ns/op  781.43 MB/s   0 B/op  0 allocs/op
BenchmarkAvx2SingleWriter/256KB-32   3636    332508 ns/op  788.38 MB/s   1 B/op  0 allocs/op
BenchmarkAvx2SingleWriter/512KB-32   1818    659667 ns/op  794.78 MB/s   2 B/op  0 allocs/op
BenchmarkAvx2SingleWriter/1MB-32      915   1315847 ns/op  796.88 MB/s   5 B/op  0 allocs/op
BenchmarkAvx2SingleWriter/2MB-32      457   2621787 ns/op  799.89 MB/s  11 B/op  0 allocs/op
BenchmarkAvx2SingleWriter/4MB-32      229   5213972 ns/op  804.44 MB/s  22 B/op  0 allocs/op
BenchmarkAvx2SingleWriter/8MB-32      100  10409999 ns/op  805.82 MB/s  51 B/op  0 allocs/op
```
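The MB/s column in these results is derived rather than measured: when a benchmark calls `b.SetBytes(n)`, `go test` reports `n` bytes per op divided by the time per op, using decimal megabytes (10^6 bytes). The sketch below recomputes the 32KB rows from their ns/op values to put the "very close to `crypto/md5`" claim in numbers. It assumes the 32KB benchmarks set 32×1024 bytes per op, which is consistent with the reported figures but not stated in the notes; `mbPerSec` is a hypothetical helper. Because `go test` truncates ns/op to an integer, a recomputed value can differ from the printed MB/s in the last digit.

```go
package main

import "fmt"

// mbPerSec mirrors how `go test -bench` derives MB/s when b.SetBytes is used:
// bytes per op in decimal megabytes, divided by seconds per op.
// Hypothetical helper for illustration only.
func mbPerSec(bytesPerOp int64, nsPerOp float64) float64 {
	return (float64(bytesPerOp) / 1e6) / (nsPerOp / 1e9)
}

func main() {
	const block = 32 * 1024 // assumed payload of the 32KB benchmark rows

	rows := []struct {
		label   string
		nsPerOp float64
	}{
		{"AVX2, before #24", 80893},
		{"AVX2, after #24", 44353},
		{"AVX2, optimized assembly", 41941},
		{"crypto/md5", 39004},
	}

	ref := mbPerSec(block, 39004) // crypto/md5 as the baseline
	for _, r := range rows {
		v := mbPerSec(block, r.nsPerOp)
		fmt.Printf("%-26s %7.2f MB/s  %5.1f%% of crypto/md5\n", r.label, v, 100*v/ref)
	}
}
```

For the 32KB rows this works out to roughly 48% of `crypto/md5` throughput before #24, about 88% after switching to the scalar path, and about 93% after optimizing the assembly.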
v1.1.0
Minor release
Fix the version number to work with Go modules.
First stable release
First stable release. The API is now finalized.
Initial release
Initial release of md5-simd