AVX512 accelerated version resulting in a 4x speed improvement over AVX2 #91

fwessels · 2019-02-08T21:16:48Z

No description provided.

klauspost

Impressive speed! No big problems, just some small tweaks.

I just came across this:

I guess it couldn't hurt to insert a dummy instruction when 'New' is called to warm up the registers. It will not be measurable in benchmarks, since the first run will do the warmup, but it could affect real-world use.


klauspost · 2019-02-09T11:15:44Z

galoisAvx512_amd64.go

+
+// Construct block of matrix coefficients for 2 outputs rows in parallel
+func setupMatrix82(matrixRows [][]byte, inputOffset, outputOffset int, matrix *[(16 + 16) * dimIn * dimOut82]byte) {
+


nit: Extra newline at beginning of functions.
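For readers wondering about the `(16 + 16)` in the `matrix` size above: a plausible reading is that each coefficient is stored as two 16-byte lookup tables, one for the low nibble and one for the high nibble of an input byte. The sketch below illustrates that idea only. The names `gfMul`, `nibbleTables` and `mulViaTables` are hypothetical, and the reduction polynomial 0x11d is an assumption, not something stated in this PR.

```go
package main

import "fmt"

// gfMul multiplies two elements of GF(2^8) bit by bit.
// Assumption: reduction polynomial x^8+x^4+x^3+x^2+1 (0x11d).
func gfMul(a, b byte) byte {
	var p byte
	for b > 0 {
		if b&1 != 0 {
			p ^= a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1d // low 8 bits of 0x11d
		}
		b >>= 1
	}
	return p
}

// nibbleTables (hypothetical helper) builds the 16+16 bytes for one coefficient:
// low[i] = c*i and high[i] = c*(i<<4).
func nibbleTables(c byte) (low, high [16]byte) {
	for i := 0; i < 16; i++ {
		low[i] = gfMul(c, byte(i))
		high[i] = gfMul(c, byte(i)<<4)
	}
	return low, high
}

// mulViaTables computes c*x with two table lookups and one XOR. This works
// because multiplication in GF(2^8) distributes over XOR.
func mulViaTables(low, high [16]byte, x byte) byte {
	return low[x&0x0f] ^ high[x>>4]
}

func main() {
	low, high := nibbleTables(0x1b)
	x := byte(0xc7)
	fmt.Printf("direct: %#02x, via tables: %#02x\n", gfMul(0x1b, x), mulViaTables(low, high, x))
}
```

A SIMD kernel would perform the two lookups with byte shuffles across a whole register at once; the scalar version here only shows why 32 bytes per coefficient are sufficient.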


klauspost · 2019-02-09T11:23:17Z

galoisAvx512_amd64.go

+	}
+	remain := len(in[0]) - done
+
+	if remain > 0 {


nit: let's just return, so we can kill some indentation.
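The early-return shape suggested here can be sketched as follows. This is a generic illustration, not the code from the PR: `xorTailNested` and `xorTailEarlyReturn` are hypothetical names, and the XOR loop merely stands in for whatever handles the remaining bytes.

```go
package main

import "fmt"

// xorTailNested mirrors the reviewed shape: the tail handling lives inside
// an `if remain > 0` block, which costs one level of indentation.
func xorTailNested(dst, src []byte, done int) {
	remain := len(src) - done
	if remain > 0 {
		for i := done; i < len(src); i++ {
			dst[i] ^= src[i]
		}
	}
}

// xorTailEarlyReturn inverts the condition and returns early, so the tail
// handling sits at the top level of the function.
func xorTailEarlyReturn(dst, src []byte, done int) {
	remain := len(src) - done
	if remain <= 0 {
		return
	}
	for i := done; i < len(src); i++ {
		dst[i] ^= src[i]
	}
}

func main() {
	src := []byte{0xff, 0xff, 0xff, 0xff}
	a := []byte{1, 2, 3, 4}
	b := []byte{1, 2, 3, 4}
	xorTailNested(a, src, 2)
	xorTailEarlyReturn(b, src, 2)
	fmt.Println(a, b)
}
```

Both variants behave identically; the second only reads flatter, which is the point of the nit.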


fwessels · 2019-02-10T00:37:24Z

> Impressive speed! No big problems, just some small tweaks.
>
> I just came across this:
>
> I guess it couldn't hurt to insert a dummy instruction when 'New' is called to warm up the registers. It will not be measurable in benchmarks, since the first run will do the warmup, but it could affect real-world use.

Thanks for the comment. However, roughly the 7th instruction already starts loading the matrix coefficients into the ZMM registers, so the code begins accessing them quite early. I can move this instruction up, but I doubt it would make much real-world difference.

klauspost · 2019-02-10T10:15:52Z

@fwessels Thanks for the changes. I will merge them as is, since the remaining stuff is just sugar coating.

I was thinking of adding a dummy call in New around here - that would start the warmup before the matrices are calculated, which will probably take more cycles than is needed.

fwessels · 2019-02-11T04:25:01Z

> I was thinking of adding a dummy call in New around here - that would start the warmup before the matrices are calculated, which will probably take more cycles than is needed.

Yes, a dummy call around that position could help. And thanks for the merge.
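A minimal sketch of the idea discussed above, assuming the warmup is triggered at the top of `New`, before the matrix work. Everything here is illustrative: `avx512Warmup` is a hypothetical stand-in for an assembly routine that would execute a single 512-bit instruction, `buildMatrix` is a placeholder for the real matrix setup, and guarding the call with `sync.Once` is a design assumption rather than something agreed in this thread.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	warmupOnce  sync.Once
	warmupCalls int // counts invocations so the example can show the effect
)

// avx512Warmup is a hypothetical stand-in. A real version would be a tiny
// assembly function touching a ZMM register; here it only counts calls.
func avx512Warmup() { warmupCalls++ }

// Encoder is a simplified placeholder type for this sketch.
type Encoder struct {
	dataShards, parityShards int
	matrix                   [][]byte
}

// buildMatrix is a placeholder for the comparatively slow matrix setup.
func buildMatrix(rows, cols int) [][]byte {
	m := make([][]byte, rows)
	for r := range m {
		m[r] = make([]byte, cols)
	}
	return m
}

// New issues the dummy call first, so any warmup overlaps with the matrix
// calculation instead of delaying the first encode.
func New(dataShards, parityShards int) *Encoder {
	warmupOnce.Do(avx512Warmup)
	return &Encoder{
		dataShards:   dataShards,
		parityShards: parityShards,
		matrix:       buildMatrix(parityShards, dataShards),
	}
}

func main() {
	New(8, 4)
	New(8, 4)
	fmt.Println("warmup calls:", warmupCalls)
}
```

Whether a single warmup per process is enough depends on how the CPU behaves after idle periods; calling the routine on every `New` would be the other obvious choice.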

…VX2 (klauspost#91)

The performance on AVX512 has been accelerated for Intel CPUs. This gives speedups on a per-core basis of up to 4x compared to AVX2, as can be seen in the following table:

```
$ benchcmp avx2.txt avx512.txt
benchmark                      AVX2 MB/s    AVX512 MB/s    speedup
BenchmarkEncode8x8x1M-72       1681.35      4125.64        2.45x
BenchmarkEncode8x4x8M-72       1529.36      5507.97        3.60x
BenchmarkEncode8x8x8M-72       791.16       2952.29        3.73x
BenchmarkEncode8x8x32M-72      573.26       2168.61        3.78x
BenchmarkEncode12x4x12M-72     1234.41      4912.37        3.98x
BenchmarkEncode16x4x16M-72     1189.59      5138.01        4.32x
BenchmarkEncode24x8x24M-72     690.68       2583.70        3.74x
BenchmarkEncode24x8x48M-72     674.20       2643.31        3.92x
```
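The speedup column is simply the AVX512 throughput divided by the AVX2 throughput. A small sketch that recomputes it for three of the rows quoted in the commit message (the `row` type and `speedup` function are names made up for this illustration):

```go
package main

import "fmt"

// row holds one benchmark line: name plus throughput in MB/s for each variant.
type row struct {
	name         string
	avx2, avx512 float64
}

// speedup is the ratio of AVX512 throughput to AVX2 throughput.
func speedup(r row) float64 { return r.avx512 / r.avx2 }

func main() {
	rows := []row{
		{"BenchmarkEncode8x8x1M-72", 1681.35, 4125.64},
		{"BenchmarkEncode12x4x12M-72", 1234.41, 4912.37},
		{"BenchmarkEncode16x4x16M-72", 1189.59, 5138.01},
	}
	for _, r := range rows {
		fmt.Printf("%-28s %.2fx\n", r.name, speedup(r))
	}
}
```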

fwessels added 2 commits February 8, 2019 13:11

AVX512 accelerated version resulting in a 4x speed improvement over AVX2

c84de30

Fix -tags=noasm build error and format code

26e5c73

klauspost reviewed Feb 9, 2019


Minor fixes and improvements after code review.

e02678d

klauspost merged commit 79aee05 into klauspost:master Feb 10, 2019



