Closed
Description
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version
)?
go 1.7 and tip
What operating system and processor architecture are you using (go env
)?
darwin amd64
What did you do?
func setupState(state *[16]uint32, key *[32]byte, nonce []byte) {
state[0] = 0x61707865
state[1] = 0x3320646e
state[2] = 0x79622d32
state[3] = 0x6b206574
state[4] = binary.LittleEndian.Uint32(key[0:4])
state[5] = binary.LittleEndian.Uint32(key[4:8])
state[6] = binary.LittleEndian.Uint32(key[8:12])
state[7] = binary.LittleEndian.Uint32(key[12:16])
state[8] = binary.LittleEndian.Uint32(key[16:20])
state[9] = binary.LittleEndian.Uint32(key[20:24])
state[10] = binary.LittleEndian.Uint32(key[24:28])
state[11] = binary.LittleEndian.Uint32(key[28:32])
state[12] = 0
state[13] = binary.LittleEndian.Uint32(nonce[:4])
state[14] = binary.LittleEndian.Uint32(nonce[4:8])
state[15] = binary.LittleEndian.Uint32(nonce[8:12])
}
What did you expect to see?
I would expect those 32-bit loads to be collapsed into a single MOVL
instruction. This was implemented during the 1.7 cycle but apparently broke at some point.
What did you see instead?
"".setupState t=1 size=400 args=0x28 locals=0x0
0x0000 00000 (speed.go:9) TEXT "".setupState(SB), $0-40
0x0000 00000 (speed.go:9) FUNCDATA $0, gclocals·40ab8381cc1118f82a27deeb35bb6e5c(SB)
0x0000 00000 (speed.go:9) FUNCDATA $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
0x0000 00000 (speed.go:11) MOVQ $3684054920433006693, AX
0x000a 00010 (speed.go:11) MOVQ "".state+8(FP), CX
0x000f 00015 (speed.go:11) MOVQ AX, (CX)
0x0012 00018 (speed.go:13) MOVQ $7719281312240119090, AX
0x001c 00028 (speed.go:13) MOVQ AX, 8(CX)
0x0020 00032 (speed.go:15) MOVQ "".key+16(FP), AX
0x0025 00037 (speed.go:15) MOVBLZX 2(AX), DX
0x0029 00041 (speed.go:15) MOVWLZX (AX), BX
0x002c 00044 (speed.go:15) SHLL $16, DX
0x002f 00047 (speed.go:15) ORL BX, DX
0x0031 00049 (speed.go:15) MOVBLZX 3(AX), BX
0x0035 00053 (speed.go:15) SHLL $24, BX
0x0038 00056 (speed.go:15) ORL BX, DX
0x003a 00058 (speed.go:15) MOVL DX, 16(CX)
0x003d 00061 (speed.go:16) MOVWLZX 4(AX), DX
0x0041 00065 (speed.go:16) MOVBLZX 6(AX), BX
0x0045 00069 (speed.go:16) SHLL $16, BX
0x0048 00072 (speed.go:16) ORL DX, BX
0x004a 00074 (speed.go:16) MOVBLZX 7(AX), DX
0x004e 00078 (speed.go:16) SHLL $24, DX
0x0051 00081 (speed.go:16) ORL DX, BX
0x0053 00083 (speed.go:16) MOVL BX, 20(CX)
0x0056 00086 (speed.go:17) MOVWLZX 8(AX), DX
0x005a 00090 (speed.go:17) MOVBLZX 10(AX), BX
0x005e 00094 (speed.go:17) SHLL $16, BX
0x0061 00097 (speed.go:17) ORL DX, BX
0x0063 00099 (speed.go:17) MOVBLZX 11(AX), DX
0x0067 00103 (speed.go:17) SHLL $24, DX
0x006a 00106 (speed.go:17) ORL DX, BX
0x006c 00108 (speed.go:17) MOVL BX, 24(CX)
0x006f 00111 (speed.go:18) MOVWLZX 12(AX), DX
0x0073 00115 (speed.go:18) MOVBLZX 14(AX), BX
etc. that is, the Uint32()
calls are being generated with loads and shifts.