Closed
Description
For some reason, the array access to llTable on seqdec.go:283 profiles as significantly slower on arm64 than the mlTable and ofTable accesses, taking 5-10x longer than any other similar access. Disassembly attached below. Note that the time is misattributed to :285, but after reordering the code I was able to get the problem to show up on the llTable access. This suggests the processor reports the llTable instruction as completed in the program counter, but then stalls before it can execute the :285 instructions.
260ms 320ms 540cd0: NOOP ;github.com/klauspost/compress/zstd.(*sequenceDecs).decode seqdec.go:281
. . 540cd4: MOVD 344(RSP), R8 ;bitreader.go:60
. . 540cd8: MOVD 32(R8), R9
110ms 110ms 540cdc: MOVBU 40(R8), R10 ;github.com/klauspost/compress/zstd.(*bitReader).getBitsFast bitreader.go:61
10ms 10ms 540ce0: ADD R10, R5, R11
. . 540ce4: MOVB R11, 40(R8) ;bitreader.go:61
10ms 10ms 540ce8: ADD R6, R4, R11 ;github.com/klauspost/compress/zstd.(*sequenceDecs).decode seqdec.go:282
. . 540cec: AND $31, R11, R11 ;seqdec.go:282
20ms 20ms 540cf0: AND $63, R10, R10 ;github.com/klauspost/compress/zstd.(*sequenceDecs).decode bitreader.go:60
. . 540cf4: LSL R10, R9, R9 ;bitreader.go:60
. . 540cf8: ORR $64, ZR, R10
30ms 30ms 540cfc: SUB R5, R10, R5 ;github.com/klauspost/compress/zstd.(*bitReader).getBitsFast bitreader.go:60
. . 540d00: AND $63, R5, R5 ;bitreader.go:60
. . 540d04: LSR R5, R9, R5
. . 540d08: MOVW R5, R5 ;bitreader.go:62
. . 540d0c: ASR R11, R5, R9 ;seqdec.go:282
60ms 60ms 540d10: ADD R3>>16, R9, R9 ;github.com/klauspost/compress/zstd.(*sequenceDecs).decode seqdec.go:283
. . 540d14: UBFIZ $3, R9, $9, R9 ;seqdec.go:283
. . 540d18: MOVD 256(RSP), R11
20ms 20ms 540d1c: MOVD (R11)(R9), R3 ;github.com/klauspost/compress/zstd.(*sequenceDecs).decode seqdec.go:283
1.10s 1.10s 540d20: AND $31, R6, R9 ;github.com/klauspost/compress/zstd.(*sequenceDecs).decode seqdec.go:285
. . 540d24: ASR R9, R5, R9 ;seqdec.go:285
. . 540d28: UBFIZ $1, R4, $4, R12 ;seqdec.go:286