When and why should we use one register or two registers in ..._arm64.s? #41

Open
SparrowLii opened this issue Aug 14, 2020 · 1 comment

Comments

@SparrowLii

SparrowLii commented Aug 14, 2020

In crypto/md5/md5block_arm64.s there are two similar rounds:
#define ROUND1(a, b, c, d, index, const, shift) \
ADDW $const, a; \
ADDW R8, a; \
MOVW (index*4)(R1), R8; \
EORW c, R9; \
ANDW b, R9; \
EORW d, R9; \
ADDW R9, a; \
RORW $(32-shift), a; \
MOVW c, R9; \
ADDW b, a

#define ROUND2(a, b, c, d, index, const, shift) \
ADDW $const, a; \
ADDW R8, a; \
MOVW (index*4)(R1), R8; \
ANDW b, R10; \
BICW R9, c, R9; \
ORRW R9, R10; \
MOVW c, R9; \
ADDW R10, a; \
MOVW c, R10; \
RORW $(32-shift), a; \
ADDW b, a

The Go code for ROUND1 is: a = b + bits.RotateLeft32((((c^d)&b)^d)+a+x0+const, shift)
P.S. ((c^d)&b)^d is equivalent to (b&c) | ((^b)&d).
The Go code for ROUND2 is: a = b + bits.RotateLeft32((((b^c)&d)^c)+a+x0+const, shift)
P.S. ((b^c)&d)^c is equivalent to (b&d) | (c&(^d)).
Why does ROUND1 use one scratch register (R9) while ROUND2 uses two (R9 and R10), and is each of these really the fastest form?
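
The two equivalences quoted above can be checked mechanically. Below is a minimal Go sketch, not code from the repository; the function names are made up for illustration. Because every operation is bitwise, each bit position is independent, so testing all eight combinations of all-zero and all-one words covers every possible 32-bit input.

```go
package main

import "fmt"

// Hypothetical helper names; only the expressions come from the thread.

// f1Xor is the round-1 selector in XOR form.
func f1Xor(b, c, d uint32) uint32 { return ((c ^ d) & b) ^ d }

// f1Or is the round-1 selector in AND/OR form.
func f1Or(b, c, d uint32) uint32 { return (b & c) | (^b & d) }

// g2Xor is the round-2 selector in XOR form.
func g2Xor(b, c, d uint32) uint32 { return ((b ^ c) & d) ^ c }

// g2Or is the round-2 selector in AND/OR form (&^ is Go's AND NOT).
func g2Or(b, c, d uint32) uint32 { return (b & d) | (c &^ d) }

// identitiesHold tries every combination of all-zero and all-one words.
func identitiesHold() bool {
	vals := [2]uint32{0, ^uint32(0)}
	for _, b := range vals {
		for _, c := range vals {
			for _, d := range vals {
				if f1Xor(b, c, d) != f1Or(b, c, d) {
					return false
				}
				if g2Xor(b, c, d) != g2Or(b, c, d) {
					return false
				}
			}
		}
	}
	return true
}

func main() {
	fmt.Println("identities hold:", identitiesHold()) // identities hold: true
}
```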

@surechen
Collaborator

round1: 1) ((c^d)&b)^d   2) (b&c) | ((^b)&d)
round2: 3) ((b^c)&d)^c   4) (b&d) | (c&(^d))

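Looking at the assembly, round1 is implemented according to formula 1) and round2 according to formula 4). Code written from formulas 1) and 3) needs only one register. Code written from 2) and 4) has a separate sub-expression on each side of the '|', so it would normally use two registers. We all know that loading data from the stack into registers has a cost, but using more registers can also mean better use of the pipeline and the execution units. Take the Arm Cortex-A76 as an example: it has three integer execution units, all of which can handle integer operations such as and and orr. Each of these operations takes 1 clock cycle and the throughput is 3, meaning that three such instructions can execute concurrently in a single clock cycle.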
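
To make the one-register versus two-register trade-off concrete, here is a rough Go sketch of formulas 3) and 4) written one operation per line. It is an illustration only, with made-up function names, and is not the assembly under discussion. The Go compiler is free to rewrite either form, so the comments describe the data dependencies in the source, not the instructions that end up being executed.

```go
package main

import "fmt"

// round2Chain follows formula 3). Each step consumes the previous result,
// so the three operations form a serial chain and one temporary is enough.
func round2Chain(b, c, d uint32) uint32 {
	t := b ^ c // step 1
	t &= d     // step 2, needs the result of step 1
	t ^= c     // step 3, needs the result of step 2
	return t
}

// round2Split follows formula 4). The two halves do not depend on each
// other, so they need two temporaries but could be issued side by side.
func round2Split(b, c, d uint32) uint32 {
	t1 := b & d  // independent of t2
	t2 := c &^ d // independent of t1 (AND NOT, the operation BICW performs)
	return t1 | t2 // needs both halves
}

func main() {
	b, c, d := uint32(0x01234567), uint32(0x89abcdef), uint32(0xfedcba98)
	fmt.Println(round2Chain(b, c, d) == round2Split(b, c, d)) // true
}
```

The chain form has three dependent operations and the split form has a critical path of two, at the price of one extra register. Whether that difference is visible in practice depends on the surrounding instructions and on the core.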
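My guess is that you tried both formulas 1) and 2) in round1 and both formulas 3) and 4) in round2, and found that the existing implementation was the fastest. In fact, 1) and 3) look equivalent to each other, as do 2) and 4). I have two suggestions. First, because the order of the other instructions may affect how much the pipeline can run in parallel, rearrange the instructions so that their relative positions are the same in round1 and round2, then test again. Second, pull this piece of code out on its own and compare its performance with a benchmark. Pass -count when running the benchmark so that it runs several times and the numbers are more reliable, and see whether the influence of this code can be ruled out. Once you have posted accurate data we can continue the discussion. You can ask me for the A76 manual, although it may not match the Kunpeng processor exactly.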
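
As a starting point for the isolated comparison suggested above, here is a minimal Go sketch. It is an assumption-laden illustration: it times compiler-generated code for the two Go expressions, not the hand-written macros in the .s file, and the compiler may emit the same instructions for both. The names selXor, selOr, BenchmarkSelXor, BenchmarkSelOr, and sink are hypothetical.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the results live so the compiler cannot discard the loops.
var sink uint32

// selXor is formula 3); selOr is formula 4).
func selXor(b, c, d uint32) uint32 { return ((b ^ c) & d) ^ c }
func selOr(b, c, d uint32) uint32  { return (b & d) | (c &^ d) }

// Each iteration feeds its result into the next one, so the work cannot
// be hoisted out of the loop.
func BenchmarkSelXor(b *testing.B) {
	x, y, z := uint32(0x67452301), uint32(0xefcdab89), uint32(0x98badcfe)
	for i := 0; i < b.N; i++ {
		x, y, z = y, z, selXor(x, y, z)+uint32(i)
	}
	sink = x
}

func BenchmarkSelOr(b *testing.B) {
	x, y, z := uint32(0x67452301), uint32(0xefcdab89), uint32(0x98badcfe)
	for i := 0; i < b.N; i++ {
		x, y, z = y, z, selOr(x, y, z)+uint32(i)
	}
	sink = x
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	fmt.Println("xor form:", testing.Benchmark(BenchmarkSelXor))
	fmt.Println("or form: ", testing.Benchmark(BenchmarkSelOr))
}
```

If the two Benchmark functions are placed in a _test.go file instead, they can be run repeatedly with something like `go test -bench . -count 10`, and the runs compared for consistency. To measure the assembly itself, the macros would have to be edited in the .s file and the package's own benchmarks rerun.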
