cmd/compile: reorganize associative computation to allow superscalar execution

The compiler currently compiles `a+b+c+d` as `a+(b+(c+d))`. It should use `(a+b)+(c+d)` instead, because the latter can be executed out of order.

More broadly, we should balance trees of associative computation.

Doing this in the compiler with a rule tailored for a single computation type in round 4 of md5block.go yielded a 15% throughput improvement. (See below for details.)

It's not obvious to me whether we can do this with carefully crafted rewrite rules or whether a dedicated pass would be better. But it looks like there may be significant performance wins available on tight mathematical code.

I don't plan to work on this further, but I really hope someone else picks it up.

cc @randall77 @martisch @FiloSottile @mmcloughlin @mdempsky 

---

To reproduce the md5 results, disable the optimized assembly routines, and add this rewrite rule:

```
(Add32 <t> c:(Const32) (Add32 d:(Load _ _) (Add32 x:(Xor32 _ _) a:(Add32 _ _)))) => (Add32 (Add32 <t> c d) (Add32 <t> x a))
```

and disable this one to avoid an infinite loop:

```
(Add32 (Add32 i:(Const32 <t>) z) x) && (z.Op != OpConst32 && x.Op != OpConst32) => (Add32 i (Add32 <t> z x))
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: reorganize associative computation to allow superscalar execution #49331

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

cmd/compile: reorganize associative computation to allow superscalar execution #49331

Description

Activity

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions