Description
The compiler usually splits the program into multiple codegen units (CGUs) so that LLVM can optimize them in parallel. It does this by starting out with one codegen unit per module and then repeatedly merging the two smallest CGUs until the desired number of CGUs has been reached.
However, the way we define "smallest" is very simplistic: we just take the number of translation items in a CGU as its "size" and do not take the size of each translation item into account. So a CGU with two one-line functions is considered bigger than a CGU with one 1000-line function.
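To make the problem concrete, here is a minimal sketch of the merging loop with the item-count notion of "size". It is not the compiler's actual code: the `Cgu` struct, its fields, and `merge_smallest` are hypothetical names made up for illustration, and each translation item is modeled only by a statement count.

```rust
/// Hypothetical stand-in for a codegen unit (not the compiler's real type).
/// Each entry is the statement count of one translation item.
#[derive(Debug, Clone)]
struct Cgu {
    name: String,
    item_stmt_counts: Vec<usize>,
}

impl Cgu {
    /// The heuristic described above: size = number of translation items,
    /// regardless of how large each item is.
    fn size_by_item_count(&self) -> usize {
        self.item_stmt_counts.len()
    }
}

/// Repeatedly merge the two smallest CGUs until at most `target` remain.
fn merge_smallest(mut cgus: Vec<Cgu>, target: usize) -> Vec<Cgu> {
    let target = target.max(1);
    while cgus.len() > target {
        // Sort in descending order so the two smallest CGUs sit at the end.
        cgus.sort_by_key(|c| std::cmp::Reverse(c.size_by_item_count()));
        let smallest = cgus.pop().unwrap();
        let second_smallest = cgus.last_mut().unwrap();
        second_smallest.name = format!("{}+{}", second_smallest.name, smallest.name);
        second_smallest.item_stmt_counts.extend(smallest.item_stmt_counts);
    }
    cgus
}
```

Given three CGUs with statement counts `[1000]`, `[1, 1]`, and `[1]` and a target of two, this sketch merges the 1000-statement CGU with the single-statement one, because both count as size 1. The CGU holding two one-line functions counts as size 2 and is left alone, so the result is one CGU of 1001 statements and one of 2.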
It should be possible to make this heuristic more accurate without incurring much complexity. One way to start would be to estimate the size of each translation item by looking at its MIR. Just counting the statements would be an improvement over the current situation.
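A rough sketch of what a statement-count estimate could look like is given here. All names (`TransItem`, `mir_statements`, `Cgu`, `size_estimate`, `merge_by_estimate`) are hypothetical and chosen for illustration. How the statement count would be obtained from the real MIR is left open; the sketch assumes it is already available as a plain number.

```rust
/// Hypothetical model of a translation item: only its MIR statement count.
struct TransItem {
    mir_statements: usize,
}

/// Hypothetical stand-in for a codegen unit (not the compiler's real type).
struct Cgu {
    items: Vec<TransItem>,
}

impl Cgu {
    /// Proposed estimate: the sum of MIR statement counts over all items,
    /// instead of the number of items.
    fn size_estimate(&self) -> usize {
        self.items.iter().map(|item| item.mir_statements).sum()
    }
}

/// Repeatedly merge the two CGUs with the smallest estimated size
/// until at most `target` remain.
fn merge_by_estimate(mut cgus: Vec<Cgu>, target: usize) -> Vec<Cgu> {
    let target = target.max(1);
    while cgus.len() > target {
        // Sort in descending order so the two smallest CGUs sit at the end.
        cgus.sort_by_key(|c| std::cmp::Reverse(c.size_estimate()));
        let smallest = cgus.pop().unwrap();
        cgus.last_mut().unwrap().items.extend(smallest.items);
    }
    cgus
}
```

With CGUs of statement counts `[1000]`, `[1, 1]`, and `[1]` and a target of two, this version merges the two tiny CGUs and leaves the large one untouched. Re-sorting on every iteration keeps the sketch short; a priority queue would be the obvious choice if the number of CGUs were large.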
More evenly sized CGUs can potentially improve compile times by preventing situations where all but one thread are sitting idle, waiting for a single oversized CGU to be optimized.