cmd/compile: optimize convI2I, assertI2I, assertE2I for statically known types #51133
A conversion that involves `runtime.getitab` (`convI2I`, `assertI2I`, `assertE2I`) is generally much slower than an ordinary interface value construction from a statically known concrete type.
The Go compiler has an IR-based devirtualization pass that performs local method call devirtualization. It's not as silly as it might sound, as it works well enough in some situations thanks to inlining.
It does not, however, handle I2I-like operations. Even if we know that the converted value has some non-interface type `T`, we still do an expensive I2I operation.

This situation occurs a lot in a codebase that works with hierarchical data. AST is an example: we have `ast.Node` and `ast.Expr`. It's quite common to write a function that accepts `ast.Node` while some other function can operate with `ast.Expr`. In the `*ast.Ident` -> `ast.Expr` -> `ast.Node` chain we can simplify the `ast.Expr` -> `ast.Node` conversion if we use the information that `ast.Expr` is actually `*ast.Ident`.

A real-world case can be found in the Go compiler code.
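To make the chain concrete, here is a minimal sketch using the standard `go/ast` package. The helper names (`typeName`, `exprTypeName`, `demo`) are made up for illustration and are not taken from the compiler code mentioned above.

```go
package main

import (
	"fmt"
	"go/ast"
)

// typeName accepts the widest interface in the chain.
func typeName(n ast.Node) string {
	return fmt.Sprintf("%T", n)
}

// exprTypeName only knows that e is an ast.Expr, so passing it on as an
// ast.Node is an interface-to-interface conversion.
func exprTypeName(e ast.Expr) string {
	return typeName(e)
}

// demo builds the *ast.Ident -> ast.Expr -> ast.Node chain.
func demo() string {
	id := ast.NewIdent("x") // statically known concrete type: *ast.Ident
	var e ast.Expr = id     // concrete -> interface: itab is known at compile time
	return exprTypeName(e)  // ast.Expr -> ast.Node happens inside the callee
}

func main() {
	fmt.Println(demo()) // prints "*ast.Ident"
}
```

If `exprTypeName` gets inlined into `demo`, the dynamic type behind `e` is visible at the conversion site, which is the kind of situation this issue is about.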
Another situation is when a constructor returns an interface type, like `hash.Hash`, and the result is then passed as `io.Writer`. In many cases we can avoid `convI2I` for the `md5.New()` -> `io.Writer` case (it works for most hash/crypto related constructors).

Here is a simple benchmark that illustrates the performance problem with I2I:
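A minimal sketch of such a benchmark, not the one behind the numbers reported in this issue. It assumes `*bytes.Buffer` as the concrete type and `io.ReadWriter` -> `io.Writer` as the I2I pair; all function names are hypothetical, and no results are implied.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"testing"
)

// sink keeps results alive so the conversions are not optimized away.
var sink io.Writer

// fromInterface converts one interface type to another. The dynamic type
// is not visible here, so this is an interface-to-interface conversion.
//
//go:noinline
func fromInterface(rw io.ReadWriter) io.Writer { return rw }

// fromConcrete converts a statically known concrete type, so the itab
// can be resolved at compile time.
//
//go:noinline
func fromConcrete(b *bytes.Buffer) io.Writer { return b }

// sameResult reports whether both paths yield the same interface value.
func sameResult() bool {
	buf := new(bytes.Buffer)
	return fromInterface(buf) == fromConcrete(buf)
}

func BenchmarkFromInterface(b *testing.B) {
	var rw io.ReadWriter = new(bytes.Buffer)
	for i := 0; i < b.N; i++ {
		sink = fromInterface(rw)
	}
}

func BenchmarkFromConcrete(b *testing.B) {
	buf := new(bytes.Buffer)
	for i := 0; i < b.N; i++ {
		sink = fromConcrete(buf)
	}
}

func main() {
	// testing.Benchmark allows running benchmarks outside of "go test".
	fmt.Println("I2I:     ", testing.Benchmark(BenchmarkFromInterface))
	fmt.Println("concrete:", testing.Benchmark(BenchmarkFromConcrete))
}
```

The `//go:noinline` directives are there so that the I2I path cannot be folded away by inlining; timings will vary by Go version and machine.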
If we rewrite `convI2I` like this (pseudo-code):

Then we get these results:
The same idea applies to the type assertions that involve interface-to-interface conversion.
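As a source-level analogy for the assertion case (the compiler works on IR, and its actual rewrite may differ), here is a sketch with hypothetical helper names. It assumes the dynamic type is known to be `*bytes.Buffer`.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// assertWriter is an interface-to-interface assertion: the runtime has to
// find the itab for the pair (io.Writer, dynamic type of v).
func assertWriter(v any) io.Writer {
	return v.(io.Writer)
}

// assertWriterKnown is a hand-written equivalent for callers that know v
// holds a *bytes.Buffer: it asserts to the concrete type first, then
// converts concrete -> interface, which needs no runtime itab lookup.
// Unlike assertWriter, it panics for any other dynamic type.
func assertWriterKnown(v any) io.Writer {
	return v.(*bytes.Buffer)
}

// equivalent reports whether both forms produce the same interface value
// when v really holds a *bytes.Buffer.
func equivalent() bool {
	var v any = new(bytes.Buffer)
	return assertWriter(v) == assertWriterKnown(v)
}

func main() {
	fmt.Println(equivalent()) // prints "true"
}
```

The two functions are interchangeable only when the dynamic type is guaranteed, which is exactly the information a compiler pass would need to prove before applying such a rewrite.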
The optimized code is also usually smaller from the machine code point of view.

`.text` segment size differences:

Total binary size differences:
Note: this does not solve all convI2I issues, but it can at least reduce the amount of convI2I we see in our CPU profiles.
I'll send a CL that provides my first attempt at this optimization. If the CL is not good enough, we can at least have this issue that has some sweet numbers to think about.
Implementation notes
Changing `devirtualize.go` from `ir.VisitList` to `ir.EditChildren` makes it measurably slower.

This is why a slightly less simple approach is used: we keep `ir.VisitList`, but handle some nodes via their parents. It covers less code, but in practice the optimization coverage should be OK. Suggestions are welcome: it could be the case that we can introduce these optimizations to some other part of the compiler.

This approach runs with almost identical speed; compilebench shows no significant diff this time: