Skip to content

block/br return value optimizations #667

@kripken

Description

@kripken

There was some discussion about block and br return values, and in particular how much benefit we can get from them, as both our current compiler options (asm2wasm and wasm-backend) don't use them or barely use them (asm2wasm uses block return values in rare cases, for asm.js sequences/commas, but never uses br values). I investigated this topic in the binaryen optimizer (block-returns branch; note code is not cleaned up yet).

My approach is to process control flow and see where it joins up, at the end of an if-else or a block, and then see if there is a set_local that flows through all those arriving branches (this would occur e.g. if an optimizing compiler emitted a phi node there). If so, we can use block and br return values to pass the result, instead of set_locals, and put a single set_local on the outside of the block or if. In other words, it unifies the set_locals, like this:

(block $out
  (if
    (..)
    (block
      (set_local $x (i32.const 10))
      (br $out)
    )
  )
  ..
  (set_local $x (i32.const 20))
)

==>

(set_local $x
  (block $out
    (if
      (..)
      (br $out (i32.const 10))
    )
    ..
    (i32.const 20)
  )
)

This removes set_locals, and can then allow further optimizations, e.g. the new set_local is now on the outside where it might be sunk into a later get_local, etc., and other stuff.

On BananaBread, this removes around 4% of set_locals. (This was better than I expected, so I ran it through a bunch of tests to verify it's working properly.)

Note that this is an underestimate of the opportunity here, as

  1. It operates on the non-SSA form binaryen AST. Probably in SSA form more could be done (operating directly on phis).
  2. It doesn't track variables through basic blocks, each is only considered in its own basic block.
  3. This removes set_locals and get_locals, which might then allow reducing the total number of locals. But we can't test that since the binaryen optimizer doesn't have a register allocator (yet).

Overall, then, it seems like block/br return value optimizations can be pretty useful.

I also did some measurements for the topic of multiple return values. I can't optimize them in the current AST, so I don't have full and tested data. But I counted how many locations have a set_local that can be optimized, and how many there are at each - when there is more than one, multiple return values might be employed. Looks like 15.4% of potential locations have at least one set_local that can be optimized, and they have an average of 1.12 per location. In other words, it's common to have a single optimizable set_local, and while sometimes there is more than one, that is rare.

All that has the caveats 1-3 from before, and in addition that as mentioned it's just data and not a full and tested optimization pass, so it would be good to see more investigation here. But this data suggests that a single return value from ifs, blocks, and brs is enough for almost all the benefit here.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions