Description
I'd like to point out that I haven't been involved in the web assembly effort,
and I'm not maintaining any large or widely used compilers (just my own
toy-ish language, minor contributions to the QBE compiler backend, and an
internship on IBM's compiler team), but I ended up getting a bit ranty, and
was encouraged to share more widely.
So, while I'm a bit uncomfortable jumping in and suggesting major changes
to a project I haven't been working on... here goes:
My Complaints:
When I'm writing a compiler, the first thing that I'd do to with the high level
structure -- loops, if statements, and so on -- is validate them for semantics,
do type checking and so on. The second thing I do with them is just throw them
out, and flatten to basic blocks, and possibly to SSA form. In some other parts
of the compiler world, a popular format is continuation passing style. I'm not
an expert on compiling with continuation passing style, but it neither seems to
be a good fit for the loops and scoped blocks that web assembly seems to have
embraced.
I'd like to argue that a flatter, goto based format would be far more useful as
a target for compiler developers, and would not significantly hinder the
writing of a usable polyfill.
Personally, also I'm not a big fan of nested complex expressions. They're a bit
clunkier to consume, especially if inner nodes can have side effects, but I
don't strongly object to them as a compiler implementer -- The web assembly
JIT can consume them, I can ignore them and generate the instructions that map
to my IR. They don't make me want to flip tables.
The bigger problem comes down to loops, blocks, and other syntactic elements
that, as an optimizing compiler writer, you try very hard to represent as a
graph with branches representing edges; The explicit control flow constructs
are a hindrance. Reconstructing them from the graph once you've actually done
the optimizations you want is certainly possible, but it's quite a bit of
complexity to work around a more complex format. And that annoys me: Both the
producer and the consumer are working around entirely invented problems
which would be avoided by simply dropping complex control flow constructs
from web assembly.
In addition, the insistence of higher level constructs leads to some
pathological cases. For example, Duff's Device ends up with horrible web
assembly output, as seen by messing around in The Wasm Explorer.
However, the inverse is not true: Everything that can be expressed
in web assembler can be trivially converted to an equivalent in some
unstructured, goto based format.
So, at the very least, I'd like to suggest that the web assembly team add
support for arbitrary labels and gotos. If they choose to keep the higher
level constructs, it would be a bit of wasteful complexity, but at least
compiler writers like me wold be able to ignore them and generate output
directly.
Polyfilling:
One of the concerns I have heard when discussing this is that the loop
and block based structure allows for easier polyfilling of web assembly.
While this isn't entirely false, I think that a simple polyfill solution
for labels and gotos is possible. Whiie it might not be quite as optimal,
I think that it's worth a little bit of ugliness in the bytecode in order
to avoid starting a new tool with built in technical debt.
If we assume an LLVM (or QBE) like syntax for web assmembly, then some code
that looks like:
int f(int x) {
if (x == 42)
return 123;
else
return 666;
}
might compile to:
func @f(%x : i32) {
%1 = test %x 42
jmp %1 iftrue iffalse
L0:
%r =i 123
jmp LRet
L1:
%r =i 666
jmp LRet
Lret:
ret %r
}
This could be polyfilled to Javascript that looks like:
function f(x) {
var __label = L0;
var __ret;
while (__label != LRet) {
switch (__label) {
case L0:
var _v1 = (x == 42)
if (_v1) {__lablel = L1;} else {label = L2;}
break;
case L1:
__ret = 123
__label = LRet
break;
case L2;
__ret = 666
__label = LRet
break;
default:
assert(false);
break;
}
}
Is it ugly? Yeah. Does it matter? Hopefuly, if web assembly takes off,
not for long.
And if not:
Well, if I ever got around to targeting web assembly, I guess I'd generate code
using the approach I mentioned in the polyfill, and do my best to ignore all of
the high level constructs, hoping that the compilers would be smart enough to
catch on to this pattern.
But it would be nice if we didn't need to have both sides of the code generation
work around the format specified.