Skip to content

Conversation

@sepp2k
Copy link

@sepp2k sepp2k commented Jun 12, 2023

Hi,

I saw your blog post on HN a while back and figured it would be a fun project to add a proper equivalence checker to your optimizer that can determine program equivalence in the presence of input. I ended up implementing two: a brute force one that just generates all possible inputs and runs the program on them; and an SMT-based one.

I also ended up refactoring a lot of things and changing the semantics of the language slightly (by making it use fixed-width integers) to enable these changes, so the PR ended up rather large. Sorry about that.

sepp2k added 9 commits June 3, 2023 20:48
For a state that's all zeros, the superoptimizer used to output
"LOAD 0" as the shortest program when in fact "LOAD 0" is a no-op
and the shortest program to get all zeros is the empty program.
Since programs are generated in ascending order of length, the first
program to be found that produces the right state will be the optimal
one. Thus there's no reason to keep searching once a program has been
found.

This improves performance quite a bit in some cases and removes the
need to guess the optimal length to provide a short maximum length.

Additionally, this removes the program cache as it would only help
when reusing results from previous searches, which is currently being
done and wouldn't be correct when using different options (like
max_value) anyway.
Also implement an equivalence checker that can determine program
equivalence in the presence of input (by trying all possible input
values).
The program representation now consists of classes with a `__str__`
method, so `assembler.output` is no longer needed.

More importantly, a program now contains the information how much
memory it needs, removing the need to manually set this parameter and
removing the possibility of accidentally using a wrong value.

The opcodes are now represented as strings rather than functions from
the CPU, so that the program representation is no longer tied to the
CPU. This removes the need to instantiate dummy CPU objects when you
don't actually want to run a program and more importantly makes this
representation more usable when working with the program independently
of the CPU.

The program representation is now defined in the instruction_set
module, which removes the dependency from the assembler module to the
CPU module. This made it possible to move the `run` helper method into
the CPU module (where it belongs) without introducing a cyclical
dependency).

The CPU, assembler and program generator have been updated
according to these changes.

The assembler is now also a bit more robust. It now allows leading
white space and produces errors on invalid lines, including
non-existant opcodes, wrong numbers of arguments or invalid memory
addresses. The arity checks and the tracking of the memory size
happens based on the definitions in instruction_set and when new
instructions are added there, the assembler should be able to parse
and check those new instructions without needing to be modified.

Similarly the code generator now works solely from the defintions in
`instruction_set` and should likewise work without modification when
new instructions are added.
This makes the instruction set more realistic as real CPUs don't
support arbitrary precision integers either.

More impoortantly, having a fixed width means there's only a finite
number of possible input, allowing the equivalence checker to actually
try all inputs representable using the current bit width rather than
unsoundly stopping at an arbitrary limit.

Even more importantly, this paves the way for the introduction of an
SMT solver as those don't tend to support XOR on unbounded integers,
only on bitvectors.
The SMT-based equivalence checker is significantly more expensive in
cases where there are no or few inputs. However, it easily outperforms
the brute force one in cases where there's a large bit width and/or
number of inputs.

Sadly, in those cases where the SMT solver would be better, the
superoptimizer will already be unbearably slow regardless of the
equivalence checker just from generating all the possible candidate
programs. So until the program generation is optimized, the SMT based
equivalence checker won't do much good for performance.
This improves the performance of the superoptimizer using the
SMT-based equivalence checker quite a bit, but it's still
noticably slower than the brute force one for cases with little
input.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant