-
Notifications
You must be signed in to change notification settings - Fork 7
Implement equivalence checkers with input handling and other improvements #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
sepp2k
wants to merge
9
commits into
AZHenley:main
Choose a base branch
from
sepp2k:main
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
For a state that's all zeros, the superoptimizer used to output "LOAD 0" as the shortest program when in fact "LOAD 0" is a no-op and the shortest program to get all zeros is the empty program.
Since programs are generated in ascending order of length, the first program to be found that produces the right state will be the optimal one. Thus there's no reason to keep searching once a program has been found. This improves performance quite a bit in some cases and removes the need to guess the optimal length to provide a short maximum length. Additionally, this removes the program cache as it would only help when reusing results from previous searches, which is currently being done and wouldn't be correct when using different options (like max_value) anyway.
Also implement an equivalence checker that can determine program equivalence in the presence of input (by trying all possible input values).
The program representation now consists of classes with a `__str__` method, so `assembler.output` is no longer needed. More importantly, a program now contains the information how much memory it needs, removing the need to manually set this parameter and removing the possibility of accidentally using a wrong value. The opcodes are now represented as strings rather than functions from the CPU, so that the program representation is no longer tied to the CPU. This removes the need to instantiate dummy CPU objects when you don't actually want to run a program and more importantly makes this representation more usable when working with the program independently of the CPU. The program representation is now defined in the instruction_set module, which removes the dependency from the assembler module to the CPU module. This made it possible to move the `run` helper method into the CPU module (where it belongs) without introducing a cyclical dependency). The CPU, assembler and program generator have been updated according to these changes. The assembler is now also a bit more robust. It now allows leading white space and produces errors on invalid lines, including non-existant opcodes, wrong numbers of arguments or invalid memory addresses. The arity checks and the tracking of the memory size happens based on the definitions in instruction_set and when new instructions are added there, the assembler should be able to parse and check those new instructions without needing to be modified. Similarly the code generator now works solely from the defintions in `instruction_set` and should likewise work without modification when new instructions are added.
This makes the instruction set more realistic as real CPUs don't support arbitrary precision integers either. More impoortantly, having a fixed width means there's only a finite number of possible input, allowing the equivalence checker to actually try all inputs representable using the current bit width rather than unsoundly stopping at an arbitrary limit. Even more importantly, this paves the way for the introduction of an SMT solver as those don't tend to support XOR on unbounded integers, only on bitvectors.
The SMT-based equivalence checker is significantly more expensive in cases where there are no or few inputs. However, it easily outperforms the brute force one in cases where there's a large bit width and/or number of inputs. Sadly, in those cases where the SMT solver would be better, the superoptimizer will already be unbearably slow regardless of the equivalence checker just from generating all the possible candidate programs. So until the program generation is optimized, the SMT based equivalence checker won't do much good for performance.
This improves the performance of the superoptimizer using the SMT-based equivalence checker quite a bit, but it's still noticably slower than the brute force one for cases with little input.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Hi,
I saw your blog post on HN a while back and figured it would be a fun project to add a proper equivalence checker to your optimizer that can determine program equivalence in the presence of input. I ended up implementing two: a brute force one that just generates all possible inputs and runs the program on them; and an SMT-based one.
I also ended up refactoring a lot of things and changing the semantics of the language slightly (by making it use fixed-width integers) to enable these changes, so the PR ended up rather large. Sorry about that.