LLVM Backend
kripken edited this page Jul 13, 2012 · 13 revisions
The original emscripten compiler was written in JavaScript, which was very useful for quickly prototyping new ideas during development of the various new methods needed for effective compilation to JavaScript (the relooper, longjmp tricks, C++ exceptions in JS, etc.). It is also quite stable at this point and generates very good code. However, it has a few downsides:
- Compiler speed. The generated code is fast, but generating the code is not so fast. Especially with full optimizations on, builds can be quite slow. This is not an issue for tens of thousands of lines of code, and is annoying but not horrible for hundreds of thousands, but it is a serious problem for millions.
- LLVM backends integrate more closely with LLVM, and can leverage LLVM's internal code analysis and optimization. The original compiler just parses LLVM bitcode externally, so it cannot benefit from internal capabilities of LLVM.
- An upstream LLVM backend is easier to use for people than a separate project. Compiling to JS should, as much as possible, be just another backend in a compiler.
The plan is to start work over Summer 2012.
Guidelines and issues:
- We will use the C++ Relooper implementation https://github.com/kripken/Relooper
- Focus on the C-style memory layout method. Other approaches (no typed arrays, unaliasing typed arrays) will only be done by the original compiler.
- When possible, do native JS function calls `f(x,y,z)` and not read/writes from the C stack. Tricky with varargs but perhaps possible even there with internal LLVM changes.
- Far better to do `x = (a+b)/z` instead of `t = a+b ; x = t/z`, unclear how easy it is to do that in an LLVM backend.
- More advanced C++ static analysis than the current compiler should allow removal of a lot of unnecessary address shifting
- See https://bugzilla.mozilla.org/show_bug.cgi?id=771106 for some optimizations we should implement. Also https://bugzilla.mozilla.org/show_bug.cgi?id=771285#c5
- To get started we will not create an object format for JavaScript, we can continue to use the emcc wrapper which uses clang in a way that utilizes LLVM bitcode as the intermediate object format. So the initial goal is just to generate JS in the backend directly, that is, from LLVM IR in memory.
- Some initial work by Ehsan on Emscripten support in LLVM and clang is in
  - https://github.com/ehsan/llvm/commit/ad4c8c52f68a1694cbb66fe861f325928ca04d7c
  - https://github.com/ehsan/clang/commit/3a8eff2f5646605d949222032422a12967b34790
- LLVM already has a target triple ArchType of le32 with comment `generic little-endian 32-bit CPU (PNaCl / Emscripten)`, we should presumably use that?
- Of the existing backends, the simplest is CppBackend, but it might be too simple. Sparc seems to be the smallest "real" backend.
- Should we call the combination of this backend and Emscripten "Emscripten 2.0"?
- Should we call the LLVM backend itself "JS" or "Emscripten" internally in LLVM?
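
The point above about preferring `x = (a+b)/z` over `t = a+b ; x = t/z` can be sketched roughly as follows. This is only a toy illustration, not how an LLVM backend would do it: the instruction format (`dest`, `op`, `lhs`, `rhs`) and the function name `emitFolded` are hypothetical, and the sketch assumes each temporary is assigned once (SSA-like) and that all operations are side-effect-free arithmetic, so folding cannot change behavior.

```javascript
// Toy sketch: fold single-use temporaries into the expression that uses them.
// The instruction format here is hypothetical and is NOT LLVM IR.
// Assumes each dest is assigned exactly once and operations have no side effects.
function emitFolded(instructions) {
  // Count how many times each name is read as an operand.
  const uses = new Map();
  for (const inst of instructions) {
    for (const operand of [inst.lhs, inst.rhs]) {
      uses.set(operand, (uses.get(operand) || 0) + 1);
    }
  }

  const lines = [];
  const pending = new Map(); // folded temporary -> its expression text

  // Render an operand, substituting a parenthesized expression if it was folded.
  function render(operand) {
    return pending.has(operand) ? '(' + pending.get(operand) + ')' : operand;
  }

  for (const inst of instructions) {
    const expr = render(inst.lhs) + inst.op + render(inst.rhs);
    if (uses.get(inst.dest) === 1) {
      // Read exactly once later: keep the text and inline it at the use site.
      pending.set(inst.dest, expr);
    } else {
      // Read zero times (a final result) or several times: emit a statement.
      lines.push(inst.dest + ' = ' + expr + ';');
    }
  }
  return lines.join('\n');
}

console.log(emitFolded([
  { dest: 't', op: '+', lhs: 'a', rhs: 'b' },
  { dest: 'x', op: '/', lhs: 't', rhs: 'z' },
])); // prints "x = (a+b)/z;"
```

A temporary that is read more than once is still emitted as its own statement, which avoids recomputing it. Whether an equivalent decision is easy to make inside an LLVM backend is exactly the open question raised above.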
Setup
- We track LLVM and Clang svn through their git mirror http://llvm.org/docs/GettingStarted.html#git_mirror
- Our repos are
  - https://github.com/kripken/llvm-js
  - https://github.com/kripken/clang-js
- Updating from svn: Pull from the git mirrors with `--rebase` (as also recommended on the LLVM link above). Then push to our github repos.