Skip to content

Speed up linking by shrinking symbol and relocation tables #1222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 26, 2023

Conversation

stedolan
Copy link
Contributor

Linking currently takes a long time because many symbol references (in particular, to statically allocated data) are not resolved until link time, despite many of these references being to data defined in the same file.

So far, this patch removes about half the symbols in compiler-libs, mostly unexported data. (The benefit applies only to Closure builds at the moment).

The major change is to use the following type for symbols in Cmm onwards, instead of string:

type is_global = Global | Local

(* Symbols are marked with whether they are local or global,
   at both definition and use sites.
   Symbols defined as [Local] may only be referenced within the same file,
   and all such references must also be [Local].
   Symbols defined as [Global] may be referenced from other files.
   References from other files must be [Global], but references from the
   same file may be [Local].
   (Marking symbols in this way speeds up linking, as many references can
   then be resolved early) *)
type symbol =
  { sym_name : string;
    sym_global : is_global }

At assembly time, Local symbols are generated as assembler labels rather than ELF symbols, so references can be resolved eagerly. Global symbols are generated as both an ELF symbol and an assembler label, so that Local references within the same file can use the label form.

There is quite a lot of low-hanging fruit here. The last commit here removes another ~8% of symbol references by wrapping caml_call_gc once per file. We should also be able to turn references to functions in the same file into assembler label references, and to improve the encoding of debug info and frame tables to generate fewer relocations.

@Gbury
Copy link
Contributor

Gbury commented Mar 15, 2023

Is there a reason not to use the following type definition ?

type symbol =
  | Local of string
  | Global of string

@mshinwell mshinwell added the compilation speed Potential compilation speed improvements label Mar 16, 2023
@stedolan
Copy link
Contributor Author

Is there a reason not to use the following type definition ?

type symbol =
  | Local of string
  | Global of string

Not particularly? I don't have a very strong opinion here: the record is slightly closer to the truth (the symbol name is independent of its global status) but the variant has more lightweight syntax.

@stedolan stedolan marked this pull request as ready for review March 21, 2023 12:03
@Gbury
Copy link
Contributor

Gbury commented Mar 22, 2023

I'm reviewing this as well as working on a version with flambda2 that would try and make local every symbol non-reachable from the cmx.

Copy link
Contributor

@Gbury Gbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good from my point of review, but note that I did not review the cfg part, and that I skimmed over the the to_cmm part, since I rewrote most of it in #1246 which I expect to be merged very soon after this one.

Ideally, in the future, I'd very much like to do some more cleanup around symbols: this PR has small changes to make the local vs global symbols work, but this also means that there remains a lot of code handling symbols as strings. This makes it very easy to introduce bugs wrt locality (I have had to debug a few while writing #1246 ), so I think that after this one and #1246 , we should probably try and write another PR to use symbols instead of strings everywhere it makes sense (I'll probably do it). We could then consider all this work for upstreaming.

@mshinwell mshinwell linked an issue Mar 23, 2023 that may be closed by this pull request
@mshinwell
Copy link
Collaborator

I looked at @stedolan 's fixes for @gretay-js 's comments and they look fine. Merging.

@mshinwell mshinwell merged commit d1fa776 into ocaml-flambda:main Apr 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compilation speed Potential compilation speed improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce symbol table sizes
4 participants