Pool more JIT resources to reduce memory usage/contention #44912


Merged: 9 commits into master on Apr 12, 2022

Conversation

pchintalapudi (Member)

Rather than creating a new TargetMachine/PassManager for every single compilation (which costs a lot of memory and construction/destruction time) or guarding a single one with a mutex (which eliminates parallelism), we can share PassManagers/TargetMachines between threads using a simple resource pool. This should hopefully bring the latency impact in #44568 back to what it was before #44364.

Depends on #44605 for the resource pool implementation.

@pchintalapudi (Member, PR author)

Master:

Core.Compiler ──── 79.6007 seconds

Sysimage built. Summary:
Total ───────  91.153872 seconds 
Base: ───────  36.468653 seconds 40.0078%
Stdlibs: ────  54.682599 seconds 59.9893%

Precompilation complete. Summary:
Total ─────── 164.120550 seconds
Generation ── 125.911190 seconds 76.7187%
Execution ───  38.209360 seconds 23.2813%

Performance counter stats for 'make':

        646,078.45 msec task-clock                #    1.050 CPUs utilized          
        20,418,033      context-switches          #    0.032 M/sec                  
             1,187      cpu-migrations            #    0.002 K/sec                  
         5,761,609      page-faults               #    0.009 M/sec                  
 1,599,064,891,345      cycles                    #    2.475 GHz                      (83.35%)
    76,776,955,901      stalled-cycles-frontend   #    4.80% frontend cycles idle     (83.33%)
   260,942,150,532      stalled-cycles-backend    #   16.32% backend cycles idle      (83.33%)
 2,327,913,819,437      instructions              #    1.46  insn per cycle         
                                                  #    0.11  stalled cycles per insn  (83.30%)
   445,446,327,289      branches                  #  689.462 M/sec                    (83.33%)
    11,266,910,833      branch-misses             #    2.53% of all branches          (83.35%)

     615.025500678 seconds time elapsed

     604.227931000 seconds user
      42.061896000 seconds sys

PR:

Core.Compiler ──── 82.7436 seconds

Sysimage built. Summary:
Total ───────  88.887377 seconds 
Base: ───────  35.441637 seconds 39.8725%
Stdlibs: ────  53.443663 seconds 60.1251%

Precompilation complete. Summary:
Total ─────── 159.817340 seconds
Generation ── 120.957179 seconds 75.6846%
Execution ───  38.860161 seconds 24.3154%

 Performance counter stats for 'make':

        647,618.57 msec task-clock                #    1.051 CPUs utilized          
        20,797,353      context-switches          #    0.032 M/sec                  
             1,288      cpu-migrations            #    0.002 K/sec                  
         7,444,536      page-faults               #    0.011 M/sec                  
 1,602,508,947,119      cycles                    #    2.474 GHz                      (83.36%)
    79,713,692,314      stalled-cycles-frontend   #    4.97% frontend cycles idle     (83.34%)
   268,331,644,788      stalled-cycles-backend    #   16.74% backend cycles idle      (83.34%)
 2,308,256,966,847      instructions              #    1.44  insn per cycle         
                                                  #    0.12  stalled cycles per insn  (83.31%)
   439,783,835,134      branches                  #  679.078 M/sec                    (83.31%)
    11,017,362,932      branch-misses             #    2.51% of all branches          (83.34%)

     615.907288141 seconds time elapsed

     601.166668000 seconds user
      46.685558000 seconds sys

@pchintalapudi pchintalapudi requested a review from vtjnash April 8, 2022 17:28
@pchintalapudi pchintalapudi added compiler:codegen Generation of LLVM IR and native code compiler:llvm For issues that relate to LLVM labels Apr 8, 2022
@vtjnash (Member) commented Apr 11, 2022

FYI: it is not very effective to ask for review while the PR does not yet apply to master

@pchintalapudi (Member, PR author)

Sorry about that, this PR should now be applicable to master directly.

    OptimizerResultT operator()(orc::ThreadSafeModule TSM, orc::MaterializationResponsibility &R) {
        TSM.withModuleDo([&](Module &M) {
            uint64_t start_time = 0;
            if (dump_llvm_opt_stream != NULL) {
Member:

It appears this might need some locking later (for dump_llvm_opt_stream)

Member (PR author):

This should be addressed by the latest commit in #44914, which locks around bundles of stream operations.

Comment on lines 923 to 925:

    {
        (***PMs).run(M);
    }

Member:

Suggested change (remove the redundant braces):

    (***PMs).run(M);
@vtjnash (Member) left a review comment:

SGTM

@pchintalapudi pchintalapudi added the merge me PR is reviewed. Merge when all tests are passing label Apr 11, 2022
@DilumAluthge DilumAluthge merged commit c0c60e8 into master Apr 12, 2022
@DilumAluthge DilumAluthge deleted the pc/jit-pool branch April 12, 2022 01:59
@DilumAluthge DilumAluthge removed the merge me PR is reviewed. Merge when all tests are passing label Apr 12, 2022
Labels
compiler:codegen Generation of LLVM IR and native code compiler:llvm For issues that relate to LLVM
3 participants