[MC] Compiler performance regression in Clang 19 with -mbranches-within-32B-boundaries

I'm building the same code with Clang 18 and Clang 19, and noticed that build times for some targets are disproportionately affected by the switch to the new compiler. In general Clang 19 is 5-10% slower, but an LTO build of one particular target slowed down by 2.5x.

I tried `--time-trace` but don't know what to make of it, other than that OptModule got some long tails in Clang 19. The first worker under the main thread builds the same module in both images, so the two can be compared directly: OptModule time for it increased from 1m20s to 5m24s, about 4x.
![image](https://github.com/user-attachments/assets/d9b5185c-4e50-4a9d-9529-c837f96d6234)
```
913.621213 Total OptModule
856.716409 Total OptFunction
856.192565 Total RunPass
556.340514 Total PassManager<Function>
512.635569 Total ModuleInlinerWrapperPass
510.885891 Total ModuleToPostOrderCGSCCPassAdaptor
509.09462 Total DevirtSCCRepeatedPass
507.621024 Total PassManager<LazyCallGraph::SCC, CGSCCAnalysisManager, LazyCallGraph &, CGSCCUpdateResult &>
434.932548 Total CGSCCToFunctionPassAdaptor
142.495075 Total ExecuteLinker
142.421367 Total Link
141.506523 Total LTO
132.923099 Total InstCombinePass
124.003487 Total ModuleToFunctionPassAdaptor
```
![image](https://github.com/user-attachments/assets/de5a6cfa-02a7-41dc-9f5f-6704c854ca93)
```
3237.53794 Total OptModule
845.04484 Total OptFunction
844.38391 Total RunPass
552.922664 Total PassManager<Function>
497.867448 Total ModuleInlinerWrapperPass
495.840083 Total ModuleToPostOrderCGSCCPassAdaptor
493.816647 Total DevirtSCCRepeatedPass
492.195245 Total PassManager<LazyCallGraph::SCC, CGSCCAnalysisManager, LazyCallGraph &, CGSCCUpdateResult &>
417.747014 Total CGSCCToFunctionPassAdaptor
385.505297 Total ExecuteLinker
385.437975 Total Link
384.301031 Total LTO
141.092082 Total InstCombinePass
137.907089 Total ModuleToFunctionPassAdaptor
```
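To make the two summaries easier to compare, here is a minimal Python sketch that computes per-entry ratios. It assumes the first listing comes from the Clang 18 build and the second from the Clang 19 build. The helper names `parse_totals` and `slowdown` are made up for illustration, and only a few rows are copied in.

```python
# Sketch: compare two "Total ..." summaries copied from the listings above.
# Assumption: BEFORE is the Clang 18 listing and AFTER is the Clang 19 listing.

BEFORE = """\
913.621213 Total OptModule
856.716409 Total OptFunction
142.495075 Total ExecuteLinker
141.506523 Total LTO
132.923099 Total InstCombinePass
"""

AFTER = """\
3237.53794 Total OptModule
845.04484 Total OptFunction
385.505297 Total ExecuteLinker
384.301031 Total LTO
141.092082 Total InstCombinePass
"""


def parse_totals(text):
    """Map each entry name to its total, e.g. {'OptModule': 913.621213}."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line has the form "<number> Total <name>".
        value, _, name = line.partition(" Total ")
        totals[name] = float(value)
    return totals


def slowdown(before, after):
    """Return after/before for every entry present in both summaries."""
    return {name: after[name] / before[name] for name in before if name in after}


ratios = slowdown(parse_totals(BEFORE), parse_totals(AFTER))

# Print the entries that grew the most first.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {ratio:5.2f}x")
```

With these rows, OptModule, ExecuteLinker and LTO grow by roughly 2.7x to 3.5x, while OptFunction and InstCombinePass stay close to 1x.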

A `perf` trace and manually breaking in gdb both show that a lot of time is spent around
```
llvm::MCAssembler::layout() ()
llvm::MCObjectStreamer::finishImpl() ()
llvm::MCELFStreamer::finishImpl() ()
llvm::AsmPrinter::doFinalization(llvm::Module&) ()
llvm::FPPassManager::doFinalization(llvm::Module&) ()
llvm::legacy::PassManagerImpl::run(llvm::Module&) ()
```
and also in `llvm::MCExpr::evaluateAsRelocatableImpl`. My current build is stripped, though, so I'll come back later with trace results from a build with debug symbols.

[MC] Compiler performance regression in Clang 19 with -mbranches-within-32B-boundaries #107754
