Here's a profile of a simplified benchmark case of NoFib's bernoulli after #8 has been fixed:

| COST CENTRE | MODULE | SRC | %time | %alloc |
|---|---|---|---|---|
| `lookupEnvSO` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:(631,1)-(649,21) | 6.1 | 3.4 |
| `evalStackContinuation.\` | Stg.Interpreter | lib/Stg/Interpreter.hs:(355,74)-(394,35) | 5.6 | 9.1 |
| `builtinStgEval` | Stg.Interpreter | lib/Stg/Interpreter.hs:(154,1)-(201,103) | 5.1 | 4.5 |
| `evalExpr.\` | Stg.Interpreter | lib/Stg/Interpreter.hs:(497,45)-(502,23) | 4.8 | 5.3 |
| `evalExpr` | Stg.Interpreter | lib/Stg/Interpreter.hs:(423,1)-(533,93) | 3.9 | 1.0 |
| `compare` | Stg.Syntax | lib/Stg/Syntax.hs:(30,3)-(32,12) | 3.8 | 0.0 |
| `evalExpr.\` | Stg.Interpreter | lib/Stg/Interpreter.hs:(504,37)-(510,27) | 3.0 | 1.9 |
| `addInterClosureCallGraphEdge.addEdge` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:820:7-127 | 2.5 | 0.8 |
| `setInsert` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:(793,1)-(795,36) | 2.5 | 0.0 |
| `decodeStgbin'` | Stg.IO | lib/Stg/IO.hs:52:1-22 | 2.5 | 4.6 |
| `readHeap` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:(655,1)-(660,71) | 2.2 | 0.9 |
| `addIntraClosureCallGraphEdge.addEdge` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:831:7-127 | 2.1 | 0.8 |
| `lookupEnv` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:652:1-53 | 2.0 | 1.6 |
| `addBinderToEnv` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:621:1-49 | 2.0 | 1.6 |
| `lookup#` | Data.HashMap.Base | Data/HashMap/Base.hs:509:1-80 | 1.9 | 0.5 |
| `compare` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:1224:17-19 | 1.5 | 0.0 |
| `matchFirstLit` | Stg.Interpreter | lib/Stg/Interpreter.hs:(537,1)-(544,112) | 1.5 | 3.0 |
| `==` | Stg.Syntax | lib/Stg/Syntax.hs:75:13-14 | 1.4 | 0.0 |
| `stackPop.\` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:560:57-166 | 1.3 | 0.7 |
| `evalStackMachine.\` | Stg.Interpreter | lib/Stg/Interpreter.hs:339:24-82 | 1.3 | 2.5 |
| `setProgramPoint` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:841:1-80 | 1.3 | 9.8 |
| `stackPop` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:(558,1)-(563,19) | 1.2 | 4.9 |
| `builtinStgApply` | Stg.Interpreter | lib/Stg/Interpreter.hs:(204,1)-(237,69) | 1.1 | 1.2 |
| `addZippedBindersToEnv.\` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:624:60-86 | 1.1 | 1.2 |
| `matchFirstCon` | Stg.Interpreter | lib/Stg/Interpreter.hs:(564,1)-(569,31) | 1.1 | 1.9 |
| `tryNextDebugCommand` | Stg.Interpreter.Debugger | lib/Stg/Interpreter/Debugger.hs:(28,1)-(34,12) | 1.0 | 0.4 |
| `store` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:(579,1)-(589,106) | 0.9 | 2.6 |
| `store.\` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:580:32-70 | 0.7 | 1.7 |
| `freshHeapAddress` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:(568,1)-(570,87) | 0.7 | 2.4 |
| `declareBinding.\` | Stg.Interpreter | lib/Stg/Interpreter.hs:(579,22)-(584,58) | 0.6 | 1.0 |
| `stackPush` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:(553,1)-(555,96) | 0.5 | 4.6 |
| `>>=.\.\` | Data.Conduit.Internal.Conduit | src/Data/Conduit/Internal/Conduit.hs:152:51-68 | 0.5 | 4.0 |
| `store.\` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:589:38-106 | 0.5 | 1.7 |
| `addIntraClosureCallGraphEdge` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:(830,1)-(838,5) | 0.3 | 1.3 |
| `addInterClosureCallGraphEdge` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:(819,1)-(827,5) | 0.3 | 1.3 |
| `freshHeapAddress.\` | Stg.Interpreter.Base | lib/Stg/Interpreter/Base.hs:570:30-87 | 0.2 | 2.1 |


Most of the functions there are related to stack or heap manipulation. Looking at the code and the fact that `setProgramPoint` (which does only one thing: modify the `StgState`'s `ssCurrentProgramPoint`) contributes almost 10% of all allocations, I think the lovely simple design of a single `StgState` which contains the whole interpreter state in a huge immutable record might be the next bottleneck.
Unfortunately, we don't have mutable fields (yet) in GHC Haskell. So here are other suggestions:
- Make all fields of `StgState` `STVar`s or `MVar`s. Probably the most performant option.
- Segregate `StgState` into two (or more) records, `StgStateHot`/`StgStateCold`. Put hot stuff like `ssCurrentProgramPoint` in `StgStateHot`. Bonus points for a record pattern synonym that keeps the old interface (but then call sites must be absolutely sure to inline away the PS).
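
The second suggestion could look roughly like the sketch below. It is an illustration under assumptions, not a proposal for the actual layout: `StgStateSplit`, the `hot*`/`cold*`/`st*` field names, the `ProgramPoint` constructors and all fields other than `ssCurrentProgramPoint` are hypothetical.

```haskell
{-# LANGUAGE PatternSynonyms #-}
module Main where

-- Hypothetical program-point type (assumption).
data ProgramPoint = PP_Global | PP_Closure Int
  deriving (Eq, Show)

-- Small record for fields that change on almost every step.
data StgStateHot = StgStateHot
  { hotCurrentProgramPoint :: !ProgramPoint
  , hotNextHeapAddr        :: !Int          -- placeholder hot field
  }

-- Everything that is updated rarely (placeholder fields).
data StgStateCold = StgStateCold
  { coldHeap  :: [Int]
  , coldStack :: [Int]
  }

data StgStateSplit = StgStateSplit
  { stHot  :: !StgStateHot
  , stCold :: !StgStateCold
  }

-- Record pattern synonym that restores the old flat interface:
-- construction, matching and field selectors keep their former names.
pattern StgState :: [Int] -> [Int] -> Int -> ProgramPoint -> StgStateSplit
pattern StgState { ssHeap, ssStack, ssNextHeapAddr, ssCurrentProgramPoint } =
  StgStateSplit (StgStateHot ssCurrentProgramPoint ssNextHeapAddr)
                (StgStateCold ssHeap ssStack)
{-# COMPLETE StgState #-}

-- Hot-path update written against the split records directly: it allocates
-- a new StgStateHot and a new two-field wrapper, and shares stCold as is.
setProgramPoint :: ProgramPoint -> StgStateSplit -> StgStateSplit
setProgramPoint pp s = s { stHot = (stHot s) { hotCurrentProgramPoint = pp } }

main :: IO ()
main = do
  let s0 = StgState { ssHeap = [], ssStack = [], ssNextHeapAddr = 0
                    , ssCurrentProgramPoint = PP_Global }
      s1 = setProgramPoint (PP_Closure 42) s0
  print (ssCurrentProgramPoint s1)  -- prints: PP_Closure 42
  print (ssNextHeapAddr s1)         -- prints: 0
```

Note that `setProgramPoint` deliberately bypasses the synonym. An update that goes through the `StgState` pattern has to match and rebuild both halves, which is presumably why the inlining caveat above matters for hot call sites.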