StgState allocations dominate #9

Open
@sgraf812

Here's a profile of a simplified benchmark case of NoFib's bernoulli after #8 has been fixed:

COST CENTRE                          MODULE                        SRC                                             %time %alloc

lookupEnvSO                          Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:(631,1)-(649,21)      6.1    3.4
evalStackContinuation.\              Stg.Interpreter               lib/Stg/Interpreter.hs:(355,74)-(394,35)          5.6    9.1
builtinStgEval                       Stg.Interpreter               lib/Stg/Interpreter.hs:(154,1)-(201,103)          5.1    4.5
evalExpr.\                           Stg.Interpreter               lib/Stg/Interpreter.hs:(497,45)-(502,23)          4.8    5.3
evalExpr                             Stg.Interpreter               lib/Stg/Interpreter.hs:(423,1)-(533,93)           3.9    1.0
compare                              Stg.Syntax                    lib/Stg/Syntax.hs:(30,3)-(32,12)                  3.8    0.0
evalExpr.\                           Stg.Interpreter               lib/Stg/Interpreter.hs:(504,37)-(510,27)          3.0    1.9
addInterClosureCallGraphEdge.addEdge Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:820:7-127             2.5    0.8
setInsert                            Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:(793,1)-(795,36)      2.5    0.0
decodeStgbin'                        Stg.IO                        lib/Stg/IO.hs:52:1-22                             2.5    4.6
readHeap                             Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:(655,1)-(660,71)      2.2    0.9
addIntraClosureCallGraphEdge.addEdge Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:831:7-127             2.1    0.8
lookupEnv                            Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:652:1-53              2.0    1.6
addBinderToEnv                       Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:621:1-49              2.0    1.6
lookup#                              Data.HashMap.Base             Data/HashMap/Base.hs:509:1-80                     1.9    0.5
compare                              Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:1224:17-19            1.5    0.0
matchFirstLit                        Stg.Interpreter               lib/Stg/Interpreter.hs:(537,1)-(544,112)          1.5    3.0
==                                   Stg.Syntax                    lib/Stg/Syntax.hs:75:13-14                        1.4    0.0
stackPop.\                           Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:560:57-166            1.3    0.7
evalStackMachine.\                   Stg.Interpreter               lib/Stg/Interpreter.hs:339:24-82                  1.3    2.5
setProgramPoint                      Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:841:1-80              1.3    9.8
stackPop                             Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:(558,1)-(563,19)      1.2    4.9
builtinStgApply                      Stg.Interpreter               lib/Stg/Interpreter.hs:(204,1)-(237,69)           1.1    1.2
addZippedBindersToEnv.\              Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:624:60-86             1.1    1.2
matchFirstCon                        Stg.Interpreter               lib/Stg/Interpreter.hs:(564,1)-(569,31)           1.1    1.9
tryNextDebugCommand                  Stg.Interpreter.Debugger      lib/Stg/Interpreter/Debugger.hs:(28,1)-(34,12)    1.0    0.4
store                                Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:(579,1)-(589,106)     0.9    2.6
store.\                              Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:580:32-70             0.7    1.7
freshHeapAddress                     Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:(568,1)-(570,87)      0.7    2.4
declareBinding.\                     Stg.Interpreter               lib/Stg/Interpreter.hs:(579,22)-(584,58)          0.6    1.0
stackPush                            Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:(553,1)-(555,96)      0.5    4.6
>>=.\.\                              Data.Conduit.Internal.Conduit src/Data/Conduit/Internal/Conduit.hs:152:51-68    0.5    4.0
store.\                              Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:589:38-106            0.5    1.7
addIntraClosureCallGraphEdge         Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:(830,1)-(838,5)       0.3    1.3
addInterClosureCallGraphEdge         Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:(819,1)-(827,5)       0.3    1.3
freshHeapAddress.\                   Stg.Interpreter.Base          lib/Stg/Interpreter/Base.hs:570:30-87             0.2    2.1

Most of the functions listed are related to stack or heap manipulation. Looking at the code, and at the fact that setProgramPoint (which does only one thing: modify the StgState's ssCurrentProgramPoint) accounts for 9.8% of all allocations, I think the lovely simple design of a single StgState, which holds the whole interpreter state in one huge immutable record, might be the next bottleneck: updating a single field means allocating a fresh copy of the record.
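
To make the cost concrete, here is a minimal sketch of the pattern in question. `MiniState` and its fields are hypothetical stand-ins, not the real `StgState`, which has many more fields. Unless GHC manages to optimise the allocation away, a record update like this one builds a new constructor with one slot per field, so the cost of changing one field grows with the size of the record.

```haskell
-- Hypothetical miniature of StgState; all names here are invented for
-- illustration and the real record is much larger.
data MiniState = MiniState
  { msCurrentProgramPoint :: !Int
  , msHeap                :: [Int]
  , msStack               :: [Int]
  , msEnv                 :: [(String, Int)]
  } deriving (Eq, Show)

-- Analogue of setProgramPoint: it changes one field, but record update
-- syntax constructs a new MiniState whose other fields point to the old
-- values. The old record is left untouched.
setProgramPointMini :: Int -> MiniState -> MiniState
setProgramPointMini pp s = s { msCurrentProgramPoint = pp }
```

The fields themselves are shared between the old and the new record, so only the record's own header and pointer slots are reallocated, but that still happens on every call.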

Unfortunately, we don't have mutable fields (yet) in GHC Haskell. So here are other suggestions:

  • Make all fields of StgState mutable references (STRefs, IORefs or MVars). Probably the most performant option.
  • Segregate StgState into two (or more) records, StgStateHot/StgStateCold, and put hot stuff like ssCurrentProgramPoint in StgStateHot. Bonus points for a record pattern synonym that keeps the old interface (but then call sites must be absolutely sure to inline away the pattern synonym).
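
Both suggestions can be sketched roughly as follows. Every name below (`RefState`, `Hot`, `Cold`, `Split`, `Mini` and the fields) is hypothetical, and the first variant assumes the interpreter monad can perform IO; this is an illustration of the idea, not a proposal for the actual layout.

```haskell
{-# LANGUAGE PatternSynonyms #-}

import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Option 1 (hypothetical names): keep the hot field in a mutable reference.
data RefState = RefState
  { rsProgramPoint :: !(IORef Int)
  , rsHeap         :: [Int]
  }

newRefState :: IO RefState
newRefState = do
  pp <- newIORef 0
  pure (RefState pp [])

-- Overwrites the reference in place; the RefState record is not rebuilt.
-- The value is forced first so no thunk is stored in the reference.
setProgramPointRef :: RefState -> Int -> IO ()
setProgramPointRef st pp = writeIORef (rsProgramPoint st) $! pp

-- Option 2 (hypothetical names): split the state into a small hot record
-- and a large cold one.
data Hot   = Hot  { hotProgramPoint :: !Int }
data Cold  = Cold { coldHeap :: [Int], coldNames :: [String] }
data Split = Split !Hot Cold

-- Record pattern synonym that presents the split state as one flat record,
-- so existing code can keep using field names and positional matching.
pattern Mini :: Int -> [Int] -> [String] -> Split
pattern Mini { miniProgramPoint, miniHeap, miniNames } =
  Split (Hot miniProgramPoint) (Cold miniHeap miniNames)
{-# COMPLETE Mini #-}

-- Dedicated setter: rebuilds only the small Hot record and the two-field
-- Split wrapper, and reuses the Cold record as-is.
setProgramPointSplit :: Int -> Split -> Split
setProgramPointSplit pp (Split hot cold) =
  Split (hot { hotProgramPoint = pp }) cold
```

Note that an update written through the synonym, such as `s { miniProgramPoint = 5 }`, goes through the synonym's matcher and builder and so may rebuild the cold record as well; that is the reason for the inlining caveat, and for the dedicated setter in this sketch.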
