@kwxm (Contributor) commented on Feb 4, 2022:

This adds a sizes-and-budgets option to the nofib-exe command which prints out the sizes and budget requirements of the standard nofib benchmarks. This might be useful for evaluating the end-to-end performance of the Plutus compiler, although we should really do SCP-2275. It takes about 10-15 seconds to run on my laptop. The output looks like this:

$ cabal exec nofib-exe sizes-and-budgets
Script                     Size     CPU budget      Memory budget
-----------------------------------------------------------------
clausify/F1                5190     52200152258       174669548
clausify/F2                5190     64038012612       214125276
clausify/F3                5190    175970661101       588207506
clausify/F4                5190    276195627284       911723736
clausify/F5                5190   1127149641516      3771317880
knights/4x4                3669    123600843812       389897530
knights/6x6                3669    364411045942      1170799002
knights/8x8                3669    617528054149      1991766816
primes/05digits            2141     60166853948       124576525
primes/08digits            2141    110484348568       221530395
primes/10digits            2141    155008971902       305421029
primes/20digits            2141    311608230455       619618016
primes/30digits            2141    455094629786       904715821
primes/40digits            2141    610120624302      1214364547
primes/50digits            2141    598553596442      1164707493
queens4x4/bt               3127     19730440366        62692642
queens4x4/bm               3127     27486821791        86938518
queens4x4/bjbt1            3127     25057170139        80397052
queens4x4/bjbt2            3127     26893372938        86377662
queens4x4/fc               3127     69621054671       226697350
queens5x5/bt               3127    263250282408       828489180
queens5x5/bm               3127    316745014099       998447360
queens5x5/bjbt1            3127    315504023354      1002195392
queens5x5/bjbt2            3127    335731305476      1068332804
queens5x5/fc               3127    892876360204      2904782050

There's also a script called nofib-compare in plutus-benchmark which will compare the outputs of two runs. Here's a comparison of the results for this branch against the results for the UPLC simplifier branch:

$ ./plutus-benchmark/nofib-compare info1 info2
Script                     Size         CPU budget    Memory budget
-------------------------------------------------------------------
clausify/F1               -9.4%           -7.3%           -7.3%
clausify/F2               -9.4%           -7.5%           -7.5%
clausify/F3               -9.4%           -7.5%           -7.6%
clausify/F4               -9.4%           -9.8%          -10.0%
clausify/F5               -9.4%           -7.3%           -7.3%
knights/4x4              -10.1%          -14.8%          -15.8%
knights/6x6              -10.1%          -16.6%          -17.3%
knights/8x8              -10.1%          -17.1%          -17.8%
primes/05digits          -16.6%           -4.5%           -7.3%
primes/08digits          -16.6%           -4.0%           -6.7%
primes/10digits          -16.6%           -3.9%           -6.6%
primes/20digits          -16.6%           -3.7%           -6.2%
primes/30digits          -16.6%           -3.5%           -6.0%
primes/40digits          -16.6%           -3.5%           -6.0%
primes/50digits          -16.6%           -3.5%           -6.1%
queens4x4/bt             -13.4%          -12.5%          -13.2%
queens4x4/bm             -13.4%          -12.1%          -12.9%
queens4x4/bjbt1          -13.4%          -12.8%          -13.4%
queens4x4/bjbt2          -13.4%          -12.8%          -13.4%
queens4x4/fc             -13.4%          -13.1%          -13.5%
queens5x5/bt             -13.4%          -12.4%          -13.2%
queens5x5/bm             -13.4%          -12.1%          -12.9%
queens5x5/bjbt1          -13.4%          -12.6%          -13.3%
queens5x5/bjbt2          -13.4%          -12.6%          -13.3%
queens5x5/fc             -13.4%          -13.0%          -13.4%

It only shows the changes, because including the raw figures from the input files would make the table extremely wide. I haven't made any attempt to automate this, but presumably we could do so if it's useful.
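For reference, the comparison is just a relative change per column. Here's a minimal Haskell sketch of the calculation (not the actual nofib-compare script; the 4702 in the comment is a made-up "new" size used purely for illustration):

-- Relative change between an old and a new figure, as a percentage.
percentChange :: Double -> Double -> Double
percentChange old new = (new - old) / old * 100

-- Render to one decimal place, with an explicit sign for increases,
-- e.g. formatChange (percentChange 5190 4702) == "-9.4%".
formatChange :: Double -> String
formatChange p =
    (if p >= 0 then "+" else "")
    ++ show (fromIntegral (round (p * 10) :: Integer) / 10 :: Double)
    ++ "%"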

Pre-submit checklist:

  • Branch
    • Tests are provided (if possible)
    • Commit sequence broadly makes sense
    • Key commits have useful messages
    • Relevant tickets are mentioned in commit messages
    • Formatting, materialized Nix files, PNG optimization, etc. are updated
  • PR
    • (For external contributions) Corresponding issue exists and is linked in the description
    • Self-reviewed the diff
    • Useful pull request description
    • Reviewer requested

@kwxm requested review from bezirg and michaelpj on February 4, 2022 at 00:36
@michaelpj (Contributor) commented:
So I was thinking "why do we need two scripts for this, can't we get better output?" and it turns out that criterion can export results as CSV if you use the --csv flag. So maybe we should write a CSV-comparing script and use it for both...

@michaelpj (Contributor) left a review:
Fine, except I do think it might be nice to start the glorious future of outputting CSV today. We depend on cassava elsewhere, it's pretty easy to use.
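For what it's worth, reading criterion's --csv output with cassava is only a few lines. A rough sketch (the "Name" and "Mean" field names are the ones criterion emits, as shown in the excerpt further down; everything else here is made up for illustration):

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy as BSL
import           Data.Csv             (FromNamedRecord (..), decodeByName, (.:))
import qualified Data.Vector          as V

-- One row of criterion's CSV output; we only care about the name and the mean.
data BenchRow = BenchRow { benchName :: String, benchMean :: Double }

instance FromNamedRecord BenchRow where
    parseNamedRecord r = BenchRow <$> r .: "Name" <*> r .: "Mean"

-- Read a results file and return (benchmark name, mean time) pairs.
readMeans :: FilePath -> IO [(String, Double)]
readMeans path = do
    bytes <- BSL.readFile path
    case decodeByName bytes of
        Left err        -> fail err
        Right (_, rows) -> pure [ (benchName b, benchMean b) | b <- V.toList rows ]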

++ "You'll probably want to redirect the output to a file.")


-- Copied pretty much directly from plutus-tx/testlib/PlutusTx/Test.hs
@michaelpj (Contributor) commented:
argh, need to centralize these :(

printSizesAndBudgets :: IO ()
printSizesAndBudgets = do
  -- The applied programs to measure, which are the same as the ones in the benchmarks.
  -- We can't put all of these in one list because the 'a's in 'CompiledCode a' are different.
@michaelpj (Contributor) commented:
You could do it with an existential, but maybe not worth it.
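In case it's useful later, a minimal sketch of the existential version (assuming the usual CompiledCode type from PlutusTx.Code; the names here are made up):

{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE RankNTypes               #-}

import PlutusTx.Code (CompiledCode)

-- Pair each applied program with its name, hiding the result type 'a' so that
-- programs with different result types can live in a single list.
data NamedProgram = forall a. NamedProgram String (CompiledCode a)

-- Anything that only looks at the erased program (size, budgets, ...) can then
-- be mapped uniformly over such a list.
withProgram :: (forall a. CompiledCode a -> r) -> NamedProgram -> (String, r)
withProgram f (NamedProgram name code) = (name, f code)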

@kwxm (Contributor, Author) commented on Feb 4, 2022:

So maybe we should write a CSV-comparing script and use it for both...

This script and the benchmarking one do different things though. In the benchmarking script you've only got one item of data per benchmark (the time), but here we've got three. I did consider producing three different tables (size, CPU budget, memory budget) and then comparing them one by one, but that'd make it hard to see all the information about a single benchmark.

The CSV output from Criterion has lots of irrelevant stuff in it too. Here's an excerpt from the benchmark results for the builtins:

Name,Mean,MeanLB,MeanUB,Stddev,StddevLB,StddevUB
...
MultiplyInteger/ExMemory 29/ExMemory 11,1.7389750741689113e-6,1.7013633475641947e-6,1.7952985778693707e-6,1.466484816942319e-7,9.950858949034676e-8,2.1763453434231392e-7
MultiplyInteger/ExMemory 29/ExMemory 13,1.7100947810020254e-6,1.6574549873775534e-6,1.7807547206801181e-6,2.0495712031084943e-7,1.6080988529009743e-7,2.861485873603396e-7
MultiplyInteger/ExMemory 29/ExMemory 15,1.7272383735857425e-6,1.7244582331404776e-6,1.7323589279935075e-6,1.1724564396369767e-8,6.991366518903856e-9,2.0849524837797534e-8
MultiplyInteger/ExMemory 29/ExMemory 17,1.9333776451946716e-6,1.873866207628954e-6,2.0109144514208065e-6,2.2141608798988463e-7,1.7851280336270343e-7,2.8756898185597617e-7

I'm not convinced that we can process that uniformly with the output from this PR.

Also, maybe we want to process time figures from execution benchmarks differently to make them human-readable. The bench-compare script has some code to do that, but we don't want to do it for script sizes (maybe we could: 1.127T or 1127G would be a lot more readable than 1127149641516).
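Something like this would do for the SI-suffix rendering (just a sketch, not the actual bench-compare code):

-- Render a large figure with an SI-style suffix, keeping two decimal places,
-- e.g. humanise 1127149641516 == "1.13T".
humanise :: Integer -> String
humanise n = go (fromIntegral n :: Double) ["", "K", "M", "G", "T", "P"]
  where
    go x (suffix : rest)
      | abs x < 1000 || null rest = showRounded x ++ suffix
      | otherwise                 = go (x / 1000) rest
    go x [] = show x  -- unreachable with the list above, but keeps the function total
    showRounded x = show (fromIntegral (round (x * 100) :: Integer) / 100 :: Double)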

@michaelpj (Contributor) commented:
Okay, I guess I won't be fussy about it. I just don't like proliferating these scripts too much and I wish we could simplify things somehow...

@kwxm (Contributor, Author) commented on Feb 4, 2022:

Seems to be stuck in CI.

@michaelpj (Contributor) commented:
CI seems stuck.

@michaelpj (Contributor) commented:
I think this is safe, though.

@michaelpj merged commit 0ec4d6b into master on Feb 4, 2022
MaximilianAlgehed pushed a commit to Quviq/plutus that referenced this pull request Mar 3, 2022
* Add command to nofib-exe to print size and budget info for each benchmark

* Update script

* Realign header

* Update comment

* Update comment

* Remove accidental imports

* updateMaterialized

* Some awk reformatting
@kwxm deleted the kwxm/nofib-size-info branch on March 22, 2022 at 14:12