Kwxm/uplc-criterion #5755
Conversation
@@ -268,64 +292,72 @@ runApplyToData (ApplyOptions inputfiles ifmt outp ofmt mode) =
        Right (d :: Data) ->
            pure $ UPLC.Program () ver $ mkConstant () d

---------------- Benchmarking ----------------
I'm not sure if all of the `!`/`$!`/`force` stuff is really necessary (maybe we don't need `!` on `term` and `anonTerm`, for example), but at least some of it is, and anything extra probably doesn't do any harm.
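In case it's useful for anyone skimming, here's a minimal sketch (not the PR's code) of the three forcing mechanisms being discussed, using a plain list as a stand-in for a UPLC term. Bang patterns and `$!` only evaluate to weak head normal form, while `Control.DeepSeq.force` evaluates the whole structure, so which of them matters depends on how much of the term the benchmark would otherwise leave unevaluated.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Minimal sketch, not the PR's code: bang patterns, ($!) and 'force'.
module Main where

import Control.DeepSeq (force)
import Control.Exception (evaluate)

main :: IO ()
main = do
  let xs = map (* 2) [1 .. 1000 :: Int]
      -- A bang pattern only evaluates to weak head normal form
      -- (here: the outermost cons cell of the list).
      !headForced = xs
      -- 'force' from Control.DeepSeq evaluates the entire structure
      -- (when this binding is demanded).
      fullyForced = force xs
  -- ($!) likewise forces its argument only to WHNF before the call.
  _ <- evaluate $! fullyForced
  print (length headForced)
```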
/benchmark validation
^ Just to check if it's working.
Click here to check the status of your benchmark.
Comparing benchmark results of 'validation' on '8d66cbe71' (base) and '0473b8d09' (PR)
Results table
That is not how it's supposed to be... Let me dive into it.
Nothing weird jumped out at me. If we're certain that running the benchmarks via the executable is faster than via `plutus-benchmark`, then it definitely should be investigated.
-- Big names slow things down
!anonTerm = UPLC.termMapNames (\(PLC.NamedDeBruijn _ i) -> PLC.NamedDeBruijn "" i) term
-- Big annotations slow things down
!unitAnnTerm = force (() <$ anonTerm)
What if you remove `() <$`? Maybe it's the `force` that matters here rather than the annotations? `force` sure is important there.
> What if you remove `() <$`? Maybe it's the `force` that matters here rather than the annotations? `force` sure is important there.

The annotations definitely make a difference. I was seeing a discrepancy of 2-3% between flat files and textual files, and after several hours I realised that they had different types of annotation: making all the annotations `()` removed the discrepancy.
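In case it helps to see it in isolation: `() <$` just uses the `Functor` instance to overwrite every annotation with `()`, so the term keeps its shape but drops whatever (possibly large) annotations it carried. A small sketch with a made-up annotated tree type, not the real UPLC `Term`:

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- A made-up annotated tree standing in for an annotated UPLC term.
data Term ann = Var ann String | App ann (Term ann) (Term ann)
  deriving (Show, Functor)

-- '() <$ t' replaces every annotation with (), keeping the structure.
stripAnns :: Term ann -> Term ()
stripAnns t = () <$ t

main :: IO ()
main = print (stripAnns (App "outer ann" (Var "a1" "f") (Var "a2" "x")))
-- prints: App () (Var () "f") (Var () "x")
```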
runBenchmark (BenchmarkOptions inp ifmt semvar timeLim) = do
    prog <- readProgram ifmt inp
    let criterionConfig = defaultConfig {reportFile = Nothing, timeLimit = timeLim}
        cekparams = mkMachineParameters semvar PLC.defaultCekCostModel
Is it possible to use `PlutusCore.Evaluation.Machine.MachineParameters.Default.mkMachineParametersFor` here instead? You'll need to apply it to `defaultCostModelParams` from `PlutusCore.Evaluation.Machine.ExBudgetingDefaults`, I think. Or something like that; the point is that we only track performance of builtins when `mkMachineParametersFor` is used, and `mkMachineParameters` can be significantly slower (yes, even though the former uses the latter).
!anonTerm = UPLC.termMapNames (\(PLC.NamedDeBruijn _ i) -> PLC.NamedDeBruijn "" i) term
-- Big annotations slow things down
!unitAnnTerm = force (() <$ anonTerm)
benchmarkWith criterionConfig $! whnf evaluate unitAnnTerm
Would it be possible to reuse `Common.benchWith` from the benchmarks, to make sure it's not how we run the benchmarks that makes the difference?
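As a point of reference, here's a self-contained sketch of driving a single Criterion benchmark programmatically in the same style as the snippet above. The `fib` workload and the five-second time limit are made-up stand-ins for CEK evaluation and the `timeLim` option; this is an illustration, not the PR's code.

```haskell
import Criterion (benchmarkWith, whnf)
import Criterion.Main (defaultConfig)
import Criterion.Types (Config (reportFile, timeLimit))

-- Stand-in for the expensive evaluation being benchmarked.
fib :: Int -> Integer
fib n = if n < 2 then fromIntegral n else fib (n - 1) + fib (n - 2)

main :: IO ()
main = do
  -- No HTML report, and cap the benchmark at roughly five seconds.
  let config = defaultConfig { reportFile = Nothing, timeLimit = 5.0 }
  -- 'whnf f x' re-applies f to x on every iteration and forces the result
  -- to weak head normal form, as in 'whnf evaluate unitAnnTerm' above.
  benchmarkWith config (whnf fib 25)
```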
Have you tried comparing benchmarking results of the executable run against results of another executable run? Maybe some

Yes, I ran it dozens of times and used

^ Here are two runs on the

Here's what happens if I replace

I accidentally pushed some stuff that should be in another branch. Let me try to fix that. ... OK, I think that worked.
This reverts commit a1d7760.
@effectfully My intention here was just to provide a quick way of doing some benchmarking when I'm experimenting, so I wasn't too worried about the differences from the "real" benchmarking results. However, I do have a branch here where I copied in lots of stuff from
Just to be clear about how these relate to the previous results: I re-ran all of the validation benchmarks in
@effectfully I'm going to merge this. Please make a ticket if you want to investigate further. I'd also like to pin down exactly what's causing our standard benchmarks to take longer!
The `uplc` executable had a `-x` option that let you measure the execution time of a script. That didn't give good results, so here I've replaced it with a `benchmark` command that runs a Criterion benchmark instead. This gives much better results, but they're not directly comparable with the validation benchmarks in `plutus-benchmark`: see the attached comparison. In general the benchmarks here are maybe 15-20% faster than the ones in `plutus-benchmark` (but see `multisig-sm` 2, 5, 6, and 10!). I think the difference is probably due to all the extra stuff that `evaluateCekLikeInProd` has to do: I pasted all of the extra code into `uplc` and the results mostly seemed to at least be in the same ballpark; I'll try that again with the final version. There also seems to be some inverse correlation between the difference of the two sets of results and the size of the script (ie, the bigger the script, the smaller the difference between the `uplc` results and the `plutus-benchmark` ones).
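For contrast with the Criterion approach, here's a sketch of one-shot wall-clock timing, which is roughly what a `-x`-style measurement presumably amounted to (an assumption for illustration; this is not the removed code). A single run picks up warm-up, GC and scheduler noise, whereas Criterion takes many samples and fits a regression, which is why the `benchmark` command gives more stable numbers.

```haskell
-- Assumed illustration only: time a single run of a computation.
import Control.Exception (evaluate)
import Data.Time.Clock (diffUTCTime, getCurrentTime)

-- Stand-in for evaluating a script.
expensive :: Integer
expensive = sum [1 .. 10000000]

main :: IO ()
main = do
  start <- getCurrentTime
  _ <- evaluate expensive
  end <- getCurrentTime
  putStrLn ("Single run: " ++ show (end `diffUTCTime` start))
```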