
Conversation

@michaelpj
Contributor

Don't look here yet, just benchmarking something.

@michaelpj
Contributor Author

/benchmark plutus-benchmark:nofib

@michaelpj
Contributor Author

The nofib benchmark is rebuilt every time, so it's a test of whether changes to the compiler make things faster. It would be nice to have size and budget outputs for them too, I'll think about adding those.

@iohk-devops

Comparing benchmark results of '7a3e0a26d' (base) and 'b5e9d60e5' (PR)

| Script | 7a3e0a2 | b5e9d60 | Change |
| --- | --- | --- | --- |
| clausify/formula1 | 32.55 ms | 30.91 ms | -5.0% |
| clausify/formula2 | 39.95 ms | 37.87 ms | -5.2% |
| clausify/formula3 | 109.5 ms | 103.7 ms | -5.3% |
| clausify/formula4 | 182.0 ms | 169.5 ms | -6.9% |
| clausify/formula5 | 698.0 ms | 664.8 ms | -4.8% |
| knights/4x4 | 94.38 ms | 84.90 ms | -10.0% |
| knights/6x6 | 256.0 ms | 235.0 ms | -8.2% |
| knights/8x8 | 425.2 ms | 393.7 ms | -7.4% |
| primetest/05digits | 60.05 ms | 56.81 ms | -5.4% |
| primetest/08digits | 111.1 ms | 105.5 ms | -5.0% |
| primetest/10digits | 157.1 ms | 149.3 ms | -5.0% |
| primetest/20digits | 312.6 ms | 297.5 ms | -4.8% |
| primetest/30digits | 455.0 ms | 430.8 ms | -5.3% |
| primetest/40digits | 608.0 ms | 576.3 ms | -5.2% |
| primetest/50digits | 602.7 ms | 574.1 ms | -4.7% |
| queens4x4/bt | 14.83 ms | 16.60 ms | +11.9% |
| queens4x4/bm | 20.81 ms | 26.34 ms | +26.6% |
| queens4x4/bjbt1 | 18.39 ms | 21.18 ms | +15.2% |
| queens4x4/bjbt2 | 19.68 ms | 21.66 ms | +10.1% |
| queens4x4/fc | 46.95 ms | 54.47 ms | +16.0% |
| queens5x5/bt | 200.1 ms | 223.6 ms | +11.7% |
| queens5x5/bm | 241.0 ms | 310.3 ms | +28.8% |
| queens5x5/bjbt1 | 235.8 ms | 270.4 ms | +14.7% |
| queens5x5/bjbt2 | 248.6 ms | 278.6 ms | +12.1% |
| queens5x5/fc | 598.4 ms | 685.1 ms | +14.5% |

@michaelpj
Contributor Author

/benchmark plutus-benchmark:nofib

@iohk-devops

Comparing benchmark results of '7a3e0a26d' (base) and '9a6e0d2ff' (PR)

| Script | 7a3e0a2 | 9a6e0d2 | Change |
| --- | --- | --- | --- |
| clausify/formula1 | 33.23 ms | 30.69 ms | -7.6% |
| clausify/formula2 | 40.60 ms | 37.67 ms | -7.2% |
| clausify/formula3 | 111.3 ms | 103.4 ms | -7.1% |
| clausify/formula4 | 185.0 ms | 168.5 ms | -8.9% |
| clausify/formula5 | 710.1 ms | 660.2 ms | -7.0% |
| knights/4x4 | 94.96 ms | 84.27 ms | -11.3% |
| knights/6x6 | 258.1 ms | 234.4 ms | -9.2% |
| knights/8x8 | 430.5 ms | 391.3 ms | -9.1% |
| primetest/05digits | 59.62 ms | 56.41 ms | -5.4% |
| primetest/08digits | 110.1 ms | 104.7 ms | -4.9% |
| primetest/10digits | 154.8 ms | 148.1 ms | -4.3% |
| primetest/20digits | 309.3 ms | 296.5 ms | -4.1% |
| primetest/30digits | 449.0 ms | 430.4 ms | -4.1% |
| primetest/40digits | 600.5 ms | 578.4 ms | -3.7% |
| primetest/50digits | 596.5 ms | 571.8 ms | -4.1% |
| queens4x4/bt | 14.84 ms | 16.42 ms | +10.6% |
| queens4x4/bm | 20.87 ms | 26.31 ms | +26.1% |
| queens4x4/bjbt1 | 18.47 ms | 20.79 ms | +12.6% |
| queens4x4/bjbt2 | 19.74 ms | 21.27 ms | +7.8% |
| queens4x4/fc | 47.44 ms | 53.77 ms | +13.3% |
| queens5x5/bt | 199.7 ms | 219.7 ms | +10.0% |
| queens5x5/bm | 241.6 ms | 309.7 ms | +28.2% |
| queens5x5/bjbt1 | 236.6 ms | 266.4 ms | +12.6% |
| queens5x5/bjbt2 | 249.4 ms | 273.9 ms | +9.8% |
| queens5x5/fc | 604.4 ms | 678.1 ms | +12.2% |

@michaelpj
Contributor Author

Okay, this is looking very good in a lot of cases, including some of the size/budget tests we have. I think we should almost certainly do this.

TODO:

- Better tests
- Work out what's going on with the queens example in nofib
  - To do that I'm going to do another PR with size/budget tests for the nofib examples
- Try and reduce the duplication somewhat if at all possible :(

@michaelpj
Contributor Author

I regenerated the plutus-use-cases scripts before and after this, and it's a 6-15% size decrease, which seems good.

michaelpj added a commit that referenced this pull request Feb 2, 2022
This has bothered me for a while. We rely on this to turn function
applications into let-bindings so that we can inline them, but because
of the way it was written it could only ever do this for at most one
argument at a time! So we could only fully-inline a function of arity N
with N rounds of the simplifier! Terrible!

Now it handles any number at once.

I wrote all the functions for this while working on
#4365, since there we
*really* need it, since constructors and destructors appear like
multiple function arguments rather than multiple nested
immediately-applied-lambdas. But it applies well here too, so I thought
I'd just do this improvement on the side.
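The improvement this commit message describes can be sketched on a toy lambda calculus. This is a minimal standalone sketch, not the Plutus AST or the real simplifier code; `Term`, `spine`, and `multiBeta` are invented names, and variable capture is ignored for simplicity:

```haskell
-- Toy illustration of the multi-argument beta rule described above:
-- peel off ALL immediately-applied lambdas in one pass, turning
-- ((\x -> \y -> body) a) b  into  let x = a; y = b in body,
-- instead of handling one argument per simplifier round.
data Term
  = Var String
  | Lam String Term
  | App Term Term
  | Let [(String, Term)] Term   -- simultaneous, non-recursive bindings
  deriving (Show, Eq)

-- Unwind an application spine: ((f a) b)  ==>  (f, [a, b])
spine :: Term -> (Term, [Term])
spine (App f x) = let (h, args) = spine f in (h, args ++ [x])
spine t         = (t, [])

-- Match as many lambda binders as there are arguments, all at once.
multiBeta :: Term -> Term
multiBeta t = case spine t of
  (h, args@(_ : _)) -> go h args []
  _                 -> t
  where
    go (Lam v b) (a : rest) binds = go b rest (binds ++ [(v, a)])
    go body rest binds =
      let body' = foldl App body rest   -- leftover arguments stay applied
      in if null binds then body' else Let binds body'
```

For example, `multiBeta (App (App (Lam "x" (Lam "y" (Var "x"))) (Var "a")) (Var "b"))` produces `Let [("x",Var "a"),("y",Var "b")] (Var "x")` in a single pass, where the old one-argument-at-a-time rule would have needed two rounds.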
@michaelpj
Contributor Author

Well that's extremely mysterious. The nofib examples use less budget but apparently run for longer 😱 My extremely sketchy guess is that we cut down on the proportion of the time spent doing machine steps, so more of the time is builtins, which have more prediction error? Unclear!

@michaelpj
Contributor Author

The mystery deepens: the nofib tests that got faster got (slightly) bigger (which I don't understand at all), and the ones that got slower got smaller! More digging required.

@michaelpj
Contributor Author

Looks like the size bump is due to inlining some constructors. In particular, nullary constructors look like const, basically, and we do inline that (we did it here, I noted that it was a bit suspicious at the time!). I think we need some arity analysis to make this not suck. For now I'll just turn it off with a note.
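The observation can be illustrated with a sketch of Scott encoding. This is not the actual Plutus encoding machinery; `SMaybe`, `snothing`, `sjust`, and `smatch` are names invented for this example:

```haskell
-- Sketch only: Scott-encoded Maybe. A value is a function that selects
-- one of its continuation arguments.
type SMaybe a r = r -> (a -> r) -> r

-- The nullary constructor ignores one argument and returns the other:
-- exactly the shape of 'const'. An inliner that inlines 'const'-like
-- terms will copy this lambda to every use site, growing the program.
snothing :: SMaybe a r
snothing n _ = n

sjust :: a -> SMaybe a r
sjust x _ j = j x

-- Pattern matching is just application of the encoded value.
smatch :: SMaybe a r -> r -> (a -> r) -> r
smatch m n j = m n j
```

So `smatch snothing 0 succ` evaluates to `0` and `smatch (sjust 5) 0 succ` to `6`: useful to inline when the match is visible, pure size bloat when it isn't, which is why some arity/usage analysis is needed before inlining these by default.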

@kwxm
Contributor

kwxm commented Feb 3, 2022

> Well that's extremely mysterious. The nofib examples use less budget but apparently run for longer 😱 My extremely sketchy guess is that we cut down on the proportion of the time spent doing machine steps, so more of the time is builtins, which have more prediction error? Unclear!

You can dump the compiled code using `nofib-exe` (see the help) and then run it with `uplc evaluate -t` to get information about how often it sees the different constructors and the different builtins (use the `-1` option to get a count rather than the budget consumed). That might help. You might have to do everything using `flat` if the parser can't handle the UPLC source.

@michaelpj
Contributor Author

/benchmark plutus-benchmark:nofib

@iohk-devops

Comparing benchmark results of 'plutus-benchmark:nofib' on 'a28a62d79' (base) and '9e0b319b3' (PR)

| Script | a28a62d | 9e0b319 | Change |
| --- | --- | --- | --- |
| clausify/formula1 | 32.62 ms | 31.65 ms | -3.0% |
| clausify/formula2 | 40.17 ms | 38.84 ms | -3.3% |
| clausify/formula3 | 109.6 ms | 105.7 ms | -3.6% |
| clausify/formula4 | 180.7 ms | 170.0 ms | -5.9% |
| clausify/formula5 | 699.9 ms | 678.5 ms | -3.1% |
| knights/4x4 | 92.46 ms | 82.80 ms | -10.4% |
| knights/6x6 | 249.3 ms | 227.0 ms | -8.9% |
| knights/8x8 | 414.5 ms | 377.9 ms | -8.8% |
| primetest/05digits | 58.48 ms | 56.13 ms | -4.0% |
| primetest/08digits | 107.8 ms | 104.2 ms | -3.3% |
| primetest/10digits | 152.0 ms | 147.3 ms | -3.1% |
| primetest/20digits | 303.2 ms | 294.6 ms | -2.8% |
| primetest/30digits | 440.0 ms | 424.3 ms | -3.6% |
| primetest/40digits | 587.6 ms | 571.8 ms | -2.7% |
| primetest/50digits | 585.8 ms | 568.8 ms | -2.9% |
| queens4x4/bt | 14.47 ms | 13.05 ms | -9.8% |
| queens4x4/bm | 20.43 ms | 18.63 ms | -8.8% |
| queens4x4/bjbt1 | 18.03 ms | 16.21 ms | -10.1% |
| queens4x4/bjbt2 | 19.23 ms | 17.34 ms | -9.8% |
| queens4x4/fc | 46.07 ms | 41.17 ms | -10.6% |
| queens5x5/bt | 194.3 ms | 175.4 ms | -9.7% |
| queens5x5/bm | 235.7 ms | 216.1 ms | -8.3% |
| queens5x5/bjbt1 | 229.9 ms | 207.7 ms | -9.7% |
| queens5x5/bjbt2 | 244.0 ms | 219.1 ms | -10.2% |
| queens5x5/fc | 587.5 ms | 527.0 ms | -10.3% |

@michaelpj
Contributor Author

That's more like it.

@michaelpj
Contributor Author

Okay, I think this is ready for review now. Seems to be pretty good overall. The main sadness is that I basically had to duplicate the whole inliner, but I tried to keep the divergences to a minimum and label them, so hopefully it won't be too bad.

I should probably write a few more tests before we merge, but I think the code is in a good state, and empirically it seems to work pretty well.

@michaelpj michaelpj requested review from a user and bezirg February 3, 2022 17:59
Comment on lines +6 to +7:

    (force
    (delay

Is it okay that `forceDelayCancel` doesn't happen in `test/Lift` tests?

Contributor Author

Yes: at the moment no optimization happens in Lift; this just makes that consistent. We probably should do some optimization in Lift, but that's a separate problem.

    -defaultCompilationOpts :: CompilationOpts
    -defaultCompilationOpts = CompilationOpts True False False False 8 True True True False
    +defaultCompilationOpts :: CompilationOpts a
    +defaultCompilationOpts = CompilationOpts True False False False 12 True True True mempty False

Maybe extract `_soMaxSimplifierIterations` from `defaultSimplifyOpts` here?

Contributor Author

Mmm, but they're different. This is for the PIR compiler, the other is for the UPLC compiler, so (in principle) they shouldn't import each other. They end up being the same, but I think it's easier for now to just have them be quasi-independent. You could imagine wanting to tune them separately...


Oh, makes sense, thanks.
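The design choice discussed in this exchange can be sketched as follows. These record and field names are invented for illustration, not the real PIR/UPLC options types:

```haskell
-- Hypothetical sketch of keeping the PIR and UPLC simplifier options
-- quasi-independent: each pipeline owns its own copy of the setting.
newtype PirOpts = PirOpts { pirMaxSimplifierIterations :: Int }
  deriving Show

newtype UplcOpts = UplcOpts { uplcMaxSimplifierIterations :: Int }
  deriving Show

-- Both default to the same number today, but because neither module
-- imports the other, the two pipelines remain free to be tuned
-- separately later.
defaultPirOpts :: PirOpts
defaultPirOpts = PirOpts 12

defaultUplcOpts :: UplcOpts
defaultUplcOpts = UplcOpts 12
```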

@michaelpj michaelpj force-pushed the mpj/uplc-inline branch 2 times, most recently from 532d790 to 4d17e7a Compare February 7, 2022 10:08
This adds essentially a copy of the PIR inliner for UPLC. This has
pretty good results, with a 5-20% speedup and size decrease for many
programs.
@michaelpj
Contributor Author

Hydra really doesn't like this PR; I'm going to open a new one.

@michaelpj michaelpj closed this Feb 7, 2022
MaximilianAlgehed pushed a commit to Quviq/plutus that referenced this pull request Mar 3, 2022