-
Notifications
You must be signed in to change notification settings - Fork 479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Case
ing on values of built-in types
#6602
Comments
Great this is exactly what I would like to make compilation simpler and more efficient. |
Very exciting stuff! It's nice to see more support around directly interacting with |
(i) Thanks for thinking along these lines. Obviously, additional instructions to speed up data processing are a good idea. (ii) Since there are a finite number of built-in types, an alternative would be to add a new case for each: caseBool, caseInteger, caseData, etc. Presumably that would be slightly more efficient, because it will not be necessary at runtime to determine what kind of data is the motive of the case. (Here I use the following terminology: (iii) Your explanation of the meaning of case for Integer is incomplete. In (iv) Your explanation of the meaning of case for Data is incomplete. You gave only this example
You don't explain the role of |
Yes:
Slightly faster at runtime, slightly slower to decode, who knows if it'll pay out. But dispatching on type of the scrutinee is just a bit of constant overhead, the problem that we're addressing here is a ridiculous amount of existing overhead. So I don't think we need a perfect solution here, just one that solves the problem. If it some point we decide that we want to hardcode builtins into the AST, we can do it one fell swoop and create an entire new UPLC, I don't personally see much value in breaking the current design just to save some constant costs here and there. We can get much more out of giving up on extensibility if we want to.
My bad, sorry, I've now specified that the match should fail.
Made it this:
Hope this is clearer. Some of that context is in the #5777 and #6225 and I didn't want to duplicate it here. Sorry for any confusion caused! Just to nitpick:
We normally refer to |
Thanks for the response. The name "scrutinee" is specific and easier to understand than "motive"; good choice. You may be right that I've misunderstood the latter term. Thanks for the clarification of why you are not defining separate operators for each sort of scrutinee. That should make its way into the design document. I agree with your reasoning, especially the point about not wanting to overuse the four bits available to encode AST constructors. Thanks also for the clarification of the meaning of the various operators. |
We really want to have faster
Data
processing, but as per #6225 this is quite hard. Matching on integers is very inefficient too as one needs to perform repeated calls toIfThenElse
andEqualsInteger
(the possibility of using SOPs for matching on integers to solve the issue was brought up here). Matching on built-in booleans is also inefficient, particularly from the size point of view.I feel like we can kill all these birds with a single stone: allow
case
to work on values of built-in types.Bool
Take booleans, we could allow
where
b
is aBool
branchFalse
is executed ifb
isFalse
branchTrue
is executed ifb
isTrue
(I'm making
branchFalse
come beforebranchTrue
not only becausefalse
corresponds to0
andtrue
corresponds to1
, but also because thefalse
case is more likely to terminate execution, hence being pretty-printed an average program would be more readable with the throwing clause appearing first. But we can make the ordering the same as inIfThenElse
instead if we feel strongly about it, I personally don't care much)This is as close as it gets to extending the UPLC AST with a constructor for
IfThenElse
without actually doing that. It should also be easy to implement, sinceCase
is currently defined asand all we need is to teach the CEK machine how to extract
Term
s from aVector
(plus check its length) and how to handle them depending on the type tag of the given constant.Note how we would be able to address everything from #6598 without introducing any new builtins. For example if we had
and wanted to write
force (assertAndContinueOrTrace msg b (delay x))
, we could instead simply writeThis AST is of roughly the same size, but no
force
-delay
bookkeeping is required, just like the additional builtin. The latter code is concise and on-point.Integer
For integers we'll allow
reducing to
branch_i
wherei
is the scrutinee or failing ifi
is not in[0, N]
.This alone will optimize
Data
processing, because one will be able to write (pseudocode)This is clearly unideal, since we very much want to bypass the
unConstrData
andapplyToList
noise, but this is a good start and an improvement in the right direction.Note that associating
case
branches with hardcoded integers (0
,1
...n
) isn't much of a limitation, since one can recover full generality by adding anelemIndex
builtin:For example the following:
will evaluate to
branch42
.Functioning
case
on integers with the addition ofelemIndex
will also allow us to have Deterministic universal almost-unique Plutus Constructors if we ever feel like it.Data
Ultimately we want
case
to handleData.Constr
the same way that it handlesTerm.Constr
. There's still the limitation discussed in the "What exactly breaks if we add the unsafeconstrTermFromConstrData
?" section of #5777, but that one we should be able to address via multi-lambdas/multi-applications (see #6225). Which will help SOPs as well I believe, since multi-lambdas should be faster than iterated lambdas, plus UPLC is untyped, so there's still a possibility of getting the same mismatched semantics issue with SOPs as withData
.So in the end we want
to evaluate to
which is a multi-lambda
\[x, y, z] ->
multi-applied to[a, b, c]
.I.e. that case means (pseudocode)
and branch selection with
Data
will work the same way as with SOPs: for theConstr i args
scrutinee thei
th branch gets picked, so given thati
is 1 in the example above, we pick the 1th branch and drop the 0th and the 2th ones (we only have the 2th one to make the example a bit more rich, nothing would change if it wasn't there). Unlike SOPs weNote that casing on
Data
is orthogonal to pattern matching builtins, since the latter allow one to handle all constructors ofData
, whilecase
overData
only handles theConstr
constructor in a specific SOP-like way.Lists and pairs
I'm not sure how we should handle those in
case
-expressions. Just replicate pattern matching builtins? Or allow to match against multiple patterns like in Haskell (e.g. for lists[]
,[x]
,[x, y]
,x:y:z:xs
etc)? Do we then want to changeCase
to contain proper patterns rather than arbitrary terms?All of that is to be figured out, but this is way less important than speeding up processing of
Data
, so it can definitely wait.Elaborate
case
to builtin calls?Why reimplement
case
for booleans if we already haveIfThenElse
? Can we just elaboratecase
toIfThenElse
under the hood? I believe the answer is yes, but it's awkward:IfThenElse
expects values while we'll need to pass terms in there, which is doable as we can simply wrap terms into values (usingVDelay
for example), but then we need to unwrap the result, plus we want all theBuiltinRuntime
nonsense (that is required for builtins but notcase
) to get inlined away and persuading GHC is always non-trivial, so I feel like it's much easier to just reimplement a few lines of logic than attempt a clever general solution requiring an order of magnitude more effort to implement and maintain.Summary
It seems like this path will allow us to iteratively improve the status quo w.r.t. performance and ultimately resolve the long-standing issue of
Data
processing being slow, which is among the most pressing issues with UPLC at the moment.The text was updated successfully, but these errors were encountered: