Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added dettman multiplication algorithm, made changes to equivalence checker #1481

Closed
wants to merge 20 commits into from

Conversation

OwenConoly
Copy link
Collaborator

Two main things that I did here:

  1. Adding the coq implementation of the "dettman multiplication" algorithm.
  2. Changes to the assembly equivalence checker:
    a. A canonicalization of sums of addcarries, which effectively makes the equivalence checker aware that double-wide (e.g., 128-bit) addition is associative. This is analogous to the preexisting canonicalization of commutative operations, which just sorted the operands.
    b. A (new, in-some-ways better) implementation of bounds on integer values corresponding to nodes in the dag. Instead of just revealing a few layers of the dag, and using that to bound an expression, I changed the data type of the dag, so that as nodes are inserted into the dag, their bounds are inserted along with the node and its description. The bound on each node can be calculated in terms of the bounds on its operands. So now, the bounds can "look to arbitrary depth", using this dynamic-programming type of strategy.
    c. A new rewriting rule, which replaces or operations with add operations in some cases. When we have a node of the form (orZ, [x; y]), for example, and x is a multiple of 16, and y is upper-bounded by 15, we can rewrite this as (addZ, [x; y]). This uses the bounds that I implemented.
    d. A new cli optional argument, "--extra-rewrite-rule", which allows optional rewrite rules (which are off by default) to be "turned on" when the equivalence checker is invoked. Currently, the only available optional rewrite rule is the or-to-add rule I mentioned in part (c), which can be turned on by writing "--extra-rewrite-rule or-to-add". It would be easy to add more optional rewrites. I found it necessary to add this option because the or-to-add rule was necessary for some dettman verification, but it was breaking some solinas verification.

OwenConoly and others added 20 commits July 30, 2022 21:32
…n multiplication algorithm--it now multiplies c by 16, instead of bit-shifting x before multiplying by c, as before. Defined and proved a term-collecting function for numbers in associative representation. The mulmod function now uses the term-collection function. With term collection, the mulmod function now appears to have been defined properly. Refactored, putting mulmod function, generalized mulmod function, and helper functions in separate files.
…sly, there was a reification error). Compiling general version now outputs identical C code to compiling specific version.
…ized mulmod algorithm--compiling to C with 9-limb representation for multiplication modulo 2^512 - 569.
…cation algorithm fails to compile to Bedrock2 for some reason.
…on algorithm to Bedrock2. fixed mul splitting, but still doesn't work.
…mbly equivalence checker to allow parsing asm files containing "or" instructions.
…dded automatic limb number, limb width inference to cli.
…embly checker. currently, the only optional rewriting rule is the or-to-add rewrite.
@andres-erbsen
Copy link
Contributor

please rebase, perhaps git fetch -a -v && git pull --rebase origin master

@OwenConoly
Copy link
Collaborator Author

please rebase, perhaps git fetch -a -v && git pull --rebase origin master

Ok!

Copy link
Collaborator

@JasonGross JasonGross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

I think it might be worth splitting the PRs for easier merging? For example, I expect the implementation of dettman multiplication is more-or-less ready for merging (@andres-erbsen ?), the addition of dettman multiplication as a binary should be ready for merge once it's rebased to account for #1448; while I've requested a bunch of changes to the equivalence checking code.

d. A new cli optional argument, "--extra-rewrite-rule", which allows optional rewrite rules (which are off by default) to be "turned on" when the equivalence checker is invoked. Currently, the only available optional rewrite rule is the or-to-add rule I mentioned in part (c), which can be turned on by writing "--extra-rewrite-rule or-to-add". It would be easy to add more optional rewrites. I found it necessary to add this option because the or-to-add rule was necessary for some dettman verification, but it was breaking some solinas verification.

Could you track down what's going wrong here? It would be nicer if we didn't need to conditionally enable rewrite rules, and in theory they should all be compatible, right?

Instead of just revealing a few layers of the dag, and using that to bound an expression, I changed the data type of the dag, so that as nodes are inserted into the dag, their bounds are inserted along with the node and its description. The bound on each node can be calculated in terms of the bounds on its operands. So now, the bounds can "look to arbitrary depth", using this dynamic-programming type of strategy.

This is great!

src/Assembly/Equivalence.v Show resolved Hide resolved
Local Coercion Z.of_nat : nat >-> Z.
Local Coercion Z.pos : positive >-> Z.

Section Stuff.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please choose a better section name, or just do Section _. or Section __.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why the single underscore results in a syntax error?

src/Arithmetic/DettmanMultiplication.v Show resolved Hide resolved
src/Arithmetic/DettmanMultiplication.v Show resolved Hide resolved
src/Arithmetic/DettmanMultiplication.v Show resolved Hide resolved
Definition n_maybenat_spec : anon_argT
:= ("n",
Arg.Custom (parse_string_and parse_maybenat) "ℕ",
["The number of limbs, or the literal '(auto)' or '(autoN)' for a non-negative number N, to automatically guess the number of limbs"]).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't seem to actually support things like (auto5), though? This syntax is for choosing from amongst multiple possible suggested choices, when there are multiple ones that might make sense. Is this a sensible thing for Dettman multiplication? (How do the heuristics work?) If so, you should reuse what's in UnsaturatedSolinasHeuristics. Otherwise you should drop the (autoN).

@@ -1112,4 +1236,4 @@ Module ForExtraction.
: A
:= Parameterized.PipelineMain argv.
End BaseConversion.
End ForExtraction.
End ForExtraction.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
End ForExtraction.
End ForExtraction.

Import Bedrock2First.

(*Redirect "/tmp/bedrock2_dettman_multiplication.hs"*) Recursive Extraction DettmanMultiplication.main.
(* cat /tmp/bedrock2_dettman_multiplication.hs.out | sed -f haskell.sed > ../../bedrock2_dettman_multiplication.hs *)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix files missing a newline at the end of the file. It's possible that git rebase master --whitespace=fix will do this.

@@ -425,6 +471,10 @@ Module ForExtraction.
:= ([Arg.long_key "asm-reg-rtl"],
Arg.Unit,
["By default, registers are assumed to be assigned to function arguments from left to right in the hints file. This flag reverses that convention to be right-to-left. Note that this flag interacts with --asm-input-first, which determines whether the output pointers are to the left or to the right of the input arguments."]).
Definition asm_extra_rewrite_rules_spec : named_argT
:= ([Arg.long_key "extra-rewrite-rule"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make it clear this is specific to the equivalence checker, e.g.,

Suggested change
:= ([Arg.long_key "extra-rewrite-rule"],
:= ([Arg.long_key "asm-extra-rewrite-rule"],

or

Suggested change
:= ([Arg.long_key "extra-rewrite-rule"],
:= ([Arg.long_key "asm-extra-rewrite-rules"],

Definition asm_extra_rewrite_rules_spec : named_argT
:= ([Arg.long_key "extra-rewrite-rule"],
Arg.String,
["Enables optional rewriting features of the assembly checker. Only relevant when --hints-file is specified. Valid options for this argument are: or-to-add. To specify multiple optional rules, pass this argument multiple times."]).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nicer to accept a comma-separated list. We should have some parse_list combinator that does this (I can look it up if it's hard to find)

@andres-erbsen
Copy link
Contributor

IIRC Owen put in quite a bit of effort to figure out a confluent rewrite system that works for all of our examples and concluded that there isn't anything reasonably easy and elegant. There exist generic techniques for dealing with this by transforming rewrite systems, but having a concrete and human-readable rewrite system is a huge benefit for troubleshooting. Unless you have a concrete suggestion for how to fix the issue, I'd suggest either merging the checker changes as-is or not merging them at all, depending on how much we value the optimization the optional rules verify. (we may still write about it in a paper, though).

I agree that Dettman multiplication itself is a great addition and should be fast-tracked if we take time figuring out the checker stuff, but I kinda wish we didn't.

@JasonGross
Copy link
Collaborator

Unless you have a concrete suggestion for how to fix the issue

I don't, though if there's a way to include the insight as a comment, I think that'd be great.

I'd suggest either merging the checker changes as-is or not merging them at all, depending on how much we value the optimization the optional rules verify.

I think merging them is better than not, and having them be conditional is fine. By "as is" do you mean "literally as is" or "without blocking on figuring out a better strategy"? Most of my requests are things like "don't use autogenerated names", "don't leave in print statements", "move utility code to Util", "use informative names", and some slightly broader suggestions like "use zrange instead of Z * Z" and "see if there's code-deduplication opportunities in this particular way around bounds computation".

@andres-erbsen
Copy link
Contributor

I meant merging a change based on the current strategy; I have not looked at the detailed feedback.

@OwenConoly
Copy link
Collaborator Author

I'm sorry you had to look through the print statements and such things; I thought that I had already made a pass through to get rid of that sort of garbage, but in any case I should've checked again before making a pull request.

And yeah, like Andres said, I tried to come up with a way to avoid the conditional rewriting thing, but I couldn't find a good way to do it. The way to fix it, I think, would've been to make the combine_consts function be much more general than it currently is. Rewriting ors into adds was causing the adds to get sucked up into the combine_consts function, and they got combined in ways we don't want them to.

One idea I had to fix that was be to try some sort of neat canonicalization strategy like I did with the addcarries; but that sounded pretty complicated, so I didn't think about it for very long. The other idea I had was the simplest possible canonicalization of expressions involving multiplication and addition: just distribute multiplication across addition. I experimented with this, but it led to absurd runtime increases for the assembly checker, as I suppose I should have expected. So, I gave up and did the conditional rewriting thing.

@JasonGross
Copy link
Collaborator

Rewriting ors into adds was causing the adds to get sucked up into the combine_consts function, and they got combined in ways we don't want them to.

Two thoughts/questions:

  1. What was going wrong with how they were combined?
  2. Very questionable idea: would it work to put this pass after combine_consts, so that combine_consts can't slurp up the adds? This would make it more fragile, but we already don't have a great story for confluence / ordering.

@OwenConoly
Copy link
Collaborator Author

OwenConoly commented Nov 8, 2022

The combine_consts function knows how to do this:
x + x*4 ===> x*5.
But it isn't smart enough to do this:
x + y + (x + y)*4 ===> (x + y)*5.

This means that if we don't activate the or-to-add rewriting thing, then combine_consts will do this:
(x | y) + (x | y)*4 ===> (x | y)*5.
But if we activate the or-to-add rewriting thing, then those ors become adds, and then combine_consts doesn't know to do this:
x + y + (x + y)*4 ===> (x + y)*5.

Then it seems like we have an easy fix: just augment combine_consts so it knows how to do that. But then it's not obvious what combine_consts should do with an expression like
x + y + z + (x + y)*4 + (x + z)*4.

Should that become
z + (x + y)*5 + (x + z)*4,
or should it become
y + (x + y)*4 + (x + z)*5?

And then things could potentially go badly if we make inconsistent choices for different expressions. At this point I got the idea, why not just distribute everything, and then combine terms? But that led to the runtime issues that I mentioned.

I had the "questionable idea" as well, and I tried it out. It doesn't work :(

Here's an example showing that it doesn't work.
If we merge (x | y) into the dag (and label it with the number a), then it gets rewritten as
(*a*) x + y.
Then suppose we merge (x | y)*4, and it gets rewritten as
(*b*) (x + y)*4.
Then suppose we merge (a + b) into the dag. It doesn't matter in which order we apply the or-to-add and combine_consts rules, since the operands a and b have already had both rules applied to them. So, when we merge (a + b), we get
(*c*) x + y + (x + y)*4,
instead of the desired result.

@OwenConoly
Copy link
Collaborator Author

please rebase, perhaps git fetch -a -v && git pull --rebase origin master

Would 'master' in this context refer to the master branch of my fork? I think I want it to be the master branch of the mit-plv repository, instead of my forked repository. Based on a couple google searches, this is my best guess for what to do:

git remote add original_repo https://github.com/mit-plv/fiat-crypto.git
git fetch -a -v original_repo
git pull --rebase origin upstream/master

Would that do what we want it to do?

@OwenConoly
Copy link
Collaborator Author

Do we currently have any rewrites that talk about addcarry vs shr? It seems plausible to me that we actually want to canonicalize all addcarry into shr of addZ (at least for powers of 2), add a rule that says that (x >> a) + (y >> b) is (x + (y >> (b - a))) >> a whenever b >= a, let the sorting happen by associativity of addZ, and make sure we have x >> 0 = x. Any thoughts on this strategy vs what you're currently doing?

I like that strategy much better! It's more general, and easier to work with as well. What I'm currently doing is essentially combining things into the addZ and shr, and then splitting them apart again into addcarries. The splitting apart is ugly and difficult to implement. It would be much nicer to just write it as the shr of an addZ.

I'm not sure what you mean by "at least for powers of 2". Isn't a power of two always involved?

@OwenConoly
Copy link
Collaborator Author

Did you distribute constants always, or only when the operand of the inner expression matched the operand of the outer expression?

I just distributed indiscriminately.

@OwenConoly
Copy link
Collaborator Author

If you're interested in some help splitting the PR and rebasing and stuff, I've made https://github.com/mit-plv/fiat-crypto/tree/dettman-mul-only-arith and https://github.com/mit-plv/fiat-crypto/tree/dettman-mul-partial-split;

Ok, I will start with those.

@JasonGross
Copy link
Collaborator

I'm not sure what you mean by "at least for powers of 2". Isn't a power of two always involved?

That was me not remembering whether addcarry's argument was a bitwidth or a modulus. Indeed everything is a power of two here

@JasonGross
Copy link
Collaborator

Did you distribute constants always, or only when the operand of the inner expression matched the operand of the outer expression?

I just distributed indiscriminately

It seems at least plausible that more restricted distribution would be performant enough?

@JasonGross
Copy link
Collaborator

If you're interested in some help splitting the PR and rebasing and stuff, I've made https://github.com/mit-plv/fiat-crypto/tree/dettman-mul-only-arith and https://github.com/mit-plv/fiat-crypto/tree/dettman-mul-partial-split;

Ok, I will start with those.

I'd suggest taking the dettman-mul-only-arith branch, renaming the section that I commented on in this PR (I think it's called "stuff"?), rebasing it on master, and preparing a PR from that branch (or equivalent), with only the arithmetic template; Andres can review that and hopefully it can be merged quickly.

Then I'd suggest pulling out the dettman binary/CLI into its own branch on top of that (maybe just copy the files and do it in one commit?). Note that I drastically reduced the number of typeclass arguments about a month ago, and the Makefile changes now go in Makefile.examples and Makefile.config, so there's some merge conflicts to fix. That PR should also be quickly mergable.

I'd also prefer to have the four changes to the equivalence checker separated out. I've reimplemented the ability to have conditional passes at #1498 (note that all passes now need to be listed in an enum and have descriptions right below their definition), and I've moved the bounds definitions up in #1499, which should make it much easier to write independent passes in a way that minimizes merge conflicts. (I've also adjusted bounds checking to give back a zrange in #1485, and integrated some of the other smaller changes in #1486, #1487, #1490, #1492, #1494.)

I think the pass that rewrites or to add should be quickly mergable, if you separate it out (is the PHOAS-based bounds analysis powerful enough for this, or did I miss some refinement?).

I expect that the "combine shr" pass should be mostly self-contained and easily mergable.

The "turn addcarry into shr" pass is probably going to impact a bunch of the other addcarry passes, and might even significantly simplify some of them?

Finally, the "memoize bounds into the dag" change will need some work on top of what's already in this PR, in particular to respect the abstraction barriers of the dag module. I started looking into this a bit, but an not sure how much more time I'll find for it. In any case I think you should save this PR for last.

What do you think of this plan for splitting up the PRs?

@JasonGross
Copy link
Collaborator

I guess we maybe also want another (preliminary?) PR that gives all rewriting passes access to the dag, and gives their Ok proofs access to dag_ok or w/e.

And it would be nice to have separate PRs (or at least commits) for adding bounds info to the dag vs making use of that bounds info and changing the inspection depth, so that we can profile these separately.

@OwenConoly
Copy link
Collaborator Author

What do you think of this plan for splitting up the PRs?

It sounds good! I am unable to push changes to the dettman-mul-only-arith branch though. I get a "permission denied" error. Can you give me permission to do that?

@OwenConoly
Copy link
Collaborator Author

It seems at least plausible that more restricted distribution would be performant enough?

Yeah, very plausible.

@JasonGross
Copy link
Collaborator

What do you think of this plan for splitting up the PRs?

It sounds good! I am unable to push changes to the dettman-mul-only-arith branch though. I get a "permission denied" error. Can you give me permission to do that?

Done! You should have an invite on GitHub

@JasonGross
Copy link
Collaborator

I guess we maybe also want another (preliminary?) PR that gives all rewriting passes access to the dag, and gives their Ok proofs access to dag_ok or w/e.

And it would be nice to have separate PRs (or at least commits) for adding bounds info to the dag vs making use of that bounds info and changing the inspection depth, so that we can profile these separately.

@OwenConoly I've done these things in #1503 and #1505. You should base any changes to the equivalence checker on top of these PRs, so that they don't conflict with them. I'm optimistic that the remaining planned changes to equivalence checker should not introduce more conflicts with each other.

@Rayyy369

This comment was marked as spam.

@Rayyy369

This comment was marked as spam.

@Rayyy369

This comment was marked as spam.

@JasonGross
Copy link
Collaborator

@OwenConoly I've prepared a series of PRs (#1508, #1509, #1510, #1511) which implements the dynamic programming bounds checking algorithm. It doesn't change the revealed depth (insofar as that's possible, could you prepare a PR that does just that? And then one of us can get a timing profile on transparent-rooster).

I expect to not do much more on the equivalence checker anytime soon (sorry for all the conflicts with your work), and will leave you to implement add-to-or, the combine shr, and the turn-addcarry-into-shr passes. (Let me know if you have any questions about the existing passes / what they do / how to adapt them to handle shr instead of addcarry.)

@OwenConoly
Copy link
Collaborator Author

It seems plausible to me that we actually want to canonicalize all addcarry into shr of addZ (at least for powers of 2), add a rule that says that (x >> a) + (y >> b) is (x + (y >> (b - a))) >> a whenever b >= a, let the sorting happen by associativity of addZ, and make sure we have x >> 0 = x.

I think I'll just implement this rule for the case that b = a.

I thought about doing it for the case that b > a, but I didn't really like the idea of taking a simple sum like
(x >> a) + (y >> b) + (z >> c)
and turning it into something arbitrarily deeply nested like
(((x >> (a - b)) + y) >> (b - c)) + z) >> c.
Then we would have to 'dig up' arbitrarily many layers to ensure that we keep the canonical representation when a new addend is added to the sum. This sounds a bit complicated--and it looks like it could take time that's quadratic in the number of addends (I'm not sure how much that matters).

@OwenConoly
Copy link
Collaborator Author

Oops, there's a bigger problem here. This equality that I'm trying to use isn't true. For example,
(7 >> 3) + (7 >> 3) = 0,
while
(7 + 7) >> 3 = 1

I'll have to try a bit harder to remember how my addcarry rewriting thing actually worked.

@OwenConoly
Copy link
Collaborator Author

OwenConoly commented Nov 23, 2022

Ah, I see. We want a rewrite rule that turns
((x + y) >> a) + (((x + y) mod 2^a) + z) >> a
into
(x + y + z) >> a,
where the pluses are addZ, and the (x + y) mod 2^a would usually look like (add a, [x; y]).

@OwenConoly
Copy link
Collaborator Author

But then I'm not sure that I really like this approach, because it's not clear what the right thing is to do with this expression:
((x + y) >> a) + ((((x + y) mod 2^a) + z1) >> a) + ((((x + y) mod 2^a) + z2) >> a).

Should that be rewritten as
((x + y + z1) >> a) + ((((x + y) mod 2^a) + z2) >> a),
or as
((x + y + z2) >> a) + ((((x + y) mod 2^a) + z1) >> a)?

@JasonGross
Copy link
Collaborator

What about rewriting (x >> a) + (y >> a) into ((x + y - (x mod 2^a) - (y mod 2^a)) >> a and then doing some sort of cancellation on this expression. The invariant here is that the normal form of ((x - (x mod 2^a)) >> a should be the same as the normal form of x >> a, and if you can accomplish that, then you can freely combine right shifts (as long as the shift amount is the same).

@JasonGross
Copy link
Collaborator

JasonGross commented Nov 23, 2022

(The other rule you'd want here is that (x + (y mod b)) mod b becomes ((x mod b) + (y mod b)) mod b or something like that, and then I think the rest follows from rules like x - x = 0, which we should already have.)

@OwenConoly
Copy link
Collaborator Author

Ok, neat! I'll do that, and I'll rewrite (x + (y mod b)) mod b into (x + y) mod b.

@andres-erbsen
Copy link
Contributor

@JasonGross alerted me that this change is also needed for the artifact. @OwenConoly would you be willing to do a quick call with me today to see if I can help get this merged?

@andres-erbsen
Copy link
Contributor

a. A canonicalization of sums of addcarries, which effectively makes the equivalence checker aware that double-wide (e.g., 128-bit) addition is associative. This is analogous to the preexisting canonicalization of commutative operations, which just sorted the operands.

I cannot figure out what part of the PR does this.

b. A (new, in-some-ways better) implementation of bounds on integer values corresponding to nodes in the dag.

This seems to be #1511

c. A new rewriting rule, which replaces or operations with add operations in some cases.

https://github.com/mit-plv/fiat-crypto/pull/1481/files#diff-b7349927aba2f511b65b58306adb1b775a3d0aa3944d1baab4ac274639fd28efR4911-R4926; this doesn't seem to be merged yet but probably wouldn't be to difficult.

d. A new cli optional argument, "--extra-rewrite-rule", which allows optional rewrite rules (which are off by default) to be "turned on" when the equivalence checker is invoked.

Is this #1498 ?

Is there anything else we still need?

@JasonGross
Copy link
Collaborator

canonicalization of sums of addcarries, which effectively makes the equivalence checker aware that double-wide (e.g., 128-bit) addition is associative. This is analogous to the preexisting canonicalization of commutative operations, which just sorted the operands.

I cannot figure out what part of the PR does this.

I think it's standardize_adc_sum

b. A (new, in-some-ways better) implementation of bounds on integer values corresponding to nodes in the dag.

This seems to be #1511

I think that's right

d. A new cli optional argument, "--extra-rewrite-rule", which allows optional rewrite rules (which are off by default) to be "turned on" when the equivalence checker is invoked.

Is this #1498 ?

Yes

@OwenConoly
Copy link
Collaborator Author

@JasonGross alerted me that this change is also needed for the artifact. @OwenConoly would you be willing to do a quick call with me today to see if I can help get this merged?

I am just seeing this now. Sorry about that; I thought GitHub was supposed to email me when I was tagged, but perhaps the email got lost.

I assume this is no longer relevant

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants