
Conversation


@jannotti jannotti commented May 7, 2021

Summary

Add a number of math-oriented opcodes that have been requested by TEAL users:

sqrt, exp, expw, divmodw

byteslice-oriented math: a set of opcodes that treat byteslices as arbitrary-precision uints.

Test Plan

Unit tests

Byte math costs were established with a bunch of new operation benchmarks.

@jannotti jannotti force-pushed the teal4-math branch 2 times, most recently from b12c44c to 64df154 Compare May 12, 2021 15:00
Contributor

@jasonpaulos jasonpaulos May 14, 2021

Why not have b~ for bitwise invert as well?

Is it because byteslices are variable length and the normal invert is kind of undefined in this scenario?

Contributor Author

Because it's confusing. If you start with 0xff00 and invert it, you get 0xff (note: only one byte). If you then invert that, you get "" (the empty byteslice == 0). The results of these operations are always only as long as needed to represent the answer as a bigint. So I think we instead have to recommend people use a slice of 1s, and xor, to get the functional equivalent of bit negation. That way their choice of length for the 1s slice explicitly forces the final result to be the same length.

I'm going to add an opcode to make such 1-strings easy to build.
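The truncation behavior described above can be sketched with Go's math/big. This is only an illustration of the semantics under discussion (invertTruncating is a made-up helper name, not AVM code):

```go
package main

import (
	"fmt"
	"math/big"
)

// invertTruncating flips every bit of the operand, then re-encodes the
// result as a bigint, which drops any leading zero bytes.
func invertTruncating(b []byte) []byte {
	out := make([]byte, len(b))
	for i := range b {
		out[i] = ^b[i]
	}
	return new(big.Int).SetBytes(out).Bytes()
}

func main() {
	x := []byte{0xff, 0x00}
	y := invertTruncating(x) // 0x00ff, trimmed to 0xff (one byte)
	z := invertTruncating(y) // 0x00, trimmed to "" (empty slice)
	fmt.Printf("%x %d\n", y, len(z)) // ff 0
}
```

Two inversions do not round-trip: the second one collapses to the empty slice, which is exactly the confusion being described.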

Contributor Author

Saying all this out loud led to a better way. Bitwise operations no longer truncate to the shortest byteslice that can hold the value. They return a byteslice as long as their longest input. This is more likely to be what people want, as they would use these operations when treating the slices as arrays of bits. It also makes the definition of b~ more straightforward.
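A minimal sketch of the new rule, using b| as the example (bytesOr is a hypothetical helper; the real opcode implementations are in the PR):

```go
package main

import "fmt"

// bytesOr zero-extends the shorter operand on the left, then ORs
// byte-by-byte, so the result is as long as the longest input.
func bytesOr(a, b []byte) []byte {
	if len(a) < len(b) {
		a, b = b, a
	}
	out := make([]byte, len(a))
	copy(out, a)
	pad := len(a) - len(b)
	for i := range b {
		out[pad+i] |= b[i]
	}
	return out
}

func main() {
	// Two-byte result, even though one input is a single byte.
	fmt.Printf("%x\n", bytesOr([]byte{0x01, 0x00}, []byte{0x02})) // 0102
}
```

Under the old truncating rule this result could have shrunk; now the output length is always max(len(a), len(b)).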

@jannotti jannotti marked this pull request as ready for review May 17, 2021 20:38
Contributor

@algorandskiy algorandskiy left a comment

Nice work, a couple of comments.


if cx.err == nil {
postheight := len(cx.stack)
if spec.Name != "return" && postheight-preheight != len(spec.Returns)-len(spec.Args) {
Contributor

The check is correct, but the message would be -2 != -1 or similar, which might be confusing?

Contributor Author

This error would only occur because of a bug in our implementation, not in TEAL code. I don't know how user-friendly it needs to be. The info we need is there: there's a bug in the named opcode, and it didn't manage the stack properly.

Contributor

I simply do not remember checks for popped vs pushed. But I guess you are right, and these problems would already have been discovered.

Contributor Author

Yes, that's true, those checks are new. But they are checks on the results of individual opcode operations, so I don't think it is possible to construct a strange program that would trip them. If they caught a problem, it would mean that one of our existing opcodes has a pretty important bug that we have missed for a long time. I think that's unlikely. But it does help catch mistakes as we write new opcodes.

preheight := len(cx.stack)
spec.op(cx)

if cx.err == nil {
Contributor

I think this needs to go under a consensus params switch, since there is a chance upgraded nodes reject some specially-crafted program while old ones allow it => fork.

Contributor Author

I don't think so. It only checks things that would have previously caused an error. But now it does so in a central place, so that future opcodes won't have to do it themselves. In particular, in the past only concat could create a 4k slice, so only concat checked its output. Now we check here, in case we have introduced a new way to create a 4k slice. (I don't think we have, since byteslice math is input-limited, but this seems like a useful thing to centralize, especially as we might add other ways to hit 4k.)
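The centralized post-opcode check being discussed might look roughly like this. It is only a sketch: stackValue, maxStringSize, and checkOpcodeResult are stand-in names, and the real code lives in the eval loop:

```go
package main

import "fmt"

// maxStringSize mirrors the 4k byteslice cap discussed above.
const maxStringSize = 4096

// stackValue is a stand-in for the eval loop's stack element type.
type stackValue struct {
	Bytes []byte
}

// checkOpcodeResult verifies that an opcode changed the stack height by
// exactly returns-args, and that the top byteslice stays under the 4k cap.
func checkOpcodeResult(stack []stackValue, preheight, args, returns int, name string) error {
	if name != "return" && len(stack)-preheight != returns-args {
		return fmt.Errorf("%s changed stack height improperly", name)
	}
	if n := len(stack); n > 0 && len(stack[n-1].Bytes) > maxStringSize {
		return fmt.Errorf("%s produced a too-big (%d) byte-array", name, len(stack[n-1].Bytes))
	}
	return nil
}

func main() {
	// A binary op: stack was at height 2 before, popped 2, pushed 1.
	stack := []stackValue{{Bytes: []byte("ok")}}
	fmt.Println(checkOpcodeResult(stack, 2, 2, 1, "b+")) // <nil>
}
```

Because every opcode flows through this one check, a future opcode that mismanages the stack or emits an oversized slice fails immediately, instead of each opcode re-implementing the guard.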

Comment on lines +941 to +942
opSwap(cx)
opLt(cx)
Contributor

I thought the entire point of duplicating code in each opcode was reducing the number of additional operations.
For example, opLe is now opSwap + opLt + opNot, that is (5 + 6 + 3) = 14 instead of 6 in the old version (counting mem reads/writes; swap is expensive since there is no direct mem-to-mem copy in a single CPU operation).
IMO this is questionable, maybe benchmark it?
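The composition under discussion, sketched against a toy uint64 stack (ctx and these tiny op bodies are illustrations; the real opcodes operate on cx.stack in the evaluator):

```go
package main

import "fmt"

type ctx struct{ stack []uint64 }

func (c *ctx) pop() uint64 {
	v := c.stack[len(c.stack)-1]
	c.stack = c.stack[:len(c.stack)-1]
	return v
}
func (c *ctx) push(v uint64) { c.stack = append(c.stack, v) }

func opSwap(c *ctx) { b, a := c.pop(), c.pop(); c.push(b); c.push(a) }
func opLt(c *ctx) {
	b, a := c.pop(), c.pop()
	if a < b {
		c.push(1)
	} else {
		c.push(0)
	}
}
func opNot(c *ctx) {
	if c.pop() == 0 {
		c.push(1)
	} else {
		c.push(0)
	}
}

// opLe is built from the three primitives: a <= b  <=>  !(b < a).
func opLe(c *ctx) { opSwap(c); opLt(c); opNot(c) }

func main() {
	c := &ctx{stack: []uint64{3, 5}} // a=3, b=5
	opLe(c)
	fmt.Println(c.stack[0]) // 3 <= 5, so 1
}
```

The trade-off is exactly the one raised here: three small function calls (including a swap) versus one fused body per opcode.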

Contributor Author

I did this when I found writing the equivalent byteslice operators to be tedious, repetitive, and error-prone. It's hard for me to imagine any appreciable performance effect here, with all this data in cache. I'll write a little benchmark to convince myself.

Contributor Author

Here are the results. I put them in order of overhead introduced by the shorter, clearer code. Each one is one more function call than the last.

  1. Benchmark < 16.9-17.8 ns/op
  2. Benchmark >= 17.3-17.5 ns/op
  3. Benchmark > 20.0-23.3 ns/op
  4. Benchmark <= 21.8-22.0 ns/op

3 ns seems like an OK price to pay for the largest overhead.

The biggest jump is between 2 & 3, because it involves opSwap. I suppose I could be convinced to repeat the code to eliminate that one. Then they'd all be so close the difference is in the noise.

Contributor

@algorandskiy algorandskiy May 20, 2021

Thank you for benchmarking it! I think I agree, 3 ns/op is an OK price for cleaner code.
To extrapolate: suppose there are 300 comparison instructions (1/3 of a program); then there is 300 * 3 ns ≈ 1 µs of overhead. For 10k programs in a block we'd have a 10 ms execution-time increase. I guess this is acceptable.

testAccepts(t, "int 2; log2; int 2; ==", 4)
testAccepts(t, "int 4; log2; int 3; ==", 4)
testAccepts(t, "int 5; log2; int 3; ==", 4)
testAccepts(t, "int 8; log2; int 4; ==", 4)
Contributor

I thought log2(8) = 3

Contributor Author

Yikes, yes, thank you! Off by one everywhere.

Contributor Author

The reason I made this error is that integer log2 is just "what's the highest bit set in this integer" (minus 1).

I changed this so it's "correct", and fails when the supplied integer is 0. But I wonder if I should just rename it "highbit" and give it the old semantics. That sounds clearer to me, and is probably what people would actually need this opcode for (so they can decide how far to bitshift a value, for example). That's why this is an operation that's supported by the bits package. With the log2 semantics, callers need to check if the value is 0 first, to avoid panics.

Contributor Author

I decided to call the op "bitlen" and define it on uints and byte-arrays. It returns the index of the highest bit, so it returns 0 on 0 rather than panicking. (And 1 => 1, 2 => 2, 4 => 3, 8 => 4.) It corresponds directly to bits.Len in Go.

@jannotti jannotti requested review from brianolson and cce May 21, 2021 18:23
Contributor

@jasonpaulos jasonpaulos left a comment

This looks good to me, including the teal.tmLanguage.json file.

Contributor

@algorandskiy algorandskiy left a comment

Looks good. Please consider adding one more test for all wide opcodes in a sequence, to ensure format compatibility - like mulw, addw, divmodw, and something like (A * B / B == A) for mulw + divmodw. A separate PR is OK.

Comment on lines +1256 to +1257
result := new(big.Int)
opBytesBinOp(cx, result, result.Add)
Contributor

clever

Comment on lines +1384 to +1387
a, b := opBytesBinaryLogicPrep(cx)
for i := range a {
a[i] = a[i] | b[i]
}
Contributor

another nice one

@algorandskiy
Contributor

@brianolson please take a look

Contributor

@brianolson brianolson left a comment

one tiny testing quibble, but the 0.05% change probably doesn't matter.
Otherwise no objections. LGTM.

hashes := []string{"sha256", "keccak256", "sha512_256"}
for _, hash := range hashes {
b.Run(hash+"-small", func(b *testing.B) { // hash 32 bytes
benchmarkOperation(b, "byte 0x1234567890", hash,
Contributor

nit: for each of these it's only hashing 5 bytes on the first execution, and then {32, 128, 512} for the next 1999. It could be made perfectly consistent by using a 32-byte constant and running dup; concat; dup; concat; {hash} as the repeated op?

@onetechnical
Contributor

Brian's comments to be addressed in a separate PR.

@algojohnlee algojohnlee merged commit c8df55f into algorand:master May 25, 2021
tsachiherman pushed a commit to tsachiherman/go-algorand that referenced this pull request Jul 7, 2021
@jannotti jannotti deleted the teal4-math branch January 28, 2022 15:40