[randomness part 2] improve unsafeRandom and update math/rand in FVM #4067
Conversation
FVM Benchstat comparison: This branch was compared with the base branch onflow:master, commit b500369. Collapsed results for better readability.
Codecov Report
@@ Coverage Diff @@
## master #4067 +/- ##
==========================================
- Coverage 53.46% 53.46% -0.01%
==========================================
Files 835 835
Lines 78129 78150 +21
==========================================
+ Hits 41769 41779 +10
- Misses 33014 33022 +8
- Partials 3346 3349 +3
Flags with carried forward coverage won't be shown.
... and 6 files with indirect coverage changes
@tarakby can you explain a bit more on this?
Why is a KDF necessary here?
As we seed on each block, what does this mean?
Considering the number of random values we can use in a transaction, is this important?
}
stdev := stat.StdDev(distribution, nil)
mean := stat.Mean(distribution, nil)
assert.Greater(t, tolerance*mean, stdev, fmt.Sprintf("basic randomness test failed. stdev %v, mean %v", stdev, mean)) |
this check seems weird. std dev is meaningless for a uniform distribution. you want something like
expectedMean := math.MaxUint64 / 2
expectedMean * (1 - tolerance) <= mean <= expectedMean * (1 + tolerance)
instead?
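For concreteness, here is a rough Go sketch of that mean-range check (not the PR's code; `drawRandom` below is a stand-in that reads from crypto/rand, and the sample size and tolerance are arbitrary):

```go
package example

import (
	crand "crypto/rand"
	"encoding/binary"
	"fmt"
	"math"
	"testing"

	"github.com/stretchr/testify/assert"
	"gonum.org/v1/gonum/stat"
)

// drawRandom stands in for the generator under test; it simply reads 8 bytes
// from crypto/rand so the sketch is self-contained.
func drawRandom(t *testing.T) uint64 {
	var buf [8]byte
	_, err := crand.Read(buf[:])
	assert.NoError(t, err)
	return binary.BigEndian.Uint64(buf[:])
}

// TestMeanWithinTolerance checks that the sample mean of uniform uint64 draws
// stays within a relative tolerance of the expected mean MaxUint64/2.
func TestMeanWithinTolerance(t *testing.T) {
	const sampleSize = 100000
	const tolerance = 0.05

	samples := make([]float64, sampleSize)
	for i := range samples {
		samples[i] = float64(drawRandom(t))
	}

	expectedMean := float64(math.MaxUint64) / 2
	mean := stat.Mean(samples, nil)

	// equivalent to: expectedMean*(1-tolerance) <= mean <= expectedMean*(1+tolerance)
	assert.InEpsilon(t, expectedMean, mean, tolerance,
		fmt.Sprintf("mean %v deviates from expected mean %v", mean, expectedMean))
}
```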
I have two comments on your suggestion:
- Imagine a dice with the possible outputs `[1,2,3,4,5,6]`, the random variable being the value we read after throwing the dice. In this case the expected mean according to your formula is `3.5`. The uniform distribution we would like a perfect dice to have is `[1/6, 1/6, 1/6, 1/6, 1/6, 1/6]`. How can we verify that with a practical statistical test? Let's say we throw the dice 100 times, and we get 50 times `1` and 50 times `6`. The mean in this case is exactly the expected mean `3.5` ((1*50 + 6*50)/100), and we pass the inequality above with any small tolerance we chose. However, the dice we tested is clearly biased and its distribution is far from uniform.
- The test of the PR takes a different approach. It throws the dice 100 times and computes the frequency of each possible output in `{1,2,3,4,5,6}`. If the dice is perfect, we would get an array approximately equal to `[16, 16, 16, 16, 16, 16]`. In other terms, the expected mean is `100/6 = 16.66` and we would like to test that each value of the array is close to that expected mean. The standard deviation is a basic statistical tool to measure that: it quantifies the dispersion of many values compared to their mean. A low standard deviation means that all values are close to the mean; a high standard deviation means the values are spread out from the mean. The biased dice in the example above gives `[50, 0, 0, 0, 0, 50]`, which has a very large standard deviation. This remains a basic statistical test that aims to find obvious issues or software bugs; it's relatively quick so that it can run as a unit test. But of course there are deeper randomness statistical tests that are beyond the scope of a unit test.
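To make the approach concrete, here is a minimal Go sketch of such a frequency-based check (illustrative only, not the PR's actual test: crypto/rand stands in for the FVM generator, and the bucket count, sample size and tolerance are arbitrary; 64 buckets divide 2^64 evenly, so the modulo introduces no bias):

```go
package example

import (
	crand "crypto/rand"
	"encoding/binary"
	"fmt"
	"testing"

	"github.com/stretchr/testify/assert"
	"gonum.org/v1/gonum/stat"
)

// TestBasicUniformity draws many samples, counts how often each bucket
// (each "dice face") is hit, and requires the standard deviation of the
// counts to be small relative to their mean.
func TestBasicUniformity(t *testing.T) {
	const classes = 64       // number of buckets; divides 2^64, so r%classes is unbiased
	const sampleSize = 64000 // total number of draws
	const tolerance = 0.1    // allowed stdev as a fraction of the mean count

	distribution := make([]float64, classes)
	for i := 0; i < sampleSize; i++ {
		var buf [8]byte
		_, err := crand.Read(buf[:])
		assert.NoError(t, err)
		r := binary.BigEndian.Uint64(buf[:])
		distribution[int(r%classes)]++ // count the bucket this draw falls into
	}

	stdev := stat.StdDev(distribution, nil)
	mean := stat.Mean(distribution, nil) // equals sampleSize/classes by construction
	assert.Greater(t, tolerance*mean, stdev,
		fmt.Sprintf("basic randomness test failed. stdev %v, mean %v", stdev, mean))
}
```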
Does it make sense to plug this into some existing test suite for random generators, like https://github.com/terrillmoore/NIST-Statistical-Test-Suite, in the future?
Those are an example of the deep statistical tests that I mentioned above. They are heavy (they take time and write lots of memory) and are used to test a TRG (true random generator) or a new implementation of a PRG. In our case we are using a known and established PRG underneath, called ChaCha20. Such a PRG has already been studied at the cryptanalysis level (theory) and the statistical level (running tests on its outputs). Our repo's tests are not meant to re-evaluate ChaCha20; they only try to detect biases resulting from implementation bugs, while running relatively fast.
Btw, the crypto lib (where ChaCha20 is implemented) already checks that the implementation is compliant with https://www.rfc-editor.org/rfc/rfc7539, so we don't have to repeat it here.
for i := 1; i < N; i++ {
allEqual = allEqual && numbers[i] == numbers[0]
// tests that unsafeRandom is PRG based and hence has deterministic outputs. | ||
t.Run("PRG-based UnsafeRandom", func(t *testing.T) { |
can you add a test to verify each transaction within the block has a deterministic / unique seed?
Well, in the current implementation the seed isn't diversified per transaction; it is only diversified per block, so such a test would fail.
We can add the test you described in the PR where we include the transaction index in the PRG construction.
source := rand.NewSource(int64(binary.BigEndian.Uint64(id[:])))
gen.rng = rand.New(source)
// extract the entropy from `id` and expand it into the required seed | ||
hkdf := hkdf.New(func() hash.Hash { return sha256.New() }, id[:], nil, nil) |
txn index as the salt?
Yep, the idea is to include the transaction index somewhere in the PRG construction (as a KDF salt, as a PRG customizer...).
I only wanted this PR to remove math/rand and use a safer PRG. Wiring the transaction index down the stack to the generator could be done in another PR.
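For illustration, a possible shape of that follow-up change could look like the sketch below (an assumption, not this PR's code: the helper name, seed length, and salt encoding are made up; it only shows the HKDF-with-salt idea):

```go
package example

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"io"

	"golang.org/x/crypto/hkdf"
)

// seedForTransaction expands the block ID into a PRG seed with HKDF, using the
// transaction index as the salt, so each transaction within a block gets a
// distinct but deterministic seed.
func seedForTransaction(blockID [32]byte, txIndex uint32) ([]byte, error) {
	const seedLen = 32 // e.g. a ChaCha20 key-sized seed

	salt := make([]byte, 4)
	binary.BigEndian.PutUint32(salt, txIndex)

	// entropy = block ID, salt = transaction index, no extra info
	kdf := hkdf.New(sha256.New, blockID[:], salt, nil)

	seed := make([]byte, seedLen)
	if _, err := io.ReadFull(kdf, seed); err != nil {
		return nil, fmt.Errorf("extracting seed failed: %w", err)
	}
	return seed, nil
}
```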
sgtm. add a todo as a reminder
maybe address my expectedMean comment. I don't feel strongly about it.
@bluesign good questions!
Block ID can be biased by a consensus node. It is possible for consensus nodes to loop over possible blocks and choose one whose block ID yields the random outputs they prefer.
Here we have an entropy source as an input (block ID + soon a transaction index) and we would like to extract a seed for a PRG (ChaCha20). We need a function to link both pieces of information and achieve 2 properties:
I'll improve the code doc to reflect the above.
PRGs have an internal state that is initialized with a seed and is updated each time the PRG produces an output. If the state is known, all future outputs can be known. The state of some PRGs can be recovered by querying multiple successive outputs. Imagine a transaction that doesn't know what seed is used but queries many random outputs to recover the state and predict the future outputs. This is not possible with a CSPRG like ChaCha20, i.e. it's not possible to recover the state.
Probably not, if the calls aren't many. I'm not sure how many calls a transaction can make, or maybe some applications can gather a large number of random outputs off-chain through multiple txs/blocks.
Sorry, I am maybe asking too much, but this is my favourite subject.
If blockID can be biased and we do ... Considering we have a random, uniform block ID, a KDF should be unnecessary I think (unless we need ChaCha specifically). So whether we end up with KDF + ChaCha depends on ChaCha's benefits (i.e. if we have a benefit from ChaCha and a non-manipulated blockID).
As you state, each transaction will run with an independent seed (so a previous transaction cannot influence the next transaction's random state; actually, even with the blockID, it currently cannot). What will be the benefit of an unpredictable next random here? It would make sense if we used blockID as the seed and then never reseeded, but we reseed on every tx (with blockID, or in the future with blockID (+) tx_index). As an example, let's say I naively made an RNG with "S = sha3(seed)" and "S_next = sha3(S)"; how does it compare to the current one?
This is a technically very subtle topic @bluesign, so I am glad we are thinking it through properly. I am strongly of the opinion that the answer is: no, it doesn't solve the attack. The block hash is inherently not a good source of entropy, which is why most blockchains out there cannot provide safe randomness. Flow is one of the very few networks that has an unpredictable and relatively unbiasable source of entropy, aka our Random Beacon. Let me explain:
The underlying insight is: any data object that a single node constructs, and where the node has any influence on how the byte representation looks, is a bad source of entropy. The message creator can always construct multiple variants and only release favourable ones.
thanks for the detailed answer @AlexHentschel Just a quick question: why do you think it is a small probability?
I read the paper (https://arxiv.org/pdf/2002.07403.pdf) again; section 4.3.4 (source of randomness) mentions something like t signatures. Considering t < N, with N the number of consensus nodes, doesn't the primary have the option to choose from any combination of (N, t)?
@bluesign good question.
Block hash as entropy source
In my comment above, I was talking about the scenario where we are using the block hash as the entropy source:
Threshold signature (aka random beacon) as entropy source
Section 4.3.4 in our paper describes the random beacon, which is conceptually very different from the block hash:
Therefore, the consensus node constructing the block can't predict the source of randomness for its block. It has to commit to and publish the block first. Furthermore, the node constructing the child block (B in our example) containing the source of randomness for the parent (A in our example) also can't influence the source of randomness, because the threshold signature only depends on the signed block A.
Additional details
@AlexHentschel thank you so much, I learned a lot from this.
@bluesign I'm glad you're asking, this is a fun but complex topic, and discussing it helps improve the design. Let me first clarify that this PR isn't making
Yes correct, the source of entropy here is the block, and that's always biasable. Applying any composition of deterministic functions (here
I didn't get your point here, but let me add some details; hopefully they help:
You're right and are touching on a good point! Now we have the option to choose between a CSPRG and a non-crypto PRG: if we use a PRG per transaction, a CSPRG may not be very useful, as you pointed out (* I wanna come back to this). We could just go with a non-crypto PRG, as biasing the PRG state wouldn't help mount an attack. However, if we go with a PRG per block/chunk, a CSPRG becomes required to stop biasing attacks. At this point, I don't want to make assumptions about future evolutions of the implementation, so I prefer to go with the safer choice. A future developer updating the design may not differentiate between a CSPRG and a non-crypto PRG, and if a CSPRG is already in place, we won't be exposed when the design uses a PRG per chunk, for instance. (*) Back to this assumption: if we use a PRG per transaction, is biasing the PRG state really fine? I'm not sure about the answer. What if the transaction starts by calling a malicious function
That's an example of a non-crypto PRG, because the internal state of the PRG is the output itself (a single call to the PRG recovers the state). We can use it if we don't care about attacks that recover the state.
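For reference, here is a minimal Go sketch of the hash-chain generator described in the question (illustrative only), which makes the problem visible: the returned value is the internal state, so one observed output determines all future outputs.

```go
package example

import (
	"golang.org/x/crypto/sha3"
)

// hashChainPRG implements S = sha3(seed), S_next = sha3(S), returning each S
// directly as the output.
type hashChainPRG struct {
	state [32]byte
}

func newHashChainPRG(seed []byte) *hashChainPRG {
	return &hashChainPRG{state: sha3.Sum256(seed)}
}

// next returns the current state as the output and then advances the chain.
// Because the output is the state itself, anyone seeing a single output can
// compute all future outputs, so this is not a cryptographically secure PRG.
func (p *hashChainPRG) next() [32]byte {
	out := p.state                // the output leaks the full state
	p.state = sha3.Sum256(out[:]) // advance: S_next = sha3(S)
	return out
}
```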
Thanks Alex for the complete description of how the random beacon works! Thanks to your recent work on the Jolteon protocol and the new 2-block finalization, it's great news that biasing the random beacon is reduced to choosing between only 2 options.
For the sake of technical accuracy, I wanted to add a slight correction. The threshold signature is indeed independent of the set of signers, thanks to the underlying signature scheme (BLS) being unique. Unique is stronger than deterministic, and that is what we require for the random beacon design. Determinism alone isn't enough.
@tarakby thank you very much, I just understood the point here, it is a very strong one. Btw, when you say "a PRG per block", do you mean it is seeded per block (and never seeded again)? Or is it as it is now, seeded with the same seed on every transaction of the block? I think with whatever algorithm we use, seeding with the block ID on every transaction (like the current implementation) loses the uniformity (not sure if this is the correct term). Consider I am minting NFTs, let's say 100 NFTs: 1 legendary, 99 common (a 1 in 100 chance I will mint the legendary). If I send this in 100 transactions in the same block, either I will mint 100 legendaries or 100 commons. Also, even with an unpredictable PRG, as transactions can communicate with each other, they can decide to buy or not to buy. Is my thinking correct here?
hmm this is a very good point. I think if the PRG is predictable it is a risk, but my initial gut feeling tells me it is a very minor one.
thanks Tarak for your correction. Really really appreciate your contributions to keeping things technically accurate 💚
Yes, I meant creating only one instance of the PRG per block execution, seeding it once and never reseeding it.
In the current implementation, this is correct.
I guess you mean the transaction can be reverted after the random is revealed. Yes, that's possible regardless of the PRG properties. I think this is an inherent issue for any platform that allows rolling back a transaction result, and it's beyond randomness quality. I believe smart contract developers need to take this into account, maybe with a commit-reveal scheme, like the one mentioned here.
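As a generic illustration of the commit-reveal idea (not a Flow or Cadence API, just the hashing pattern sketched in Go): the user first commits to a hidden value, and only reveals it after the random outcome is fixed, so they can no longer back out based on the result they observe.

```go
package example

import (
	"bytes"
	"crypto/sha256"
)

// commit computes the commitment H(value || nonce); it would be published
// first, before the randomness is revealed.
func commit(value, nonce []byte) [32]byte {
	msg := append(append([]byte{}, value...), nonce...)
	return sha256.Sum256(msg)
}

// verifyReveal is checked later, once the random outcome is already fixed:
// the revealed value and nonce must match the earlier commitment.
func verifyReveal(commitment [32]byte, value, nonce []byte) bool {
	recomputed := commit(value, nonce)
	return bytes.Equal(commitment[:], recomputed[:])
}
```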
I was referring to seeding with the same seed on each transaction: one transaction generates a random and saves it to storage, and a second transaction checks storage and, if the result is a win, calls the contract function. (As you said, we need a per-tx seed or a running block seed for sure.)
bors merge
(result of splitting #4052 into several PRs)

- `math/rand` functions `Seed` and `Read`.
- `unsafeRandom` implementation: (@AlexHentschel @pattyshack) `math/rand` PRG. Advantages are:
- `unsafeRandom` tests:

Todo in another PR: diversify the seed per transaction.