-
Notifications
You must be signed in to change notification settings - Fork 20.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary Merkle Tree Hash #14334
Binary Merkle Tree Hash #14334
Conversation
0200fce
to
6170e9e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some general comments on the Go code, I haven't really reviewed the algorithm thoroughly yet though.
bmt/bmt.go
Outdated
select { | ||
case <-self.c: | ||
self.count-- | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This default
clause is redundant, it will just lead to the next loop iteration trying to select on the channel again, so it may as well just be:
for len(self.c) > n {
<-self.c
self.count--
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true as the lock makes sure it is called nonconcurrently.
bmt/bmt.go
Outdated
|
||
// blocks until it returns an available BMTree | ||
// it reuses free BMTrees or creates a new one if size is not reached | ||
func (self *BMTreePool) Reserve() *BMTree { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this benefit from using sync.Pool
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of that but it is ok
bmt/bmt.go
Outdated
return NewEOC(nil) | ||
} | ||
rightmost := i == int(max-1) | ||
last := atomic.AddInt32(&self.max, 1) == max |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This look like a race: two goroutines atomically read self.max
, both find int(max) > i
, both potentially calculate rightmost
to be true and race to set self.segment
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code is unused, will remove for now
bmt/bmt_test.go
Outdated
} | ||
|
||
func TestBMTHasherReuseWithRelease(t *testing.T) { | ||
testBMTHasherReuse(t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is identical to TestBMTHasherReuseWithoutRelease
above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
* remove fmt prints in test * remove unused segmentWriter related code * added missing comments and package premable doc * fix pool reuse benchmarks * rename functions
l := len(d) | ||
left := d | ||
var right []byte | ||
if l > rh.section { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not be handling the case where hashsize < l <= 2*hashsize
here and calculating the hash of both left and right?
For example, consider the BMT of 64 zero bytes. I would expect it to be the sha3 of the concatenation of two sha3 hashes of 32 zero bytes:
sha3 := func(data ...[]byte) []byte {
h := sha3.NewKeccak256()
for _, v := range data {
h.Write(v)
}
return h.Sum(nil)
}
zeroes := make([]byte, 32)
fmt.Printf("BMT of 64 bytes: %x\n", sha3(sha3(zeroes), sha3(zeroes)))
Running that I get:
$ go run bmt.go
BMT of 64 bytes: 633dc4d7da7256660a892f8f1604a44b5432649cc8ec5cb3ced4c4e6ac94dd1d
However this reference implementation gives a different hash (which is just sha3(zeroes + zeroes)
):
h := bmt.NewRefHasher(sha3.NewKeccak256, 128)
fmt.Printf("Ref BMT of 64 bytes: %x\n", h.Hash(make([]byte, 64)))
$ go run ref-bmt.go
Ref BMT of 64 bytes: ad3228b676f7d3cd4284a5443f17f1962b36e491b30a40b2405849e597ba5fb5
Am I missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it is defined as per the reference implementation.
Did you find inconsistency between the ref result and the concurrent one?
or do you disagree with the definition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am disagreeing with the definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To summarise, the ref implementation generates the BMT of 64 bytes as just sha3(64 bytes)
whereas I think it should be sha3(sha3(first 32 bytes) + sha3(second 32 bytes))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I side with @zelig here: the change proposed by @lmars would slightly decrease the complexity of the control logic of verification, but at the cost of having to calculate more hashes. There is no way that it will result in less gas use in the verification contract, which is one of the most important design considerations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok that's fine, so I think we need to explicitly document that this is the intended implementation to avoid any confusion, and perhaps add an explicit test demonstrating the behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear from the test source that important corner cases are explcitly tested such as
- Empty chunk (eight zero bytes)
- Chunks shorter than one hash length
- Chunks of lengths that are not integer multiples of one hash length
I agree with @nagydani, we need some exhaustive tests for data of various lengths. |
@nagydani @lmars the tests cover all these: https://github.com/ethereum/go-ethereum/pull/14334/files#diff-b6e39ee34f8b8d3a7c7b4d8515bf94aaR49 |
@zelig so two comments on the tests:
|
@zelig I decided to just write some tests and I'm still not sure if I know what the reference implementation is supposed to be doing. For example, I tried to hash 65 bytes of data and expected it to be:
but I get a different result? Is my expectation again incorrect? Here is the test: https://gist.github.com/lmars/67f232dfbdbf8635364ec1901343e51b I really think it would be beneficial to be explicit and document the expected hashes and how they are constructed just using |
you assumption is again incorrect indeed.
|
Signed-off-by: Lewis Marshall <lewis@lmars.net>
@lmars great stuff like. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After an interactive review session with @zelig , approved.
bmt is a new package that provides hashers for binary merkle tree hashes on size-limited chunks
the main motivation is that using BMT hash as the chunk hash of the swarm hash offers logsize inclusion proofs for arbitrary files on a 32-byte resolution completely viable to use in challenges on the blockchain.