-
Notifications
You must be signed in to change notification settings - Fork 110
Integrate encryption into dpa and chunker #327
Integrate encryption into dpa and chunker #327
Conversation
swarm/storage/dpa.go
Outdated
self.running = false | ||
close(self.quitC) | ||
} | ||
// func (self *DPA) Start() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in db1b51a
swarm/storage/pyramid.go
Outdated
@@ -195,6 +189,9 @@ func (self *PyramidChunker) Split(data io.Reader, size int64, chunkC chan *Chunk | |||
}() | |||
|
|||
defer close(quitC) | |||
defer func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for func() {}() see line above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in f32d4cc
swarm/storage/pyramid.go
Outdated
@@ -203,28 +200,25 @@ func (self *PyramidChunker) Split(data io.Reader, size int64, chunkC chan *Chunk | |||
} | |||
case <-time.NewTimer(splitTimeout).C: | |||
} | |||
return rootKey, storageWG.Wait, nil | |||
return rootKey, func() { return }, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was before wait was implemented hasherstore. Proper wait implementation is added in f32d4cc
swarm/storage/chunker.go
Outdated
key: key, | ||
chunkC: chunkC, | ||
key: key, | ||
// chunkC: chunkC, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove commented line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in f32d4cc
swarm/storage/chunker.go
Outdated
key: key, | ||
chunkC: chunkC, | ||
key: key, | ||
// chunkC: chunkC, | ||
chunkSize: self.chunkSize, | ||
branches: self.branches, | ||
hashSize: self.hashSize, | ||
depth: depth, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on depth should be added (TODO)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 582c6e5
swarm/storage/hasherstore.go
Outdated
} | ||
} | ||
|
||
func NewHasherStore(chunkStore ChunkStore, hashFunc SwarmHasher, toEncrypt bool) *hasherStore { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment exported functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 8016f18
swarm/storage/hasherstore.go
Outdated
|
||
return Reference(append(chunk.Key, encryptionKey...)), nil | ||
|
||
//TODO: implement wait for storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this and added proper wait implementation in f32d4cc
swarm/storage/pyramid.go
Outdated
@@ -494,8 +483,8 @@ func (self *PyramidChunker) prepareChunks(isAppend bool, chunkLevel [][]*TreeEnt | |||
workers := self.getWorkerCount() | |||
if int64(len(jobC)) > workers && workers < ChunkProcessors { | |||
self.incrementWorkerCount() | |||
storageWG.Add(1) | |||
go self.processor(self.workerCount, jobC, chunkC, errC, quitC, storageWG) | |||
// storageWG.Add(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in f32d4cc
This is not a fully functional change. Chunker api changed: it uses Putter and Getter interfaces to put and get chunks from the store and to do the encryption/decryption if needed. These putters and getters should also the hashing, so hashing related code could be removed from chunker. Some things are temporary, pyramid chunker doesn't work yet (only tree chunker), and hashing and putting in dpa is not parallelized.
PyramydChunker doesn't work yet with Putter/Getter
creation This fixes the tests with pyramid chunker
Chunker code is much simpler if all data of the job is stored on the chunker object itself.
cc05f60
to
f32d4cc
Compare
swarm/storage/chunker_test.go
Outdated
@@ -263,7 +174,7 @@ func TestSha3ForCorrectness(t *testing.T) { | |||
|
|||
} | |||
|
|||
func TestDataAppend(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gbalint why is this commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swarm/storage/hasherstore.go
Outdated
@@ -0,0 +1,223 @@ | |||
// Copyright 2016 The go-ethereum Authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bump up copyright to 2018.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swarm/storage/hasherstore.go
Outdated
return Reference(append(chunk.Key, encryptionKey...)), nil | ||
} | ||
|
||
// Returns data of the chunk with the given reference (retrieved from the ChunkStore of hasherStore). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick, but godoc comments should start with the name of the of the symbol described. Applies to multiple places. https://blog.golang.org/godoc-documenting-go-code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swarm/storage/hasherstore.go
Outdated
return chunkData, nil | ||
} | ||
|
||
// The Close() indicates that no more chunks will be put with the hasherStore, so the Wait |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close indicates...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 2329cfb
// NewHasherStore creates a hasherStore object, which implements Putter and Getter interfaces. | ||
// With the HasherStore you can put and get chunk data (which is just []byte) into a ChunkStore | ||
// and the hasherStore will take core of encryption/decryption of data if necessary | ||
func NewHasherStore(chunkStore ChunkStore, hashFunc SwarmHasher, toEncrypt bool) *hasherStore { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think HasherStore
is a good candidate for a few unit tests. It is a new type, and having a bunch of simple unit tests would help documenting it and reduce the cognitive load on readers, since a few simple examples of actual use of the type speak much more than any comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree, I'm adding a few
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unit test added in cdc7bed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice. I would however suggest we do this with table-driven tests in order to have idiomatic go code. Additional benefit is:
- Less overhead of having to think about multiple methods.
- Input params to tests are named.
https://gist.github.com/nonsense/87ff83e0a76f3b75372f5f3c52dbc439
What do you think?
There is only a slight difference in this test, as both are quite simple, but if we have more parameters, naming them makes it much easier to reason about the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swarm/storage/hasherstore.go
Outdated
|
||
func newChunkEncryption() *chunkEncryption { | ||
return &chunkEncryption{ | ||
spanEncryption: encryption.New(0, math.MaxUint32, sha3.NewKeccak256), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of MaxUint32 let's use the branches value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in b737d49
swarm/storage/hasherstore.go
Outdated
|
||
func (h *hasherStore) storeChunk(chunk *Chunk) { | ||
// need to do this parallelly | ||
if h.store != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we checking for h.store
? Shouldn't we just assume it is there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one place where we use it with nil, because we only need it for hashing: https://github.com/ethersphere/go-ethereum/blob/swarm-network-rewrite-encryption/cmd/swarm/hash.go#L41
This cmd line tool just gets the hash without storing data. So I think it makes sense to have a nil store if you only need the "hasher" functionality, but not the "store".
5 of my comments got hidden I see. I don't know what that means... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a suggestion for dpa Start and Stop functions. If the intension is to reintroduce workers for the dpa, would it be easer to make this methods noop and not remove them from where they are called. Or maybe to even remove the Start func and start the processing loop in the constructor, and keep only the Stop.
swarm/storage/chunker_test.go
Outdated
err = fmt.Errorf("Append error: %v", err) | ||
} | ||
func (t *testChunkStore) Close() { | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the return statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dpa.Start() | ||
defer dpa.Stop() | ||
|
||
dpa := NewDPA(localStore, NewDPAParams()) | ||
defer os.RemoveAll("/tmp/bzz") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be defer os.RemoveAll(tdb.dir)
? I know that this is the old code, but this test may be improved here as well.
Instead of using a nil ChunkStore in hasherStore when we don't want the hasherStore to actually store the chunk data, now a FakeChunkStore instance should be used. Because of this the nil checks of the hasherStore.store could be removed.
The branches what is changing, and it is dependent on chunk size and hash size
@nolash false is hardcoded as toEncrypt only temporarily, because this PR does not yet contain any code to wire the new encryption feature of DPA to api.go or server.go. That will happen in a different API |
@janos There is no need to reintroduce workers into DPA, as DPA does not do storage anymore. There are workers in chunker which call hasherStore Put/Get parallelly, and the store itself is asynchronous. So even in hasherStore there is no need for workers in my opinion. |
@nolash A comment is hidden if the code which you commented on has been changed. But I'm checking the hidden comments too... |
It was in pyramid.go, although TreeSplitter is in chunker.go
incrementWorkerCount can be moved into runWorker remove obsolete comment
Depth can only be 0 because of a bug: #344
There was a bug that data was not lengthened to padding
swarm/storage/chunker_test.go
Outdated
|
||
reader := chunker.Join(key, chunkC, 0) | ||
return reader | ||
func newTestHasherStore(chunkStore ChunkStore, hash string) *hasherStore { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick, but I think this method is redundant - just use directly the NewHasherStore
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
provided tests are passing - LGTM. there are many small things we can amend or refactor in the future, but I think this PR is already an improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments, but the PR is 👍
swarm/storage/hasherstore.go
Outdated
} | ||
|
||
func parseReference(ref Reference) (Key, encryption.Key, error) { | ||
encryptedKeyLength := KeyLength + encryption.KeyLength |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KeyLength and encryption.KeyLength are constants, encryptedKeyLength can go outside of the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
swarm/storage/types.go
Outdated
@@ -206,7 +207,7 @@ func (c *Chunk) WaitToStore() { | |||
func FakeChunk(size int64, count int, chunks []*Chunk) int { | |||
var i int | |||
hasher := MakeHashFunc(SHA3Hash)() | |||
chunksize := getDefaultChunkSize() | |||
chunksize := DefaultChunkSize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chunksize is not needed, it can be replaced with just DefaultChunkSize in two lines bellow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…entation It is not just the test which needs this kind of implementation
This PR is the integration of the encrpytion package into DPA and Chunker.
The goal was to encapsulate the encryption/decryption into a Putter and Getter interface, which stores and retrieves chunks.
Putter and Getter works with byte slices and not Chunk objects. So it handles the hashing and the Chunk creation. So all this can be removed from chunker, which can just directly use Putter.Put and Getter.Get with byte slices to store and retrieve chunks.
There is a HasherStore implementation of Putter and Getter which decides on whether to decrypt or not base on key length. One HasherStore instance is used for just one chunk storage/retrieval, so it can be initialized whether to encrypt or not when putting.
Another change is that the chunker objects are not reused anymore for multiple parallel split/join/append requests. This made the code much simpler, because the object itself can store the data necessary for the chunking task, no need to pass down a lot of parameters to the function calls.
fixes #270 #280 #281 #289