Skip to content
This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

Integrate encryption into dpa and chunker #327

Merged
merged 30 commits into from
Mar 30, 2018

Conversation

gbalint
Copy link

@gbalint gbalint commented Mar 14, 2018

This PR is the integration of the encrpytion package into DPA and Chunker.

The goal was to encapsulate the encryption/decryption into a Putter and Getter interface, which stores and retrieves chunks.
Putter and Getter works with byte slices and not Chunk objects. So it handles the hashing and the Chunk creation. So all this can be removed from chunker, which can just directly use Putter.Put and Getter.Get with byte slices to store and retrieve chunks.
There is a HasherStore implementation of Putter and Getter which decides on whether to decrypt or not base on key length. One HasherStore instance is used for just one chunk storage/retrieval, so it can be initialized whether to encrypt or not when putting.
Another change is that the chunker objects are not reused anymore for multiple parallel split/join/append requests. This made the code much simpler, because the object itself can store the data necessary for the chunking task, no need to pass down a lot of parameters to the function calls.

fixes #270 #280 #281 #289

self.running = false
close(self.quitC)
}
// func (self *DPA) Start() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in db1b51a

@@ -195,6 +189,9 @@ func (self *PyramidChunker) Split(data io.Reader, size int64, chunkC chan *Chunk
}()

defer close(quitC)
defer func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for func() {}() see line above

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f32d4cc

@@ -203,28 +200,25 @@ func (self *PyramidChunker) Split(data io.Reader, size int64, chunkC chan *Chunk
}
case <-time.NewTimer(splitTimeout).C:
}
return rootKey, storageWG.Wait, nil
return rootKey, func() { return }, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for return

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was before wait was implemented hasherstore. Proper wait implementation is added in f32d4cc

key: key,
chunkC: chunkC,
key: key,
// chunkC: chunkC,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove commented line

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f32d4cc

key: key,
chunkC: chunkC,
key: key,
// chunkC: chunkC,
chunkSize: self.chunkSize,
branches: self.branches,
hashSize: self.hashSize,
depth: depth,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment on depth should be added (TODO)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 582c6e5

}
}

func NewHasherStore(chunkStore ChunkStore, hashFunc SwarmHasher, toEncrypt bool) *hasherStore {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment exported functions

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8016f18


return Reference(append(chunk.Key, encryptionKey...)), nil

//TODO: implement wait for storage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this and added proper wait implementation in f32d4cc

@@ -494,8 +483,8 @@ func (self *PyramidChunker) prepareChunks(isAppend bool, chunkLevel [][]*TreeEnt
workers := self.getWorkerCount()
if int64(len(jobC)) > workers && workers < ChunkProcessors {
self.incrementWorkerCount()
storageWG.Add(1)
go self.processor(self.workerCount, jobC, chunkC, errC, quitC, storageWG)
// storageWG.Add(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f32d4cc

This is not a fully functional change.
Chunker api changed: it uses Putter and Getter interfaces to put and get
chunks from the store and to do the encryption/decryption if needed. These
putters and getters should also the hashing, so hashing related code
could be removed from chunker.
Some things are temporary, pyramid chunker doesn't work yet (only tree
chunker), and hashing and putting in dpa is not parallelized.
PyramydChunker doesn't work yet with Putter/Getter
creation

This fixes the tests with pyramid chunker
Chunker code is much simpler if all data of the job is stored on the
chunker object itself.
@gbalint gbalint force-pushed the swarm-network-rewrite-encryption branch from cc05f60 to f32d4cc Compare March 26, 2018 08:37
@gbalint gbalint changed the title [WIP] Integrate encryption into dpa and chunker Integrate encryption into dpa and chunker Mar 26, 2018
@gbalint gbalint requested a review from nagydani March 26, 2018 12:54
@@ -263,7 +174,7 @@ func TestSha3ForCorrectness(t *testing.T) {

}

func TestDataAppend(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gbalint why is this commented out?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nonsense accident, I reenabled it in d8ee6a1

@@ -0,0 +1,223 @@
// Copyright 2016 The go-ethereum Authors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bump up copyright to 2018.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nonsense Fixed in 46e9832

return Reference(append(chunk.Key, encryptionKey...)), nil
}

// Returns data of the chunk with the given reference (retrieved from the ChunkStore of hasherStore).
Copy link
Contributor

@nonsense nonsense Mar 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, but godoc comments should start with the name of the of the symbol described. Applies to multiple places. https://blog.golang.org/godoc-documenting-go-code

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nonsense Sure, it is not a nitpick, I'm happy to receive these kind of review comments. It is fixed in 46e9832

return chunkData, nil
}

// The Close() indicates that no more chunks will be put with the hasherStore, so the Wait
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close indicates...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2329cfb

// NewHasherStore creates a hasherStore object, which implements Putter and Getter interfaces.
// With the HasherStore you can put and get chunk data (which is just []byte) into a ChunkStore
// and the hasherStore will take core of encryption/decryption of data if necessary
func NewHasherStore(chunkStore ChunkStore, hashFunc SwarmHasher, toEncrypt bool) *hasherStore {
Copy link
Contributor

@nonsense nonsense Mar 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think HasherStore is a good candidate for a few unit tests. It is a new type, and having a bunch of simple unit tests would help documenting it and reduce the cognitive load on readers, since a few simple examples of actual use of the type speak much more than any comments.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree, I'm adding a few

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit test added in cdc7bed

Copy link
Contributor

@nonsense nonsense Mar 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. I would however suggest we do this with table-driven tests in order to have idiomatic go code. Additional benefit is:

  1. Less overhead of having to think about multiple methods.
  2. Input params to tests are named.

https://gist.github.com/nonsense/87ff83e0a76f3b75372f5f3c52dbc439

What do you think?

There is only a slight difference in this test, as both are quite simple, but if we have more parameters, naming them makes it much easier to reason about the test.

https://github.com/golang/go/wiki/TableDrivenTests

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nonsense Nice, I committed it in ba258d7


func newChunkEncryption() *chunkEncryption {
return &chunkEncryption{
spanEncryption: encryption.New(0, math.MaxUint32, sha3.NewKeccak256),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of MaxUint32 let's use the branches value

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b737d49


func (h *hasherStore) storeChunk(chunk *Chunk) {
// need to do this parallelly
if h.store != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we checking for h.store? Shouldn't we just assume it is there?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one place where we use it with nil, because we only need it for hashing: https://github.com/ethersphere/go-ethereum/blob/swarm-network-rewrite-encryption/cmd/swarm/hash.go#L41
This cmd line tool just gets the hash without storing data. So I think it makes sense to have a nil store if you only need the "hasher" functionality, but not the "store".

@nolash
Copy link
Contributor

nolash commented Mar 26, 2018

5 of my comments got hidden I see. I don't know what that means...

Copy link
Member

@janos janos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion for dpa Start and Stop functions. If the intension is to reintroduce workers for the dpa, would it be easer to make this methods noop and not remove them from where they are called. Or maybe to even remove the Start func and start the processing loop in the constructor, and keep only the Stop.

err = fmt.Errorf("Append error: %v", err)
}
func (t *testChunkStore) Close() {
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the return statement.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janos Fixed in 49a4d9c

dpa.Start()
defer dpa.Stop()

dpa := NewDPA(localStore, NewDPAParams())
defer os.RemoveAll("/tmp/bzz")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be defer os.RemoveAll(tdb.dir)? I know that this is the old code, but this test may be improved here as well.

Instead of using a nil ChunkStore in hasherStore when we don't want the
hasherStore to actually store the chunk data, now a FakeChunkStore
instance should be used. Because of this the nil checks of the hasherStore.store
could be removed.
The branches what is changing, and it is dependent on chunk size and
hash size
@gbalint
Copy link
Author

gbalint commented Mar 28, 2018

@nolash false is hardcoded as toEncrypt only temporarily, because this PR does not yet contain any code to wire the new encryption feature of DPA to api.go or server.go. That will happen in a different API

@gbalint
Copy link
Author

gbalint commented Mar 28, 2018

@janos There is no need to reintroduce workers into DPA, as DPA does not do storage anymore. There are workers in chunker which call hasherStore Put/Get parallelly, and the store itself is asynchronous. So even in hasherStore there is no need for workers in my opinion.

@gbalint
Copy link
Author

gbalint commented Mar 28, 2018

@nolash A comment is hidden if the code which you commented on has been changed. But I'm checking the hidden comments too...


reader := chunker.Join(key, chunkC, 0)
return reader
func newTestHasherStore(chunkStore ChunkStore, hash string) *hasherStore {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, but I think this method is redundant - just use directly the NewHasherStore.

Copy link
Contributor

@nonsense nonsense left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

provided tests are passing - LGTM. there are many small things we can amend or refactor in the future, but I think this PR is already an improvement.

Copy link
Member

@janos janos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, but the PR is 👍

}

func parseReference(ref Reference) (Key, encryption.Key, error) {
encryptedKeyLength := KeyLength + encryption.KeyLength
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KeyLength and encryption.KeyLength are constants, encryptedKeyLength can go outside of the function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janos I decided to make it a parameter as KeyLength is not constant, but the hashSize, which depends on the hash function, see 0bfe320

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -206,7 +207,7 @@ func (c *Chunk) WaitToStore() {
func FakeChunk(size int64, count int, chunks []*Chunk) int {
var i int
hasher := MakeHashFunc(SHA3Hash)()
chunksize := getDefaultChunkSize()
chunksize := DefaultChunkSize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chunksize is not needed, it can be replaced with just DefaultChunkSize in two lines bellow.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janos Fixed in 0bfe320

@gbalint gbalint merged commit 167159b into swarm-network-rewrite Mar 30, 2018
@gbalint gbalint deleted the swarm-network-rewrite-encryption branch March 30, 2018 10:40
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants