fix : fix compute hash #9

Merged · 4 commits · Mar 10, 2023
17 changes: 10 additions & 7 deletions README.md
@@ -7,29 +7,32 @@ Support common libs for different repos of greenfield

(1) The erasure package supports RSEncoder, which provides basic Reed-Solomon Encode and Decode APIs
```
RSEncoderStorage, err := NewRSEncoder(dataShards, parityShards, int64(blockSize))
// first step, create a new RS encoder; blockSize indicates the size of the data to be encoded
func NewRSEncoder(dataShards, parityShards int, blockSize int64) (r RSEncoder, err error)
// encode data and return the encoded shards
func (r *RSEncoder) EncodeData(content []byte) ([][]byte, error)
// decode the input data and reconstruct the data shards (does not include the parity shards)
func (r *RSEncoder) DecodeDataShards(content [][]byte) error
// decode the input data and reconstruct both the data shards and the parity shards
func (r *RSEncoder) DecodeShards(data [][]byte) error
```
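To make the shard layout concrete, here is a minimal, self-contained sketch of how a block can be split into equal-sized data shards before parity shards are computed. `splitIntoShards` is a hypothetical helper, not part of this package; the real RSEncoder delegates this step to the underlying Reed-Solomon library.

```go
package main

import "fmt"

// splitIntoShards is a hypothetical helper that splits content into
// dataShards equal-sized shards, zero-padding the final shard.
// The real RSEncoder delegates this to the Reed-Solomon library.
func splitIntoShards(content []byte, dataShards int) [][]byte {
	shardSize := (len(content) + dataShards - 1) / dataShards
	shards := make([][]byte, dataShards)
	for i := 0; i < dataShards; i++ {
		shard := make([]byte, shardSize)
		start := i * shardSize
		if start < len(content) {
			copy(shard, content[start:])
		}
		shards[i] = shard
	}
	return shards
}

func main() {
	content := make([]byte, 10) // 10 bytes spread across 4 data shards
	shards := splitIntoShards(content, 4)
	fmt.Println(len(shards), len(shards[0])) // 4 shards of 3 bytes each
}
```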
(2) The redundancy package supports methods to encode/decode segment data using RSEncoder
```
// encode one segment
func EncodeRawSegment(content []byte, dataShards, parityShards int) ([][]byte, error)

// decode the segment and reconstruct the original segment content
func DecodeRawSegment(pieceData [][]byte, segmentSize int64, dataShards, parityShards int) ([]byte, error)
```

Collaborator @will-2012 commented on Mar 8, 2023:

1. What's the difference between segmentSize and blockSize?
2. pieceData -> shardDataList maybe a good choice.

Author replied:

No difference. These two interfaces are applied on the SP side; since "segment" is the term used on the gnfd side, the naming is closer to the application layer.

### 2. Compute sha256 hash of file content

The hash package supports methods to compute the hash roots of greenfield objects; the computation is based on the
redundancy strategy of greenfield.

```
// compute hash roots from an io reader; the parameters besides reader should be fetched from chain
func ComputeIntegrityHash(reader io.Reader, segmentSize int64, dataShards, parityShards int) ([]string, int64, error)

// compute hash roots based on file path
func ComputerHashFromFile(filePath string, segmentSize int64, dataShards, parityShards int) ([]string, int64, error)
```
26 changes: 11 additions & 15 deletions go/hash/hash.go
@@ -10,18 +10,18 @@ import (
"github.com/rs/zerolog/log"
)

// ComputeIntegrityHash splits the reader into segments, EC-encodes the data, computes the hash roots of the pieces,
// and returns the hash result list and the data size
func ComputeIntegrityHash(reader io.Reader, segmentSize int64, dataShards, parityShards int) ([][]byte, int64, error) {
	var segChecksumList [][]byte
	ecShards := dataShards + parityShards

	encodeData := make([][][]byte, ecShards)
	for i := 0; i < ecShards; i++ {
		encodeData[i] = make([][]byte, 0)
	}

	hashList := make([][]byte, ecShards+1)
	contentLen := int64(0)
	// read the data by segment size
	for {
@@ -34,7 +34,8 @@ func ComputerHash(reader io.Reader, segmentSize int64, dataShards, parityShards
			}
			break
		}

		if n > 0 && n <= int(segmentSize) {
			contentLen += int64(n)
			data := seg[:n]
			// compute segment hash
@@ -53,14 +54,12 @@ func ComputerHash(reader io.Reader, segmentSize int64, dataShards, parityShards
	}

	// combine the hash root of pieces of the PrimarySP
	hashList[0] = GenerateIntegrityHash(segChecksumList)

	// compute the hash root of pieces of the SecondarySP
	wg := &sync.WaitGroup{}
	spLen := len(encodeData)
	wg.Add(spLen)
	for spID, content := range encodeData {
		go func(data [][]byte, id int) {
			defer wg.Done()
@@ -70,16 +69,13 @@ func ComputerHash(reader io.Reader, segmentSize int64, dataShards, parityShards
				checksumList = append(checksumList, piecesHash)
			}

			hashList[id+1] = GenerateIntegrityHash(checksumList)
		}(content, spID)
	}

	wg.Wait()

	return hashList, contentLen, nil
}
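The resulting hashList layout can be sketched end to end with plain sha256: index 0 holds the integrity hash over the segment checksums (for the PrimarySP), and indices 1..ecShards hold the per-shard roots (for the SecondarySPs). The `integrityHash` helper below is a stand-in for `GenerateIntegrityHash` that simply hashes the concatenated checksums; the actual implementation may differ.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// integrityHash is a stand-in for GenerateIntegrityHash: it hashes the
// concatenation of the piece checksums. The real implementation may differ.
func integrityHash(checksums [][]byte) []byte {
	h := sha256.New()
	for _, c := range checksums {
		h.Write(c)
	}
	return h.Sum(nil)
}

func main() {
	dataShards, parityShards := 4, 2
	ecShards := dataShards + parityShards

	// fake per-segment checksums standing in for the real sha256 sums
	segChecksums := [][]byte{{0x01}, {0x02}}

	hashList := make([][]byte, ecShards+1)
	hashList[0] = integrityHash(segChecksums) // PrimarySP root
	for i := 1; i <= ecShards; i++ {
		hashList[i] = integrityHash([][]byte{{byte(i)}}) // SecondarySP roots
	}

	fmt.Println(len(hashList), len(hashList[0])) // 7 roots, each 32 bytes
}
```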
Collaborator commented:

Why `result = append(result, hashList[i])`? Directly return hashList?

Author replied:

Yes, some adjustments have been made; the result variable is no longer needed.

// ComputerHashFromFile opens a local file and computes the hash result
@@ -92,11 +88,11 @@ func ComputerHashFromFile(filePath string, segmentSize int64, dataShards, parityShards
	}
	defer f.Close()

	return ComputeIntegrityHash(f, segmentSize, dataShards, parityShards)
}

// ComputerHashFromBuffer supports computing the hash from a byte buffer
func ComputerHashFromBuffer(content []byte, segmentSize int64, dataShards, parityShards int) ([][]byte, int64, error) {
	reader := bytes.NewReader(content)
	return ComputeIntegrityHash(reader, segmentSize, dataShards, parityShards)
}
7 changes: 4 additions & 3 deletions go/hash/hash_test.go
@@ -15,15 +15,16 @@ import (
)

const (
	segmentSize          = 16 * 1024 * 1024
	expectedHashBytesLen = 32
)

func TestHash(t *testing.T) {
	length := int64(32 * 1024 * 1024)
	contentToHash := createTestData(length)
	start := time.Now()

	hashResult, size, err := ComputeIntegrityHash(contentToHash, int64(segmentSize), redundancy.DataBlocks, redundancy.ParityBlocks)
	if err != nil {
		t.Errorf(err.Error())
	}
@@ -61,7 +62,7 @@ func TestHashResult(t *testing.T) {
	for i := 0; i < 1024*1024; i++ {
		buffer.WriteString(fmt.Sprintf("[%05d] %s\n", i, line))
	}
	hashList, _, err := ComputeIntegrityHash(bytes.NewReader(buffer.Bytes()), int64(segmentSize), redundancy.DataBlocks, redundancy.ParityBlocks)
	if err != nil {
		t.Errorf(err.Error())
	}