
Chunk store #848

Merged: 13 commits into Layr-Labs:master on Nov 5, 2024

Conversation

cody-littley (Contributor):

Why are these changes needed?

This PR adds a framework that will be used by encoders to upload data.

As requested by @dmanc, I split apart the logic for uploading proofs and uploading coefficients into separate methods.

Since this functionality is needed to unblock @dmanc, I'm pushing it in a partially completed form: the framework does not currently break large files into smaller ones when pushing to S3. I plan on adding that as a follow-up task.
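For illustration, the split upload API might look roughly like the following sketch (the interface name, method names, key type, and import paths are assumptions on my part, not necessarily what this PR introduces):

package chunkstore

import (
	"context"

	"github.com/Layr-Labs/eigenda/encoding"
	"github.com/Layr-Labs/eigenda/encoding/rs"
)

// ChunkWriter uploads encoded chunk data for a blob. Proofs and coefficients
// are uploaded by separate methods so the encoder can write them independently.
// The blob key is represented here as a plain string for simplicity.
type ChunkWriter interface {
	// PutChunkProofs uploads only the proofs for a blob's chunks.
	PutChunkProofs(ctx context.Context, blobKey string, proofs []*encoding.Proof) error
	// PutChunkCoefficients uploads only the coefficient frames for a blob's chunks.
	PutChunkCoefficients(ctx context.Context, blobKey string, frames []*rs.Frame) error
}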

Checks

  • I've made sure the lint is passing in this PR.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; in that case, please comment that they are not relevant.
  • I've checked the new test coverage and the coverage percentage didn't drop.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

@cody-littley cody-littley self-assigned this Oct 30, 2024
@@ -0,0 +1,103 @@
package chunkstore
Contributor:

I feel like chunkstore belongs outside disperser. Should it be under relay?

Contributor Author:

Moved it to the relay.

I was actually a little confused about where this should live. The relay reads from it, but the encoder will be writing to it. But I guess the relay is as good a place as any (unless we decide to make it a top-level directory, which I don't think it deserves to be).

@@ -33,3 +37,61 @@ func Decode(b []byte) (Frame, error) {
}
return f, nil
}

// EncodeFrames serializes a slice of frames into a byte slice.
func EncodeFrames(frames []*Frame) ([]byte, error) {
Contributor:

Why don't we use existing serialization methods like this?

Contributor:

From the split encoder's perspective, we want to serialize the proofs and the coefficients separately. From the node's perspective, we could make sure it receives the chunks in the expected serialized format.

Contributor Author:

@dmanc requested that the chunk store have the capability of uploading encoding.Proof objects and rs.Frame objects separately. The code you point to is capable of serializing both at the same time, but does not provide a way to serialize/deserialize them separately.
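For concreteness, the proof side can be as simple as concatenating fixed-size proof encodings, which is what the diff further below appears to do (a sketch only; it assumes each encoding.Proof serializes to a fixed-size byte array via a Bytes()-style method, and the helper name is made up):

package chunkstore

import "github.com/Layr-Labs/eigenda/encoding"

// serializeProofsSketch concatenates fixed-size proof encodings. Because every
// proof occupies the same number of bytes, no per-proof length header is
// needed and a reader can recover proof boundaries by index.
func serializeProofsSketch(proofs []*encoding.Proof) []byte {
	out := make([]byte, 0)
	for _, proof := range proofs {
		proofBytes := proof.Bytes() // assumed fixed-size array encoding
		out = append(out, proofBytes[:]...)
	}
	return out
}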

Contributor:

I see. How about this one?

func (r *chunkReader) GetChunkCoefficients(
ctx context.Context,
blobKey disperser.BlobKey,
metadata *ChunkCoefficientMetadata) ([]*rs.Frame, error) {
Contributor:

Does chunk reader need chunkMetadataStore if this method takes metadata as input?

Contributor Author:

Very good point, it won't need this to be passed in. I didn't notice because it's not actually used in this PR's iteration of the feature. Removed.

)

// ChunkMetadataStore is an interface for storing and retrieving metadata about chunks.
type ChunkMetadataStore interface {
Contributor:

Not sure if we need another store abstraction for writing/reading chunk metadata.
Since chunk metadata lives inside blob metadata, I think write/read should happen via blob metadata store.

Contributor Author:

This was something separate since we had originally discussed not putting the extra metadata into the regular blob metadata store. Now that this data has been merged into the other blob metadata, I agree that it doesn't make sense to have a separate chunk metadata store. Removed.

// The total size of file containing all chunk coefficients for the blob.
DataSize int
// The maximum fragment size used to store the chunk coefficients.
FragmentSize int
Contributor:

Could we be more specific here and use uint64 (or whatever is appropriate) instead of the generic int type?

cody-littley (Contributor Author), Oct 31, 2024:

Will do. Let's go with uint64 for the sake of future compatibility (I can't imagine having >4 GB files, but let's not limit ourselves here... it's not that much overhead).

As a side note, one of the quirks of golang that drives me up a wall is how they strongly encourage everybody to use the int type everywhere. For example, why does len(x) return a signed value? If I were in charge of the language design, I'd never have supported the types int and uint in the first place. /rant

bytes = append(bytes, proofBytes[:]...)
}

err := c.s3Client.UploadObject(ctx, c.bucketName, s3Key, bytes)
Contributor:

What is our S3 object versioning policy? Would we need to care about objects that already exist before uploading?

Contributor Author:

Discussed offline, repeating conclusion of that talk here for others.

Since each key is unique and has a deterministic value, writing a value to a key more than once is harmless (i.e. the data is overwritten with the exact same data).
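A minimal sketch of that conclusion (the narrowed interface and helper are illustrative, not code from this PR; the UploadObject signature mirrors the call in the diff above):

package chunkstore

import "context"

// uploader narrows the S3 client to the single call this sketch needs.
type uploader interface {
	UploadObject(ctx context.Context, bucket string, key string, data []byte) error
}

// putChunksIdempotent: because the key is derived purely from the blob key and
// the payload is a deterministic function of the blob's encoding, calling this
// more than once simply overwrites the object with byte-identical data, so no
// existence check or versioning is required.
func putChunksIdempotent(ctx context.Context, s3Client uploader, bucket string, key string, payload []byte) error {
	return s3Client.UploadObject(ctx, bucket, key, payload)
}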

ian-shim (Contributor) left a comment:

lgtm! one comment re: serialization. Can you also take a look at the lint failure?

@cody-littley cody-littley marked this pull request as draft November 1, 2024 13:56
@cody-littley cody-littley marked this pull request as ready for review November 4, 2024 17:01
dmanc (Contributor) left a comment:

Looks good, left a couple small comments

// GnarkEncodeFrames serializes a slice of frames into a byte slice.
func GnarkEncodeFrames(frames []*Frame) ([]byte, error) {

// Serialization format:
Contributor:

Should we move this to above the function so it shows up in the Go docs?

Contributor Author:

Done.

// GnarkEncodeFrames serializes a slice of frames into a byte slice.
//
// Serialization format:
// [number of frames: 4 byte uint32]
// [size of frame 1: 4 byte uint32][frame 1]
// [size of frame 2: 4 byte uint32][frame 2]
// ...
// [size of frame n: 4 byte uint32][frame n]
//
// Where relevant, big endian encoding is used.
func GnarkEncodeFrames(frames []*Frame) ([]byte, error) {
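The documented layout translates into roughly the following encoding loop (a sketch written as if it lived in the same package as Frame; frameToGnarkBytes is passed in as a stand-in for however a single frame is serialized in this codebase):

package rs

import (
	"bytes"
	"encoding/binary"
)

// encodeFramesSketch writes the layout documented above: a big-endian uint32
// frame count, then for each frame a big-endian uint32 length prefix followed
// by the frame's serialized bytes.
func encodeFramesSketch(frames []*Frame, frameToGnarkBytes func(*Frame) ([]byte, error)) ([]byte, error) {
	buf := new(bytes.Buffer)

	// [number of frames: 4 byte uint32]
	if err := binary.Write(buf, binary.BigEndian, uint32(len(frames))); err != nil {
		return nil, err
	}

	for _, frame := range frames {
		serialized, err := frameToGnarkBytes(frame)
		if err != nil {
			return nil, err
		}
		// [size of frame i: 4 byte uint32][frame i]
		if err := binary.Write(buf, binary.BigEndian, uint32(len(serialized))); err != nil {
			return nil, err
		}
		buf.Write(serialized)
	}

	return buf.Bytes(), nil
}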

return nil, 0, fmt.Errorf("invalid frame size: %d", len(serializedFrame))
}

coeffs := make([]encoding.Symbol, frameCount)
Contributor:

nit: I feel like encoding.Symbol is not really used anywhere. Maybe it's worth deprecating it and just using fr.Element.

// Symbol is a symbol in the field used for polynomial commitments
type Symbol = fr.Element

Contributor Author:

done


func (r *chunkReader) GetChunkProofs(
ctx context.Context,
blobKey disperser.BlobKey) ([]*encoding.Proof, error) {
Contributor:

Seems like this blobKey references the v1 blob key. In StoreBlob for v2 we use blobKey.Hex(), which is a string:

func (b *BlobStore) StoreBlob(ctx context.Context, blobKey string, data []byte) error {

V2 blob key:

type BlobKey [32]byte

func (b BlobKey) Hex() string {
	return hex.EncodeToString(b[:])
}

Contributor:

Also, when we fetch proofs vs. coefficients, don't we need a different S3 key to differentiate them?

cody-littley (Contributor Author), Nov 5, 2024:

I've been assuming we'd use different buckets. Started a slack conversation to discuss. Will circle back on this prior to merging once we decide how we want to handle buckets and namespacing.

Contributor Author:

I've switched over to using v2.BlobKey as recommended by @ian-shim.


// ChunkCoefficientMetadata contains metadata about how chunk coefficients are stored.
// Required for reading chunk coefficients using ChunkReader.GetChunkCoefficients().
type ChunkCoefficientMetadata struct {
Contributor:

Are these the same? encoding.FragmentInfo

type FragmentInfo struct {
	TotalChunkSizeBytes uint32
	NumFragments        uint32
}

cody-littley (Contributor Author), Nov 5, 2024:

They aren't currently the same, but should be. Now fixed.

The primary reason I didn't originally enable fragmented read/write operations was that I wasn't initially sure how the metadata store would handle this data. Now that Ian has merged his PR, I've unified ChunkCoefficientMetadata with FragmentInfo and enabled chunk file fragmentation.
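As a rough illustration of how FragmentInfo can drive a fragmented read (a sketch only: the fragment naming scheme, the narrowed downloader interface, and the import path are assumptions, not the implementation merged here):

package chunkstore

import (
	"context"
	"fmt"

	"github.com/Layr-Labs/eigenda/encoding"
)

// fragmentDownloader narrows the S3 client to what this sketch needs.
type fragmentDownloader interface {
	DownloadObject(ctx context.Context, bucket string, key string) ([]byte, error)
}

// readFragmentedCoefficients downloads each fragment in order, reassembles
// them, and sanity-checks the total size against FragmentInfo.
func readFragmentedCoefficients(
	ctx context.Context,
	client fragmentDownloader,
	bucket string,
	blobKeyHex string,
	info encoding.FragmentInfo) ([]byte, error) {

	data := make([]byte, 0, info.TotalChunkSizeBytes)
	for i := uint32(0); i < info.NumFragments; i++ {
		// Assumed naming scheme: "<blob key hex>-<fragment index>".
		fragmentKey := fmt.Sprintf("%s-%d", blobKeyHex, i)
		fragment, err := client.DownloadObject(ctx, bucket, fragmentKey)
		if err != nil {
			return nil, err
		}
		data = append(data, fragment...)
	}
	if uint32(len(data)) != info.TotalChunkSizeBytes {
		return nil, fmt.Errorf("expected %d bytes, read %d", info.TotalChunkSizeBytes, len(data))
	}
	return data, nil
}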

@cody-littley cody-littley merged commit 5fd9a08 into Layr-Labs:master Nov 5, 2024
6 checks passed