-
Notifications
You must be signed in to change notification settings - Fork 249
WIP: An attempt delay fetching build's size for MMap until 1st read #1887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: lev-slice-return-faster
Are you sure you want to change the base?
Conversation
packages/orchestrator/internal/sandbox/block/streaming_chunk.go
Outdated
Show resolved
Hide resolved
packages/orchestrator/internal/sandbox/block/streaming_chunk.go
Outdated
Show resolved
Hide resolved
packages/orchestrator/internal/sandbox/block/streaming_chunk.go
Outdated
Show resolved
Hide resolved
| mu sync.Mutex | ||
| chunkOff int64 | ||
| chunkLen int64 | ||
| blockSize int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is suspicious but not changed in this PR. The block size of a chunker for buildId A seems determined by the block size of the first top-level build file (B) that referred to it in its mappings. This seems ok since the block size is used only for tracking isCached, etc.
…m builds When StorageDiff.Init() creates a StreamingChunker for upstream builds, it no longer calls obj.Size() (GCS Attrs / S3 HeadObject). Instead, the object size is discovered for free from the first range-read response (Content-Range header / GCS reader.Attrs.Size) and the mmap-backed cache is created lazily on first fetch. Storage layer changes: - GCS: cache total size from reader.Attrs.Size in ReadAt/OpenRangeReader - AWS: parse Content-Range header in ReadAt/OpenRangeReader - Both: Size() returns cached value first, falls back to metadata call StreamingChunker changes: - Constructor takes sizeFunc instead of pre-resolved size - Cache created on first runFetch via sync.Once - atomic.Pointer[Cache] and atomic.Int64 for size make every concurrent read self-documenting via .Load() — no separate init gate needed - fetchSession uses back-reference to chunker instead of duplicating cache/blockSize fields - isCached fast-path in registerAndWait moved before s.mu.Lock (atomic load + sync.Map are already thread-safe) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ae3827b to
83ab9f5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 83ab9f567b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| size := c.size.Load() | ||
|
|
||
| for i := int64(0); i < c.size; i += storage.MemoryChunkSize { | ||
| for i := int64(0); i < size; i += storage.MemoryChunkSize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initialize chunker before streaming full contents
With lazy initialization, c.size remains 0 until a range fetch runs, but WriteTo now reads size := c.size.Load() and immediately returns (0, nil) when called first. In streaming mode StorageDiff.Init no longer fetches size up front, so a first operation that exports via WriteTo can silently emit an empty output instead of the object bytes.
Useful? React with 👍 / 👎.
| } | ||
|
|
||
| b, cacheErr := c.cache.Slice(off, length) | ||
| b, cacheErr := c.cache.Load().Slice(off, length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard cache dereference when no fetch was needed
Because cache creation is deferred, c.cache.Load() can still be nil on the first call; if Slice is invoked with length == 0 (including ReadAt with an empty buffer), all fetch goroutines short-circuit with clippedLen <= 0, eg.Wait() succeeds, and this line dereferences a nil cache pointer, causing a panic. Previously the cache always existed, so zero-length reads were safe.
Useful? React with 👍 / 👎.
|
lets add first quick monitoring for the size requests - how much time is spent on the size requests in total (+ prob. count per sandbox) when starting a sandbox to get better idea about the impact |
The idea is that we use the first read to size the cache, rather than explicitly fetching
Size()before. I am not sure that the extra complexities that this PR introduces are worth it.Note that it targets only the new "streaming" chunker, the old one is not changing the logic.