Conversation

@randygrok randygrok commented Jul 31, 2025

It is now the responsibility of the node to split the batch that is sent to the DA. The client only validates whether the blob is too big (it does not try to split it).

NOTE: There is a case where the blob of a single block is too big; it won't be split and will fail after 30 attempts.
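
For illustration, a minimal sketch of the node-side split-and-retry behavior described above. The names (submitWithSplit, errBlobTooBig) are placeholders, not the actual implementation in block/submitter.go:

package submitter // hypothetical package name, for illustration only

import (
	"context"
	"errors"
	"fmt"
)

// errBlobTooBig stands in for the DA layer's "too big" result; the real code
// reacts to the DA status rather than this sentinel error.
var errBlobTooBig = errors.New("blob too big")

// submitWithSplit sketches the node-side behavior: submit the batch, and if
// the DA layer reports it as too big, split it in half and retry each half.
// A single item that is still too big cannot be split further.
func submitWithSplit(ctx context.Context, items [][]byte, submit func(context.Context, [][]byte) error) error {
	if len(items) == 0 {
		return nil
	}
	err := submit(ctx, items)
	switch {
	case err == nil:
		return nil
	case !errors.Is(err, errBlobTooBig):
		return err
	case len(items) == 1:
		// The real submitter retries this with backoff and gives up after
		// the configured number of attempts (30, per the note above).
		return fmt.Errorf("single item exceeds DA blob size limit: %w", err)
	}
	mid := len(items) / 2
	if err := submitWithSplit(ctx, items[:mid], submit); err != nil {
		return err
	}
	return submitWithSplit(ctx, items[mid:], submit)
}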

Closes: #2498

Overview


github-actions bot commented Jul 31, 2025

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

Build: ✅ passed
Format: ⏩ skipped
Lint: ✅ passed
Breaking: ✅ passed
Updated (UTC): Aug 8, 2025, 9:57 AM


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @randygrok, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces robust validation and automatic batch splitting for Data Availability (DA) layer submissions. The primary goal is to enhance the reliability and resilience of the DA submission process by validating blob sizes upfront and, if necessary, intelligently splitting large batches into smaller, manageable chunks for retry, preventing submission failures due to oversized batches.

Highlights

  • DA Batch Splitting: Implemented automatic splitting of Data Availability (DA) submission batches when a StatusTooBig error is encountered. If a batch is too large, it's now split in half and retried, preventing infinite retry loops for oversized submissions. If a single item is too large, it will now apply exponential backoff.
  • Comprehensive DA Size Validation: The DA client's SubmitWithOptions function now performs strict validation of blob sizes. It checks whether individual blobs or the total batch size exceed the MaxBlobSize limit before attempting submission, returning an error (da.ErrBlobSizeOverLimit) if the limit is breached. This replaces the previous behavior, which could lead to partial submissions or silent dropping of oversized blobs. A rough sketch of this check appears after this list.
  • Improved Test Coverage: Added new unit tests to verify the batch splitting logic in block/submitter.go and the comprehensive size validation in da/jsonrpc/client.go, ensuring the new behaviors are correctly implemented and robust.
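
A rough sketch of the upfront size check described above. The package and function names are illustrative; only da.ErrBlobSizeOverLimit (mirrored here by a local sentinel) comes from the summary:

package daclient // hypothetical package name, for illustration only

import (
	"errors"
	"fmt"
)

// errBlobSizeOverLimit stands in for da.ErrBlobSizeOverLimit.
var errBlobSizeOverLimit = errors.New("blob size over limit")

// validateBlobSizes mirrors the upfront check described above: every blob and
// the batch as a whole must fit under maxBlobSize, otherwise the whole batch
// is rejected instead of being partially submitted or silently truncated.
func validateBlobSizes(blobs [][]byte, maxBlobSize uint64) error {
	var total uint64
	for i, b := range blobs {
		size := uint64(len(b))
		if size > maxBlobSize {
			return fmt.Errorf("blob %d is %d bytes, limit is %d: %w", i, size, maxBlobSize, errBlobSizeOverLimit)
		}
		total += size
	}
	if total > maxBlobSize {
		return fmt.Errorf("batch is %d bytes, limit is %d: %w", total, maxBlobSize, errBlobSizeOverLimit)
	}
	return nil
}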


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces two key features: robust batch size validation at the DA client level and automatic batch splitting in the submitter when a batch is too large. The change in da/jsonrpc/client.go is a great improvement, preventing silent dropping of blobs and ensuring that oversized batches are rejected entirely, which enhances correctness. The new tests in block/submitter_test.go and da/jsonrpc/client_test.go provide good coverage for these new behaviors.

My review includes two main suggestions for improvement:

  1. Refactoring the batch splitting logic in block/submitter.go to be more efficient and reduce submission latency.
  2. Improving the testability of the retry logic in block/submitter.go by making the number of attempts configurable, which would allow a currently skipped test to be enabled.

Overall, this is a solid contribution that improves the robustness of the DA submission process.


codecov bot commented Jul 31, 2025

Codecov Report

❌ Patch coverage is 81.84713% with 57 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.45%. Comparing base (99dc6b7) to head (ce2b81e).
⚠️ Report is 5 commits behind head on main.

Files with missing lines    Patch %    Lines
block/submitter.go          81.18%     49 Missing and 8 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2506      +/-   ##
==========================================
- Coverage   74.46%   74.45%   -0.01%     
==========================================
  Files          70       70              
  Lines        6537     6711     +174     
==========================================
+ Hits         4868     4997     +129     
- Misses       1279     1317      +38     
- Partials      390      397       +7     
Flag        Coverage Δ
combined    74.45% <81.84%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.


claude bot commented Jul 31, 2025

Claude finished @randygrok's task.


Code Review: DA Batch Size Validation and Automatic Splitting

Todo List

  • Read repository guidelines and understand codebase
  • Analyze core implementation in block/submitter.go
  • Review DA client validation in da/jsonrpc/client.go
  • Examine test coverage and quality
  • Check for potential security and performance issues
  • Provide comprehensive review feedback

Summary

This PR successfully implements robust DA batch size validation and automatic batch splitting. The implementation addresses the core issue where oversized batches would cause infinite retry loops by introducing intelligent batch splitting at the node level and comprehensive validation at the DA client level.

Code Quality Analysis

Strengths

  1. Robust Architecture: The recursive batch splitting implementation in block/submitter.go:490-594 is well-designed with proper base cases and error handling.

  2. Comprehensive Validation: da/jsonrpc/client.go:145-181 implements thorough upfront validation that prevents invalid submissions before they reach the DA layer.

  3. Strong Test Coverage: Test coverage is excellent with specific tests for batch splitting (TestSubmitToDA_ItChunksBatchWhenSizeExceedsLimit) and size validation (TestSubmitWithOptions_SizeValidation).

  4. Clean Error Handling: The submissionOutcome pattern provides clear separation of concerns for different submission results (a hypothetical sketch of what such an outcome type could look like follows this list).

  5. Performance-Conscious: The recursive splitting approach minimizes unnecessary retries while ensuring all valid data is submitted.
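
A hypothetical sketch of what an outcome type of this kind might look like. The field and constant names are illustrative; only batchActionSkip appears in the review comments further down, and the real definitions in block/submitter.go may differ:

package submitter // hypothetical, for illustration only

// batchAction is a hypothetical enumeration of what the retry loop should do
// next with a batch.
type batchAction int

const (
	batchActionDone  batchAction = iota // batch fully submitted
	batchActionRetry                    // transient failure: retry the same batch
	batchActionSplit                    // batch too big: retry the produced halves
	batchActionSkip                     // a single item is too big and cannot be split
)

// submissionOutcome bundles the result of one submission attempt with the
// action the caller should take, so the retry loop only switches on Action.
type submissionOutcome struct {
	Action       batchAction
	SplitBatches [][][]byte // set when Action == batchActionSplit
	Err          error
}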

🔍 Areas for Improvement

1. Missing Documentation (Medium Priority)

Several new functions lack proper documentation:

  • submitWithRecursiveSplitting (block/submitter.go:490) needs a detailed docstring explaining the recursive algorithm
  • submitHalfBatch (block/submitter.go:543) should document its role in the splitting strategy
  • processBatch (block/submitter.go:620) needs clearer documentation of its return actions

2. Configuration Flexibility (Low Priority)

The maxSubmitAttempts constant (set to 30) is hardcoded. Consider making this configurable per environment:

// In config
type DAConfig struct {
    MaxSubmitAttempts int `mapstructure:"max_submit_attempts"`
    // ...
}

3. Edge Case Handling (Medium Priority)

Single Item Too Large: When a single item exceeds the DA blob size limit (block/submitter.go:417-418), it will retry 30 times with exponential backoff before failing (a sketch of such a backoff schedule follows the list below). Consider:

  • Adding immediate validation at block creation time to prevent oversized blocks
  • More prominent logging for this critical error condition
  • Consider if this should cause node panic vs. graceful degradation
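
For reference, a minimal sketch of what a capped exponential backoff between attempts could look like. This is a hypothetical helper, not the actual constants or schedule used in the code:

package submitter // hypothetical, for illustration only

import "time"

// backoffFor sketches a capped exponential backoff schedule: 1s, 2s, 4s, ...
// up to maxBackoff.
func backoffFor(attempt int, maxBackoff time.Duration) time.Duration {
	d := time.Second << uint(attempt) // 2^attempt seconds
	if d <= 0 || d > maxBackoff {     // also guards against shift overflow
		return maxBackoff
	}
	return d
}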

4. Memory Efficiency (Low Priority)

The recursive splitting creates multiple copies of slice data. For very large batches, consider using slice indices instead of creating new slices to reduce memory pressure.
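
A small illustration of the index-based approach (hypothetical helper): sub-slicing in Go returns views into the same backing array, so the halves share memory with the original batch rather than copying it.

package submitter // hypothetical, for illustration only

// splitByIndex splits a batch into two halves without copying the underlying
// byte slices; both halves alias the original backing array.
func splitByIndex(items [][]byte) (left, right [][]byte) {
	mid := len(items) / 2
	return items[:mid], items[mid:]
}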

Security Considerations

Secure Implementation

  • No sensitive data exposure in logs
  • Proper error wrapping maintains context without leaking internals
  • Input validation prevents malformed submissions
  • Context cancellation properly handled throughout

🔒 Security Best Practices Followed

  • Blob size validation prevents resource exhaustion attacks
  • Recursive splitting has proper bounds checking to prevent infinite recursion
  • Gas price management prevents economic exploits

Performance Analysis

Performance Benefits

  1. Reduced Network Overhead: Upfront validation prevents unnecessary DA submissions
  2. Optimized Retry Logic: Recursive splitting reduces retry attempts vs. linear backoff
  3. Batch Processing: Efficient handling of partial submissions

Performance Considerations

  1. Memory Usage: Recursive splitting may use more memory for large batches but improves submission success rates
  2. Latency: Multiple smaller submissions may increase total latency but prevent complete failures

Test Coverage Assessment

Excellent Coverage

  • Unit Tests: Comprehensive test cases for all splitting scenarios
  • Integration Scenarios: Tests cover full batch → partial submission → recursive splitting
  • Edge Cases: Single item too large scenario properly tested
  • Validation Logic: Client-side validation thoroughly tested with various blob size combinations

📊 Coverage Metrics

Current patch coverage is 81.84% which is solid. Missing coverage likely relates to error paths and edge cases in the recursive splitting logic.

Final Recommendations

Must Address Before Merge

  1. Add Documentation: Document the new functions, especially the recursive splitting algorithm
  2. Resolve Merge Conflicts: Address the merge conflicts noted by reviewers

Consider for Follow-up

  1. Performance Monitoring: Add metrics for batch splitting frequency to monitor DA layer efficiency
  2. Configuration Enhancement: Make retry limits configurable
  3. Proactive Validation: Consider adding block size validation at creation time to prevent oversized single items

Changelog Entry

As requested by @tac0turtle, this PR needs a changelog entry documenting the new batch splitting and validation features.

Conclusion

This is a high-quality implementation that significantly improves the robustness of DA layer interactions. The recursive batch splitting prevents infinite retry loops while ensuring maximum data availability. The comprehensive validation prevents invalid submissions upfront. The code follows Go best practices and includes excellent test coverage.

Recommendation: Approve after addressing documentation and merge conflicts.


case coreda.StatusSuccess:
// Record successful DA submission
m.recordDAMetrics("submission", DAModeSuccess)
case coreda.StatusNotIncludedInBlock, coreda.StatusAlreadyInMempool:


do we retry in this case from a call higher up?


@tac0turtle tac0turtle left a comment


left a few minor comments, overall flow makes sense

@tac0turtle

also we need a changelog here


@Manav-Aggarwal Manav-Aggarwal left a comment


Thanks! Left a bunch of comments, think fixing the e2e test is the main thing here.

Comment on lines +187 to +191
includedCount := 0
for _, txHashStr := range pendingTxHashes {
if evm.CheckTxIncluded(t, common.HexToHash(txHashStr)) {
includedCount++
}


I don't think this has anything to do with DA submission whatsoever. We should likely look at the DA inclusion block height instead here. Keep track of the block height before we start up the DA layer again (it should be way ahead of the DA-included block height), and then make sure that block height is at least a DA-included block height after the DA layer starts up.


@alpe alpe left a comment


Good start!
Some minor comments on logging, but as I understand the code, there is a risk of data loss in the batchActionSkip scenario. Can you check?

select {
case <-ctx.Done():
return ctx.Err()
case <-time.After(backoff):


nit: the timer created by time.After is not released until it fires when ctx.Done() triggers first. Just move it outside the select:

timer := time.NewTimer(backoff)
defer timer.Stop()
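
A fuller sketch of the suggested pattern, wrapped in a hypothetical helper for illustration:

package submitter // hypothetical, for illustration only

import (
	"context"
	"time"
)

// waitBackoff shows the suggested pattern in full: the timer is always
// stopped via defer, so nothing is left pending when ctx is cancelled first.
func waitBackoff(ctx context.Context, backoff time.Duration) error {
	timer := time.NewTimer(backoff)
	defer timer.Stop()
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-timer.C:
		return nil
	}
}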

batches = append(result.splitBatches, batches...)
case batchActionSkip:
// Single item too big - skip it
continue


If I understand this correctly, the batch item would be ignored. Should we error or panic instead?

tac0turtle previously approved these changes Aug 7, 2025

@tac0turtle tac0turtle left a comment


utACK, there are a few comments from alex left over but otherwise nice job

@tac0turtle tac0turtle enabled auto-merge August 7, 2025 15:20
@tac0turtle tac0turtle requested a review from alpe August 7, 2025 16:29
alpe previously approved these changes Aug 8, 2025

@alpe alpe left a comment


I give an utAck, too. Returning an error makes it easier to read. Thank you for addressing my comment.
Please address the merge conflicts before merging.

@randygrok randygrok requested review from alpe and tac0turtle August 8, 2025 09:53
@tac0turtle tac0turtle added this pull request to the merge queue Aug 8, 2025
Merged via the queue into main with commit a53a7ac Aug 8, 2025
28 of 31 checks passed
@tac0turtle tac0turtle deleted the feat/fix-node-restart branch August 8, 2025 10:56
@github-project-automation github-project-automation bot moved this to Done in Evolve Aug 8, 2025
@tac0turtle tac0turtle removed this from Evolve Aug 25, 2025


Development

Successfully merging this pull request may close these issues.

BUG: Shutting down node and then restarting later on could cause large blobs

5 participants