Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reset dag chain #4227

Open
wants to merge 14 commits into
base: dag-master
Choose a base branch
from
Open

Reset dag chain #4227

wants to merge 14 commits into from

Conversation

jackzhhuang
Copy link
Collaborator

@jackzhhuang jackzhhuang commented Oct 9, 2024

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

Issue Number: N/A

What is the new behavior?

Other information

Summary by CodeRabbit

  • New Features

    • Introduced a method for saving the DAG state directly in the BlockDAG structure.
    • Enhanced block deletion precision by utilizing a reachability service in the reset method.
  • Improvements

    • Made the reachability service in the GhostdagManager publicly accessible for broader use.
    • Updated the Kubernetes StatefulSet configuration for the starcoin application, including a new container image and node selector.
    • Enhanced the waiting_for_parents method to improve block execution control flow and error handling.
  • Bug Fixes

    • Adjusted logic in the reset method to ensure comprehensive cleanup during block deletion.

Copy link

coderabbitai bot commented Oct 9, 2024

Caution

Review failed

The head commit changed during the review from d1436a9 to c5b9286.

Walkthrough

The changes introduce two new public methods in the BlockDAG struct: save_dag_state_directly, which enables direct insertion of a DagState into the state store, and reachability_service, which provides access to the MTReachabilityService. The visibility of the reachability_service field in the GhostdagManager struct is updated to public. The reset method in the WriteBlockChainService is modified for block deletion using a reachability service. Additionally, Kubernetes configuration for the starcoin application is updated.

Changes

File Change Summary
flexidag/src/blockdag.rs Added pub fn save_dag_state_directly(&self, hash: Hash, state: DagState) -> anyhow::Result<()> and pub fn reachability_service(&self) -> MTReachabilityService<DbReachabilityStore> methods.
flexidag/src/ghostdag/protocol.rs Changed visibility of reachability_service from pub(super) to pub in GhostdagManager.
sync/src/block_connector/write_block_chain.rs Modified reset method logic for block deletion using reachability service.
sync/src/parallel/executor.rs Updated waiting_for_parents method to include Arc<dyn Store> parameter and enhanced error handling.
kube/manifest/starcoin-halley.yaml Updated container image to starcoin/starcoin:dag-master, added node selector starcoin/node-pool: seed-pool.

Possibly related PRs

  • Dag sync continue #4153: This PR introduces changes related to synchronization and storage management, which may indirectly relate to the BlockDAG methods for state management and reachability.
  • delete all dag block in sync dag store to protect the blocks not int … #4165: This PR adds methods for deleting absent blocks in the synchronization context, which could be relevant to the BlockDAG's state management and the new methods introduced in the main PR.
  • Pruning logic #4194: This PR focuses on pruning logic, which is closely related to the enhancements made in the BlockDAG regarding state handling and pruning verification.

Suggested reviewers

  • jolestar
  • welbon
  • nkysg
  • yourmoonlight
  • simonjiao
  • sanlee42

Poem

🐇 In the land of blocks and chains so bright,
A reachability service comes to light.
With methods new and fields made clear,
The ghosts of blocks now draw near.
So hop along, let's celebrate,
For cleaner code, we won't be late! 🌟


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
flexidag/src/blockdag.rs (1)

562-564: LGTM, but consider adding documentation and evaluating encapsulation.

The new reachability_service method provides external access to the MTReachabilityService, which could be useful for optimizations or additional functionality. However, consider the following suggestions:

  1. Add documentation comments to explain the purpose and potential use cases of this method.
  2. Evaluate if exposing this internal component aligns with the desired level of encapsulation for the BlockDAG struct.

Consider adding documentation comments to the method:

+    /// Returns the MTReachabilityService associated with the PruningPointManager.
+    /// This method provides access to the reachability service for external use.
+    /// 
+    /// # Returns
+    /// An instance of MTReachabilityService<DbReachabilityStore>
     pub fn reachability_service(&self) -> MTReachabilityService<DbReachabilityStore> {
         self.pruning_point_manager().reachability_service()
     }
sync/src/block_connector/write_block_chain.rs (1)

381-381: Typo in variable name lastest

The variable lastest appears to be a typo. Consider renaming it to latest for better readability and to avoid confusion.

Suggested change:

-            let lastest = self.main.status().head.id();
+            let latest = self.main.status().head.id();
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between ea56f4a and 0addbf7.

📒 Files selected for processing (3)
  • flexidag/src/blockdag.rs (1 hunks)
  • flexidag/src/ghostdag/protocol.rs (1 hunks)
  • sync/src/block_connector/write_block_chain.rs (1 hunks)
🧰 Additional context used
🔇 Additional comments (2)
flexidag/src/ghostdag/protocol.rs (1)

25-25: Visibility change for reachability_service field

The visibility of the reachability_service field has been changed from pub(super) to pub. This change allows the field to be accessed from outside the module, which could potentially affect the encapsulation of the GhostdagManager struct.

Please confirm if this change is intentional and consider the following:

  1. Are there any security implications of exposing this field publicly?
  2. Does this change align with the module's design principles and intended usage?
  3. Are there any other parts of the codebase that might need to be updated to maintain consistency with this change?

To verify the impact and usage of this field, you can run the following script:

This will help identify any existing or potential usage of the field outside its current module.

✅ Verification successful

Visibility change for reachability_service field verified

The change from pub(super) to pub for the reachability_service field is intentional and necessary for its usage across multiple modules in the codebase. This adjustment maintains the intended design and does not introduce any immediate security concerns.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of reachability_service field outside the current module
rg --type rust -g '!protocol.rs' 'reachability_service'

Length of output: 1993


Script:

#!/bin/bash
# List all functions or methods where reachability_service is accessed
rg --type rust 'reachability_service\.' 

# Extract the contexts around reachability_service usages for deeper analysis
rg --type rust -C 3 'reachability_service' 

Length of output: 11653

sync/src/block_connector/write_block_chain.rs (1)

383-383: Verify the parameter order in forward_chain_iterator

Ensure that the parameters start and lastest (after correcting to latest) are in the correct order when calling forward_chain_iterator. Incorrect parameter ordering may lead to unexpected behavior in block iteration and deletion.

You can confirm the parameter order by reviewing the method signature of forward_chain_iterator. Run the following script to find its definition:

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
kube/manifest/starcoin-halley.yaml (2)

Line range hint 28-45: Consider refactoring the command for improved maintainability.

While the command logic remains unchanged, its complex structure could make future maintenance and debugging challenging.

Consider the following improvements:

  1. Split the command into separate scripts or ConfigMap entries for better modularity.
  2. Use Kubernetes init containers for cleanup operations instead of including them in the main container command.
  3. Implement proper logging for each step to aid in debugging.
  4. Consider using Kubernetes secrets for sensitive data like node keys instead of environment variables.

Example refactoring (partial):

      initContainers:
      - name: cleanup
        image: busybox
        command: ['sh', '-c', 'rm -rf /sc-data/halley/starcoin.ipc /sc-data/halley/starcoindb/db/starcoindb/LOCK /sc-data/halley/genesis_config.json']
        volumeMounts:
        - name: starcoin-volume
          mountPath: /sc-data
      containers:
      - name: starcoin
        image: starcoin/starcoin:dag-master
        command: ["/bin/sh", "-c"]
        args:
        - |
          set -e
          echo "Starting Starcoin node..."
          /starcoin/starcoin -n halley -d /sc-data \
            --p2prpc-default-global-api-quota 9000/s \
            --p2prpc-custom-user-api-quota get_header_by_hash=9000/s \
            # ... (other flags) ...
            $node_key_flag
        env:
        - name: NODE_KEY
          valueFrom:
            secretKeyRef:
              name: node-keys
              key: node-key-$(POD_NAME)

This structure separates concerns, improves readability, and makes it easier to manage and debug the deployment.


Line range hint 1-85: Summary of review findings and recommendations

This review has identified several important changes and areas for improvement in the Kubernetes StatefulSet configuration:

  1. The addition of a node selector may impact deployment flexibility and scalability.
  2. The use of a master tag for the container image could lead to inconsistent deployments.
  3. The complex command structure, while functional, could benefit from refactoring for improved maintainability and debugging.

To ensure a robust and maintainable deployment, please address the following:

  1. Verify the availability of nodes with the starcoin/node-pool: seed-pool label and the rationale behind this change.
  2. Consider using a specific version tag instead of master for the container image, and provide details on the changes and testing process for the new version.
  3. Refactor the command structure as suggested to improve modularity, security, and ease of maintenance.

These changes will contribute to a more stable, traceable, and maintainable deployment of the Starcoin application.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 0addbf7 and df11f85.

📒 Files selected for processing (2)
  • kube/manifest/starcoin-halley.yaml (1 hunks)
  • sync/src/block_connector/write_block_chain.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • sync/src/block_connector/write_block_chain.rs
🧰 Additional context used
🔇 Additional comments (1)
kube/manifest/starcoin-halley.yaml (1)

Line range hint 19-20: Verify node selector change and its impact on deployment.

A new node selector starcoin/node-pool: seed-pool has been added. This will restrict the pods to run only on nodes with this specific label.

Please confirm:

  1. Are there sufficient nodes labeled with starcoin/node-pool: seed-pool to support the desired number of replicas?
  2. What's the rationale behind this change? Is it to ensure the StatefulSet runs on specific, possibly more powerful, nodes?
  3. How might this affect the availability and scaling of the application?

To verify the available nodes, you can run:

kube/manifest/starcoin-halley.yaml Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
sync/src/block_connector/write_block_chain.rs (2)

411-436: Approve DAG state saving with suggestion for error handling

The DAG state saving logic correctly handles both the genesis case and non-genesis case. Setting the new head block as the only tip is appropriate for a reset operation.

Consider adding explicit error handling for the save_dag_state calls. For example:

-self.main.dag().save_dag_state(
+self.main.dag().save_dag_state(
     new_head_block.header().pruning_point(),
     DagState {
         tips: vec![new_head_block.header().id()],
     },
-)?;
+).map_err(|e| format_err!("Failed to save DAG state: {}", e))?;

This will provide more informative error messages if the DAG state saving fails.


Line range hint 439-452: Approve reset finalization with suggestion for block count accuracy

The logic for creating a new branch and updating the main chain is correct and consistent with the rest of the codebase.

Consider calculating the actual enacted and retracted block counts instead of using hardcoded values. This will ensure accuracy, especially in complex DAG structures. For example:

-let (enacted_count, enacted_blocks, retracted_count, retracted_blocks) =
-    (1, vec![executed_block.block.clone()], 0, vec![]);
+let enacted_blocks = self.find_blocks_until(executed_block.id(), block_id, usize::MAX)?;
+let enacted_count = enacted_blocks.len() as u64;
+let retracted_blocks = self.find_blocks_until(self.main.current_header().id(), block_id, usize::MAX)?;
+let retracted_count = retracted_blocks.len() as u64;

This change will provide more accurate information about the number of blocks affected by the reset operation.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 9b586b0 and 68c433f.

📒 Files selected for processing (1)
  • sync/src/block_connector/write_block_chain.rs (2 hunks)
🧰 Additional context used
🔇 Additional comments (3)
sync/src/block_connector/write_block_chain.rs (3)

381-387: Improved block traversal using reachability service

The use of the reachability service's forward chain iterator is a significant improvement. It allows for more precise block traversal and deletion, which is crucial in a DAG-based blockchain architecture.


Line range hint 381-452: Significant improvement in DAG reset functionality

The changes to the reset method represent a substantial improvement in handling DAG-based blockchain resets. The use of the reachability service for block traversal, precise block deletion logic, and proper DAG state management all contribute to a more robust and accurate reset process.

Key improvements:

  1. More precise block traversal using the reachability service.
  2. Careful handling of parent block deletion to maintain DAG integrity.
  3. Proper DAG state saving for both genesis and non-genesis cases.
  4. Consistent use of existing methods like do_new_head for chain updates.

While the overall implementation is solid, consider the suggested improvements for error handling and block count accuracy to further enhance the robustness of this critical operation.


389-409: Verify parent block deletion logic

The child block deletion logic is correct and straightforward. However, the parent block deletion process is more complex and may require additional verification.

Please run the following script to verify the parent block deletion logic:

This script will help ensure that no blocks are left with deleted parents, which could lead to orphaned blocks in the DAG.

Copy link

github-actions bot commented Oct 9, 2024

Benchmark for 2166046

Click to view benchmark
Test Base PR %
accumulator_append 897.8±197.77µs 892.2±202.55µs -0.62%
block_apply/block_apply_10 407.0±39.78ms 380.7±13.08ms -6.46%
block_apply/block_apply_1000 45.0±3.38s 45.4±3.12s +0.89%
get_with_proof/db_store 52.6±9.78µs 46.2±3.55µs -12.17%
get_with_proof/mem_store 36.0±1.74µs 35.8±1.70µs -0.56%
put_and_commit/db_store/1 123.2±11.28µs 200.7±33.47µs +62.91%
put_and_commit/db_store/10 1123.0±126.92µs 1111.7±116.67µs -1.01%
put_and_commit/db_store/100 11.3±1.72ms 10.3±1.40ms -8.85%
put_and_commit/db_store/5 639.2±94.47µs 607.7±86.08µs -4.93%
put_and_commit/db_store/50 5.6±0.44ms 5.1±0.46ms -8.93%
put_and_commit/mem_store/1 78.9±11.72µs 70.6±6.97µs -10.52%
put_and_commit/mem_store/10 681.9±85.94µs 668.0±76.66µs -2.04%
put_and_commit/mem_store/100 6.9±0.65ms 6.5±0.51ms -5.80%
put_and_commit/mem_store/5 348.7±39.95µs 388.7±77.08µs +11.47%
put_and_commit/mem_store/50 3.5±0.40ms 3.3±0.30ms -5.71%
query_block/query_block_in(10)_times(100) 8.7±0.43ms 8.2±0.14ms -5.75%
query_block/query_block_in(10)_times(1000) 84.4±5.82ms 82.9±3.70ms -1.78%
query_block/query_block_in(10)_times(10000) 853.3±32.03ms 858.3±47.13ms +0.59%
query_block/query_block_in(1000)_times(100) 1698.1±271.63µs 1264.5±69.60µs -25.53%
query_block/query_block_in(1000)_times(1000) 14.7±1.37ms 12.7±0.38ms -13.61%
query_block/query_block_in(1000)_times(10000) 138.4±6.21ms 128.2±10.83ms -7.37%
storage_transaction 1108.2±369.92µs 1151.4±422.48µs +3.90%
vm/transaction_execution/1 470.4±59.96ms 455.0±41.46ms -3.27%
vm/transaction_execution/10 169.9±35.40ms 135.3±5.77ms -20.36%
vm/transaction_execution/20 152.6±13.37ms 126.1±10.31ms -17.37%
vm/transaction_execution/5 166.4±9.08ms 167.3±13.11ms +0.54%
vm/transaction_execution/50 164.7±17.37ms 147.4±12.42ms -10.50%

@jackzhhuang jackzhhuang marked this pull request as draft October 10, 2024 02:23
Copy link

Benchmark for 5d25a4e

Click to view benchmark
Test Base PR %
accumulator_append 827.5±206.58µs 776.6±96.39µs -6.15%
block_apply/block_apply_10 379.5±14.82ms 366.4±6.55ms -3.45%
block_apply/block_apply_1000 40.4±0.99s 40.7±1.27s +0.74%
get_with_proof/db_store 44.0±4.17µs 43.2±3.46µs -1.82%
get_with_proof/mem_store 38.1±8.11µs 35.8±1.96µs -6.04%
put_and_commit/db_store/1 117.7±9.02µs 116.4±6.54µs -1.10%
put_and_commit/db_store/10 1047.4±59.13µs 1027.2±104.50µs -1.93%
put_and_commit/db_store/100 10.1±0.56ms 9.8±1.06ms -2.97%
put_and_commit/db_store/5 549.9±51.13µs 539.2±37.61µs -1.95%
put_and_commit/db_store/50 5.1±0.42ms 5.1±0.57ms 0.00%
put_and_commit/mem_store/1 73.8±11.04µs 70.3±7.21µs -4.74%
put_and_commit/mem_store/10 674.8±66.24µs 653.3±67.41µs -3.19%
put_and_commit/mem_store/100 7.0±1.17ms 6.5±0.62ms -7.14%
put_and_commit/mem_store/5 342.4±59.12µs 332.2±34.31µs -2.98%
put_and_commit/mem_store/50 3.2±0.36ms 3.3±0.53ms +3.13%
query_block/query_block_in(10)_times(100) 8.3±0.43ms 8.8±0.76ms +6.02%
query_block/query_block_in(10)_times(1000) 83.3±3.30ms 85.6±4.04ms +2.76%
query_block/query_block_in(10)_times(10000) 1096.2±389.15ms 815.5±12.13ms -25.61%
query_block/query_block_in(1000)_times(100) 1190.4±36.50µs 1269.2±114.42µs +6.62%
query_block/query_block_in(1000)_times(1000) 11.8±0.88ms 13.3±1.33ms +12.71%
query_block/query_block_in(1000)_times(10000) 115.7±3.13ms 124.2±5.21ms +7.35%
storage_transaction 1128.7±449.35µs 1140.9±472.39µs +1.08%
vm/transaction_execution/1 410.8±19.04ms 413.2±15.23ms +0.58%
vm/transaction_execution/10 134.0±6.63ms 168.3±24.17ms +25.60%
vm/transaction_execution/20 119.4±4.05ms 133.5±10.16ms +11.81%
vm/transaction_execution/5 170.0±10.32ms 159.8±7.15ms -6.00%
vm/transaction_execution/50 146.8±10.12ms 141.9±6.22ms -3.34%

@jackzhhuang jackzhhuang marked this pull request as ready for review October 10, 2024 08:24
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
sync/src/block_connector/write_block_chain.rs (1)

411-436: DAG state update logic looks good, but error handling can be improved

The new implementation for updating the DAG state after resetting the chain is well-structured and handles both the genesis case and normal cases correctly. However, there's room for improvement in error handling:

  1. The save_dag_state_directly calls (lines 417-428 and 430-435) don't have any error handling. Consider wrapping these calls in a Result and propagating any errors up the call stack.

  2. To improve robustness, you might want to consider making these state updates transactional. If any of the save_dag_state_directly calls fail, you should be able to roll back to the previous state.

Here's a suggested improvement for error handling:

- self.main.dag().save_dag_state_directly(
-     new_head_block.header().pruning_point(),
-     DagState {
-         tips: vec![new_head_block.header().id()],
-     },
- )?;
+ self.main.dag().save_dag_state_directly(
+     new_head_block.header().pruning_point(),
+     DagState {
+         tips: vec![new_head_block.header().id()],
+     },
+ ).map_err(|e| format_err!("Failed to save DAG state: {}", e))?;

Apply similar changes to other save_dag_state_directly calls.

flexidag/src/blockdag.rs (2)

452-455: LGTM. Consider adding a clarifying comment.

The new save_dag_state_directly method provides a straightforward way to save the DAG state. However, it bypasses the merging logic present in the existing save_dag_state method.

Consider adding a comment to explain when to use this method versus save_dag_state, and any potential implications of bypassing the merging logic.


567-569: LGTM. Consider adding usage guidelines.

The new reachability_service method provides direct access to the MTReachabilityService. This can be useful for external components that need to interact with the reachability service.

Consider adding a comment to explain:

  1. The purpose of exposing this service.
  2. Guidelines for its usage to prevent misuse.
  3. Any potential implications for future refactoring or encapsulation.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 68c433f and fb1b5c6.

📒 Files selected for processing (2)
  • flexidag/src/blockdag.rs (2 hunks)
  • sync/src/block_connector/write_block_chain.rs (2 hunks)
🧰 Additional context used
🔇 Additional comments (3)
sync/src/block_connector/write_block_chain.rs (3)

15-15: LGTM: New import for DAG state management

The addition of DagState import is consistent with the changes in the reset method, which now involves DAG state management.


381-437: Overall assessment: Significant improvements with minor suggestions

The changes to the reset method represent a substantial improvement in block management and DAG state handling. The new implementation offers more precise control over block deletion and ensures proper updating of the DAG state after reset operations.

Key improvements:

  1. Use of reachability service for accurate block deletion.
  2. Proper handling of parent blocks during deletion.
  3. Correct updating of DAG state, including special handling for the genesis case.

Suggestions for further improvement:

  1. Add checks to prevent deletion of parent blocks that might be referenced by other blocks in the DAG.
  2. Improve error handling for save_dag_state_directly calls.
  3. Consider making state updates transactional to ensure atomicity.

These changes significantly enhance the robustness and correctness of the reset operation. With the suggested minor improvements, this implementation will provide a solid foundation for managing complex blockchain states.


381-409: Improved block deletion mechanism with a potential concern

The new implementation using the reachability service provides a more precise way of deleting blocks. It ensures consistency by deleting both child and parent blocks that are not part of the new chain.

However, there's a potential issue to consider:

  • When deleting parent blocks (lines 398-407), we should verify that these blocks are not referenced by other blocks in the DAG. Deleting them without this check might lead to orphaned blocks or inconsistencies in the DAG structure.

Consider adding a check to ensure that parent blocks are not referenced by other blocks before deletion.

To verify if parent blocks might be referenced by other blocks, we can use the following script:

@jackzhhuang jackzhhuang marked this pull request as draft October 10, 2024 08:52
Copy link

Benchmark for 1eff680

Click to view benchmark
Test Base PR %
accumulator_append 847.5±211.31µs 1025.3±213.10µs +20.98%
block_apply/block_apply_10 388.1±21.80ms 372.8±9.98ms -3.94%
block_apply/block_apply_1000 45.3±3.04s 41.7±1.22s -7.95%
get_with_proof/db_store 44.1±3.73µs 43.9±4.32µs -0.45%
get_with_proof/mem_store 38.8±6.97µs 36.4±3.34µs -6.19%
put_and_commit/db_store/1 127.4±18.87µs 121.5±18.20µs -4.63%
put_and_commit/db_store/10 1168.4±133.39µs 1096.8±101.23µs -6.13%
put_and_commit/db_store/100 10.2±0.97ms 9.8±0.74ms -3.92%
put_and_commit/db_store/5 534.1±63.80µs 555.5±66.68µs +4.01%
put_and_commit/db_store/50 5.1±0.37ms 5.4±0.95ms +5.88%
put_and_commit/mem_store/1 71.5±11.64µs 75.4±13.13µs +5.45%
put_and_commit/mem_store/10 645.6±55.67µs 673.5±80.70µs +4.32%
put_and_commit/mem_store/100 6.7±0.87ms 6.7±1.24ms 0.00%
put_and_commit/mem_store/5 331.3±34.66µs 344.5±54.44µs +3.98%
put_and_commit/mem_store/50 3.3±0.40ms 3.3±0.59ms 0.00%
query_block/query_block_in(10)_times(100) 9.3±1.26ms 9.1±1.15ms -2.15%
query_block/query_block_in(10)_times(1000) 87.8±4.63ms 82.4±4.69ms -6.15%
query_block/query_block_in(10)_times(10000) 825.4±37.71ms 1089.2±180.34ms +31.96%
query_block/query_block_in(1000)_times(100) 1317.7±80.76µs 1254.0±141.72µs -4.83%
query_block/query_block_in(1000)_times(1000) 12.7±1.00ms 13.0±1.54ms +2.36%
query_block/query_block_in(1000)_times(10000) 134.3±15.06ms 119.7±3.46ms -10.87%
storage_transaction 1121.9±441.75µs 1088.6±477.46µs -2.97%
vm/transaction_execution/1 416.9±20.63ms 489.1±36.16ms +17.32%
vm/transaction_execution/10 140.4±13.43ms 131.3±4.79ms -6.48%
vm/transaction_execution/20 127.9±11.06ms 133.5±9.77ms +4.38%
vm/transaction_execution/5 169.7±14.59ms 171.7±21.59ms +1.18%
vm/transaction_execution/50 148.0±10.86ms 157.3±17.74ms +6.28%

Copy link

Benchmark for 7c0ab44

Click to view benchmark
Test Base PR %
accumulator_append 900.4±184.98µs 898.1±180.83µs -0.26%
block_apply/block_apply_10 378.5±11.93ms 424.3±28.61ms +12.10%
block_apply/block_apply_1000 44.1±1.68s 45.1±3.38s +2.27%
get_with_proof/db_store 43.9±1.96µs 46.1±5.54µs +5.01%
get_with_proof/mem_store 36.0±1.06µs 42.7±8.74µs +18.61%
put_and_commit/db_store/1 128.2±16.36µs 142.2±29.51µs +10.92%
put_and_commit/db_store/10 1109.1±142.58µs 1105.3±149.13µs -0.34%
put_and_commit/db_store/100 12.2±2.79ms 9.8±0.61ms -19.67%
put_and_commit/db_store/5 587.7±120.54µs 709.1±126.63µs +20.66%
put_and_commit/db_store/50 6.0±0.98ms 5.4±0.60ms -10.00%
put_and_commit/mem_store/1 72.5±10.86µs 82.8±16.99µs +14.21%
put_and_commit/mem_store/10 693.5±111.79µs 690.0±81.34µs -0.50%
put_and_commit/mem_store/100 6.9±0.79ms 6.6±0.86ms -4.35%
put_and_commit/mem_store/5 358.0±54.40µs 366.1±69.70µs +2.26%
put_and_commit/mem_store/50 3.9±0.63ms 3.4±0.44ms -12.82%
query_block/query_block_in(10)_times(100) 9.8±1.72ms 8.6±0.49ms -12.24%
query_block/query_block_in(10)_times(1000) 85.9±9.78ms 83.9±5.45ms -2.33%
query_block/query_block_in(10)_times(10000) 952.0±64.57ms 1176.1±339.36ms +23.54%
query_block/query_block_in(1000)_times(100) 1397.4±96.25µs 1403.6±159.84µs +0.44%
query_block/query_block_in(1000)_times(1000) 14.6±2.67ms 14.1±2.37ms -3.42%
query_block/query_block_in(1000)_times(10000) 132.7±5.79ms 134.9±17.16ms +1.66%
storage_transaction 1132.6±425.72µs 1235.1±511.71µs +9.05%
vm/transaction_execution/1 470.1±35.16ms 478.2±51.26ms +1.72%
vm/transaction_execution/10 133.5±8.35ms 140.1±8.16ms +4.94%
vm/transaction_execution/20 134.7±10.61ms 131.0±11.10ms -2.75%
vm/transaction_execution/5 180.8±21.55ms 169.3±9.03ms -6.36%
vm/transaction_execution/50 147.2±11.74ms 148.1±9.01ms +0.61%

@jackzhhuang jackzhhuang marked this pull request as ready for review October 11, 2024 01:10
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
sync/src/parallel/executor.rs (1)

Line range hint 93-113: Potential busy-wait when waiting for parent blocks

In the loop where waiting_for_parents is called, if the method returns Ok(false), the code yields with tokio::task::yield_now().await and immediately retries. This can lead to a busy-wait loop that consumes CPU resources if the parent blocks are not available immediately. Consider adding a delay or an exponential backoff strategy to reduce CPU usage.

You could modify the code as follows:

-    Ok(false) => tokio::task::yield_now().await,
+    Ok(false) => {
+        tokio::time::sleep(Duration::from_millis(100)).await;
+    },

Remember to import the Duration type:

use tokio::time::Duration;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between fb1b5c6 and c5b9286.

📒 Files selected for processing (2)
  • sync/src/block_connector/write_block_chain.rs (4 hunks)
  • sync/src/parallel/executor.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • sync/src/block_connector/write_block_chain.rs
🧰 Additional context used
🔇 Additional comments (2)
sync/src/parallel/executor.rs (2)

57-70: Approved: Added storage checks in waiting_for_parents function

The addition of storage checks to verify the existence of block headers and block info before proceeding enhances the robustness of the block execution process. This ensures that all parent blocks are present in storage and have associated block info before execution.


102-104: Approved: Updated method call to include storage parameter

The call to waiting_for_parents now correctly includes self.storage.clone(), aligning with the updated method signature and ensuring access to storage within the method.

Copy link

Benchmark for 3c0cbe0

Click to view benchmark
Test Base PR %
accumulator_append 1023.2±231.25µs 1002.1±258.07µs -2.06%
block_apply/block_apply_10 494.6±72.72ms 388.5±15.02ms -21.45%
block_apply/block_apply_1000 49.6±3.97s 48.0±3.38s -3.23%
get_with_proof/db_store 50.3±9.74µs 43.8±1.80µs -12.92%
get_with_proof/mem_store 36.6±5.58µs 37.9±3.61µs +3.55%
put_and_commit/db_store/1 161.6±34.41µs 123.4±13.75µs -23.64%
put_and_commit/db_store/10 1374.2±258.65µs 1386.6±199.07µs +0.90%
put_and_commit/db_store/100 11.4±1.78ms 10.3±0.83ms -9.65%
put_and_commit/db_store/5 924.2±101.95µs 690.3±112.43µs -25.31%
put_and_commit/db_store/50 5.4±0.61ms 5.4±0.77ms 0.00%
put_and_commit/mem_store/1 75.8±15.79µs 74.6±10.41µs -1.58%
put_and_commit/mem_store/10 887.6±179.76µs 690.3±78.54µs -22.23%
put_and_commit/mem_store/100 7.3±1.38ms 6.7±0.88ms -8.22%
put_and_commit/mem_store/5 390.7±65.15µs 403.4±72.24µs +3.25%
put_and_commit/mem_store/50 4.0±0.85ms 3.8±0.72ms -5.00%
query_block/query_block_in(10)_times(100) 8.9±1.27ms 10.2±2.53ms +14.61%
query_block/query_block_in(10)_times(1000) 99.6±22.42ms 88.4±7.21ms -11.24%
query_block/query_block_in(10)_times(10000) 1106.8±175.62ms 925.2±83.59ms -16.41%
query_block/query_block_in(1000)_times(100) 1475.0±186.12µs 1604.4±385.50µs +8.77%
query_block/query_block_in(1000)_times(1000) 13.0±0.89ms 16.2±3.65ms +24.62%
query_block/query_block_in(1000)_times(10000) 138.1±15.39ms 129.1±11.37ms -6.52%
storage_transaction 1288.9±433.89µs 999.1±326.62µs -22.48%
vm/transaction_execution/1 527.8±79.31ms 549.6±90.62ms +4.13%
vm/transaction_execution/10 168.6±9.60ms 205.3±8.21ms +21.77%
vm/transaction_execution/20 130.5±8.30ms 164.3±12.86ms +25.90%
vm/transaction_execution/5 193.3±16.29ms 202.7±48.18ms +4.86%
vm/transaction_execution/50 151.7±11.59ms 170.6±17.11ms +12.46%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant