DeltaApplicationError & predicted value fixes, error fallbacks #14422

Merged
gelash merged 2 commits into main from gelash/deltaerrorfixes on Oct 2, 2024

Conversation

gelash
Contributor

@gelash gelash commented Aug 26, 2024

  1. Separate the captured-reads speculative failure flag into delayed_field and non_delayed_field variants. The two kinds of validation happen at different times, and previously validate_data_reads and validate_group_reads would fail on the shared flag even when it had been set by a delayed-field-related error.
  2. Reduce the cases in which DeltaApplicationError can occur, specifically when a delayed field is created in the same block and the creation is not yet committed. In general, returning some value gives the transaction a strictly better chance to commit than returning an error. (Previously the transaction would be invalidated immediately because of (1) above, which could lead to a lot of wasted work; even now it is re-executed after commit if there was a delayed-field-related speculative failure.)
  3. Introduce a number of fallbacks to sequential execution for the more problematic scenarios:
  • a high incarnation number is observed (might imply an infinite loop or a huge inefficiency)
  • the commit queue is not empty after the workers are done (they are supposed to drain it)
  • take output fails to unwrap an Arc (could imply another dangling reference)
  • update to skip rest as well, though this one is a simple expect -> PanicError change
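
For readers unfamiliar with the block executor, here is a rough sketch of the ideas in (1) and (3). It is not the code from this PR: `CapturedReadsSketch`, `FallbackReason`, `check_fallback`, the field names, and the `MAX_INCARNATION` threshold are all hypothetical names and values chosen for illustration. Only `validate_data_reads`, the two flag variants, and the three fallback conditions come from the description above.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the captured-reads bookkeeping: one flag per
/// validation path, so a delayed-field failure cannot fail data validation.
#[derive(Default, Debug)]
struct CapturedReadsSketch {
    non_delayed_field_speculative_failure: bool,
    delayed_field_speculative_failure: bool,
}

impl CapturedReadsSketch {
    /// Record a speculative failure on the flag that matches its origin.
    fn mark_failure(&mut self, delayed_field: bool) {
        if delayed_field {
            self.delayed_field_speculative_failure = true;
        } else {
            self.non_delayed_field_speculative_failure = true;
        }
    }

    /// Data read validation consults only the non-delayed-field flag.
    fn validate_data_reads(&self) -> bool {
        !self.non_delayed_field_speculative_failure
    }

    /// Delayed-field validation runs at a different time and has its own flag.
    fn validate_delayed_field_reads(&self) -> bool {
        !self.delayed_field_speculative_failure
    }
}

/// Hypothetical reasons to give up on parallel execution and run sequentially.
#[derive(Debug, PartialEq)]
enum FallbackReason {
    HighIncarnation(u32),
    CommitQueueNotDrained(usize),
    OutputStillShared,
}

/// Assumed threshold for illustration; the PR description does not state one.
const MAX_INCARNATION: u32 = 50;

/// Returns the owned output if parallel execution looks healthy, otherwise
/// the reason to fall back to sequential execution.
fn check_fallback<T>(
    incarnation: u32,
    commit_queue_len: usize,
    output: Arc<T>,
) -> Result<T, FallbackReason> {
    if incarnation > MAX_INCARNATION {
        return Err(FallbackReason::HighIncarnation(incarnation));
    }
    if commit_queue_len != 0 {
        return Err(FallbackReason::CommitQueueNotDrained(commit_queue_len));
    }
    // try_unwrap only succeeds when this is the last strong reference.
    Arc::try_unwrap(output).map_err(|_| FallbackReason::OutputStillShared)
}
```

With this shape, calling `mark_failure(true)` leaves `validate_data_reads()` returning true, which is the behavior change described in (1). Returning a `Result` from `check_fallback` rather than panicking mirrors the intent of (3): an unexpected state becomes an error the caller can handle by re-running the block sequentially.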

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Other (specify)

How Has This Been Tested?

Added new tests for the changed functionality; existing tests pass.

trunk-io bot commented Aug 26, 2024

⏱️ 50m total CI duration on this PR

| Job | Cumulative Duration | Recent Runs |
| --- | --- | --- |
| execution-performance / single-node-performance | 35m | 🟩 |
| rust-move-tests | 9m | 🟩 |
| rust-cargo-deny | 2m | 🟩 |
| general-lints | 2m | 🟩 |
| check-dynamic-deps | 36s | 🟩 |
| semgrep/ci | 21s | 🟩 |
| file_change_determinator | 11s | 🟩 |
| file_change_determinator | 9s | 🟩 |
| permission-check | 4s | 🟩 |
| permission-check | 3s | 🟩 |
| permission-check | 2s | 🟩 |
| permission-check | 2s | 🟩 |

@gelash gelash changed the title from "DeltaApplicationError & latest predicted value fixes, fallbacks to se…" to "DeltaApplicationError & predicted value fixes, more fallbacks" on Aug 26, 2024
@gelash gelash changed the title from "DeltaApplicationError & predicted value fixes, more fallbacks" to "DeltaApplicationError & predicted value fixes, error fallbacks" on Aug 26, 2024
Contributor

@igor-aptos igor-aptos left a comment

one comment inline, rest looks good!

aptos-move/block-executor/src/executor.rs (outdated, resolved)
@gelash gelash enabled auto-merge (squash) September 30, 2024 15:27

This comment has been minimized.

@gelash gelash enabled auto-merge (squash) October 2, 2024 17:15

Contributor

github-actions bot commented Oct 2, 2024

✅ Forge suite compat success on 628e88b8a1971b4986dfb2b88ec763090f85c82f ==> 927d85de683782b87efa29dbf50a9ff45cbcfee9

Compatibility test results for 628e88b8a1971b4986dfb2b88ec763090f85c82f ==> 927d85de683782b87efa29dbf50a9ff45cbcfee9 (PR)
1. Check liveness of validators at old version: 628e88b8a1971b4986dfb2b88ec763090f85c82f
compatibility::simple-validator-upgrade::liveness-check : committed: 13030.43 txn/s, latency: 2245.20 ms, (p50: 1900 ms, p70: 2200, p90: 2500 ms, p99: 10900 ms), latency samples: 509440
2. Upgrading first Validator to new version: 927d85de683782b87efa29dbf50a9ff45cbcfee9
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 6472.65 txn/s, latency: 4387.78 ms, (p50: 5100 ms, p70: 5300, p90: 5500 ms, p99: 5700 ms), latency samples: 118620
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 5858.70 txn/s, latency: 5464.25 ms, (p50: 5900 ms, p70: 6100, p90: 7200 ms, p99: 7900 ms), latency samples: 192660
3. Upgrading rest of first batch to new version: 927d85de683782b87efa29dbf50a9ff45cbcfee9
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 6809.73 txn/s, latency: 4186.85 ms, (p50: 4700 ms, p70: 5000, p90: 5300 ms, p99: 5400 ms), latency samples: 128660
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6059.85 txn/s, latency: 5324.61 ms, (p50: 5400 ms, p70: 5600, p90: 7100 ms, p99: 7300 ms), latency samples: 226660
4. upgrading second batch to new version: 927d85de683782b87efa29dbf50a9ff45cbcfee9
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 10016.82 txn/s, latency: 2690.49 ms, (p50: 2800 ms, p70: 3000, p90: 3300 ms, p99: 4200 ms), latency samples: 179580
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 10955.16 txn/s, latency: 2882.19 ms, (p50: 2800 ms, p70: 3000, p90: 3200 ms, p99: 4200 ms), latency samples: 358980
5. check swarm health
Compatibility test for 628e88b8a1971b4986dfb2b88ec763090f85c82f ==> 927d85de683782b87efa29dbf50a9ff45cbcfee9 passed
Test Ok

Contributor

github-actions bot commented Oct 2, 2024

✅ Forge suite realistic_env_max_load success on 927d85de683782b87efa29dbf50a9ff45cbcfee9

two traffics test: inner traffic : committed: 14158.67 txn/s, submitted: 14160.09 txn/s, expired: 1.42 txn/s, latency: 2807.50 ms, (p50: 2300 ms, p70: 2400, p90: 2900 ms, p99: 13100 ms), latency samples: 5383480
two traffics test : committed: 99.94 txn/s, latency: 4650.44 ms, (p50: 1500 ms, p70: 1700, p90: 5400 ms, p99: 44400 ms), latency samples: 1940
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.236, avg: 0.218", "QsPosToProposal: max: 1.042, avg: 0.672", "ConsensusProposalToOrdered: max: 0.310, avg: 0.295", "ConsensusOrderedToCommit: max: 0.518, avg: 0.483", "ConsensusProposalToCommit: max: 0.816, avg: 0.779"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 1.16s no progress at version 2541065 (avg 0.21s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 8.62s no progress at version 2541063 (avg 5.43s) [limit 15].
Test Ok

Contributor

github-actions bot commented Oct 2, 2024

✅ Forge suite framework_upgrade success on 628e88b8a1971b4986dfb2b88ec763090f85c82f ==> 927d85de683782b87efa29dbf50a9ff45cbcfee9

Compatibility test results for 628e88b8a1971b4986dfb2b88ec763090f85c82f ==> 927d85de683782b87efa29dbf50a9ff45cbcfee9 (PR)
Upgrade the nodes to version: 927d85de683782b87efa29dbf50a9ff45cbcfee9
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1279.19 txn/s, submitted: 1281.90 txn/s, failed submission: 2.71 txn/s, expired: 2.71 txn/s, latency: 2596.87 ms, (p50: 2400 ms, p70: 2700, p90: 3900 ms, p99: 6000 ms), latency samples: 103680
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 979.39 txn/s, submitted: 981.83 txn/s, failed submission: 2.44 txn/s, expired: 2.44 txn/s, latency: 3041.52 ms, (p50: 2400 ms, p70: 3300, p90: 6000 ms, p99: 7200 ms), latency samples: 88260
5. check swarm health
Compatibility test for 628e88b8a1971b4986dfb2b88ec763090f85c82f ==> 927d85de683782b87efa29dbf50a9ff45cbcfee9 passed
Upgrade the remaining nodes to version: 927d85de683782b87efa29dbf50a9ff45cbcfee9
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1173.96 txn/s, submitted: 1176.76 txn/s, failed submission: 2.80 txn/s, expired: 2.80 txn/s, latency: 2874.65 ms, (p50: 2600 ms, p70: 3000, p90: 4500 ms, p99: 6600 ms), latency samples: 100780
Test Ok

@gelash gelash merged commit 18bcbd2 into main Oct 2, 2024
65 of 98 checks passed
@gelash gelash deleted the gelash/deltaerrorfixes branch October 2, 2024 18:44
Labels
  • CICD:run-e2e-tests: when this label is present github actions will run all land-blocking e2e tests from the PR
  • CICD:run-execution-performance-full-test: Run execution performance test (full version)
  • CICD:run-execution-performance-test: Run execution performance test
3 participants