[loader-v2] Fixing global cache reads & read-before-write on publish #15285

Open
wants to merge 1 commit into base: main

Conversation

georgemitenkov (Contributor) commented Nov 15, 2024

Description

  • Capture global cache reads as well. Reads now resolve first to captured reads (per transaction), then to the global cache, then to the per-block cache, and finally to the state view (see the sketch after this list).
  • Issue a read-before-write for modules at commit time.
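
A minimal sketch of that resolution order, for illustration only. The names (Resolver, read_module) and the HashMap-backed caches are hypothetical stand-ins rather than the actual aptos-core types; only the lookup order mirrors the bullets above.

    use std::collections::HashMap;
    use std::sync::Arc;

    struct Resolver {
        captured_reads: HashMap<String, Arc<Vec<u8>>>,       // per-transaction captures
        global_cache: HashMap<String, (Arc<Vec<u8>>, bool)>, // (module, still valid?)
        per_block_cache: HashMap<String, Arc<Vec<u8>>>,      // shared within the block
        state_view: HashMap<String, Arc<Vec<u8>>>,           // storage fallback
    }

    impl Resolver {
        fn read_module(&mut self, key: &str) -> Option<Arc<Vec<u8>>> {
            // 1. Captured reads: the same transaction always re-reads the same value.
            if let Some(m) = self.captured_reads.get(key) {
                return Some(m.clone());
            }
            // 2. Global (cross-block) cache, only if the entry is still valid;
            // 3. then the per-block cache; 4. then the state view.
            let m = match self.global_cache.get(key) {
                Some((m, true)) => Some(m.clone()),
                _ => self
                    .per_block_cache
                    .get(key)
                    .cloned()
                    .or_else(|| self.state_view.get(key).cloned()),
            }?;
            // Capture the read so later reads and validation see the same value.
            self.captured_reads.insert(key.to_string(), m.clone());
            Some(m)
        }
    }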

How Has This Been Tested?

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation


georgemitenkov marked this pull request as ready for review on November 15, 2024 02:39
georgemitenkov requested review from msmouse and igor-aptos and removed the review requests for sasha8 and danielxiangzl on November 15, 2024 02:39
georgemitenkov added the CICD:run-e2e-tests, CICD:run-execution-performance-test, and CICD:run-execution-performance-full-test labels on Nov 15, 2024

Comment on lines 298 to 302
 enum ModuleRead<DC, VC, S> {
     /// Read from the cross-block module cache.
-    GlobalCache,
+    GlobalCache(Arc<ModuleCode<DC, VC, S>>),
     /// Read from per-block cache ([SyncCodeCache]) used by parallel execution.
     PerBlockCache(Option<(Arc<ModuleCode<DC, VC, S>>, Option<TxnIndex>)>),
igor-aptos (Contributor) commented Nov 15, 2024:

Can you explain why we distinguish reads here based on where we got the data from? Also, what is the Option<TxnIndex> in PerBlockCache?

georgemitenkov (Contributor Author):

The Option captures that the module may not exist (even in the StateView).

georgemitenkov (Contributor Author):

Different reads need different validations: we need to check that global cache reads are still valid, and that per-block reads still see the same version.

Contributor:

stupid formatting, didn't show I was referring to TxnIndex

georgemitenkov (Contributor Author):

Ah - None is a storage version

georgemitenkov (Contributor Author):

I find it better than Result<TxnIndex, StorageVersion>

Contributor:

Why the distinction between the storage version and the global cache?

georgemitenkov (Contributor Author):

Different validation paths: for a global cache read we need to check that the read is still valid in the cache; for a per-block read we go to the MVHashMap. As for the storage read: we only issue it when there is a cache miss in the per-block cache, so it gets validated there.

georgemitenkov (Contributor Author):

Basically "storage version" can be later drained into global cache, but otherwise exists only in per-block

Contributor:

So from a validation perspective there is no distinction.

The distinction is ONLY there to make updating the global cache (i.e., draining into it) faster/cheaper by skipping things that are already there.

Is that correct?


georgemitenkov changed the title from Loader fixes to [loader-v2] Fixing global cache reads & read-before-write on publish on Nov 15, 2024


✅ Forge suite realistic_env_max_load success on dc8af3fbc4d26c5768c7c665de93ea38841222a1

two traffics test: inner traffic : committed: 14364.97 txn/s, latency: 2768.20 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3900 ms), latency samples: 5461940
two traffics test : committed: 100.03 txn/s, latency: 1415.30 ms, (p50: 1400 ms, p70: 1400, p90: 1500 ms, p99: 2200 ms), latency samples: 1700
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 2.107, avg: 1.575", "ConsensusProposalToOrdered: max: 0.328, avg: 0.294", "ConsensusOrderedToCommit: max: 0.384, avg: 0.367", "ConsensusProposalToCommit: max: 0.679, avg: 0.662"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 1.18s no progress at version 28656 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 8.71s no progress at version 2145368 (avg 7.38s) [limit 15].
Test Ok

✅ Forge suite compat success on 2bb2d43037a93d883729869d65c7c6c75b028fa1 ==> dc8af3fbc4d26c5768c7c665de93ea38841222a1

Compatibility test results for 2bb2d43037a93d883729869d65c7c6c75b028fa1 ==> dc8af3fbc4d26c5768c7c665de93ea38841222a1 (PR)
1. Check liveness of validators at old version: 2bb2d43037a93d883729869d65c7c6c75b028fa1
compatibility::simple-validator-upgrade::liveness-check : committed: 15246.12 txn/s, latency: 2214.85 ms, (p50: 2000 ms, p70: 2100, p90: 2500 ms, p99: 6200 ms), latency samples: 493800
2. Upgrading first Validator to new version: dc8af3fbc4d26c5768c7c665de93ea38841222a1
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 7251.83 txn/s, latency: 3852.43 ms, (p50: 4100 ms, p70: 4300, p90: 5000 ms, p99: 5400 ms), latency samples: 139360
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 7627.08 txn/s, latency: 4194.99 ms, (p50: 4300 ms, p70: 4400, p90: 6400 ms, p99: 6600 ms), latency samples: 250800
3. Upgrading rest of first batch to new version: dc8af3fbc4d26c5768c7c665de93ea38841222a1
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 7656.05 txn/s, latency: 3777.18 ms, (p50: 4200 ms, p70: 4300, p90: 4500 ms, p99: 4600 ms), latency samples: 143500
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6990.18 txn/s, latency: 4533.15 ms, (p50: 4500 ms, p70: 4600, p90: 6800 ms, p99: 7100 ms), latency samples: 231000
4. upgrading second batch to new version: dc8af3fbc4d26c5768c7c665de93ea38841222a1
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 12427.58 txn/s, latency: 2166.84 ms, (p50: 2100 ms, p70: 2500, p90: 3200 ms, p99: 3500 ms), latency samples: 215140
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 11630.06 txn/s, latency: 2628.42 ms, (p50: 2100 ms, p70: 2700, p90: 5000 ms, p99: 7300 ms), latency samples: 381720
5. check swarm health
Compatibility test for 2bb2d43037a93d883729869d65c7c6c75b028fa1 ==> dc8af3fbc4d26c5768c7c665de93ea38841222a1 passed
Test Ok

@@ -661,7 +658,7 @@ where
         }

         self.module_reads.iter().all(|(key, read)| match read {
-            ModuleRead::GlobalCache => global_module_cache.contains_valid(key),
+            ModuleRead::GlobalCache(_) => global_module_cache.contains_valid(key),
Contributor:

should this whole match be equivalent to:

        self.module_reads.iter().all(|(key, read)| {
            let previous_version = match read {
                ModuleRead::GlobalCache(_) => None, // i.e. storage version
                ModuleRead::PerBlockCache(previous) => previous.as_ref().map(|(_, version)| *version),
            };
            let current_version = per_block_module_cache.get_module_version(key);
            current_version == previous_version
        })

why do we need to update GlobalCache at all while executing a block?

georgemitenkov (Contributor Author):

We do, if we read from it first (to know whether the entry is overridden or not). An alternative is to check the lower-level cache first, but that means a performance penalty due to locking.

georgemitenkov (Contributor Author):

The code can be somewhat equivalent, but:

let current_version = per_block_module_cache.get_module_version(key);

causes a prefetch of the storage version by default. We would need to special-case validation to avoid that. And we also end up locking the cache (a shard, in the worst case) instead of checking an atomic bool.

Contributor:

This is because we may publish a module that invalidates the global cache entry that's being read, I think.

}

// Otherwise, it is a miss. Check global cache.
Contributor:

Why do we check the global cache before checking state.versioned_map.module_cache?

On rolling commit, are we updating GlobalCache itself?

georgemitenkov (Contributor Author):

We update the global cache at rolling commit: if published keys exist in the global cache, we mark them as invalid. Reads to them then result in a cache miss, and we fall back to the MVHashMap, where the write was placed at commit time.

georgemitenkov (Contributor Author):

You can check the versioned map first, but then you end up acquiring a lock for a potentially non-republished module (publishing is rare). If 32 threads do this for the aptos-framework, that is bad.

georgemitenkov (Contributor Author):

So instead we look up the global cache first, but only check an atomic bool flag there (better than a lock), so we optimize for the read case.
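
For illustration, a minimal, self-contained sketch of the mechanism described in the last few comments (hypothetical, simplified types, not the actual aptos-core API): the global cache entry carries an atomic validity flag that readers check cheaply, and a publish at rolling commit flips the flag and places the write into the per-block cache.

    use std::collections::HashMap;
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::{Arc, Mutex};

    struct GlobalModuleCache {
        // (module, validity flag); the flag is what reads check instead of taking a lock.
        entries: HashMap<String, (Arc<Vec<u8>>, AtomicBool)>,
    }

    impl GlobalModuleCache {
        // Validating or serving a global read: one atomic load, no lock.
        fn contains_valid(&self, key: &str) -> bool {
            self.entries
                .get(key)
                .map_or(false, |(_, valid)| valid.load(Ordering::Acquire))
        }

        fn mark_invalid(&self, key: &str) {
            if let Some((_, valid)) = self.entries.get(key) {
                valid.store(false, Ordering::Release);
            }
        }
    }

    struct PerBlockModuleCache {
        // key -> (module, index of the publishing transaction)
        entries: Mutex<HashMap<String, (Arc<Vec<u8>>, u32)>>,
    }

    // Rolling commit of a publishing transaction: invalidate the global entry (if any)
    // and place the write into the per-block cache, so subsequent reads miss the global
    // cache and pick up the newly committed module instead.
    fn publish_at_commit(
        global: &GlobalModuleCache,
        per_block: &PerBlockModuleCache,
        key: &str,
        module: Arc<Vec<u8>>,
        txn_idx: u32,
    ) {
        global.mark_invalid(key);
        per_block
            .entries
            .lock()
            .unwrap()
            .insert(key.to_string(), (module, txn_idx));
    }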

Contributor:

I see, then I would rename PerBlockCache to UnfinalizedBlockCache or something like that - to make it clear it only ever refers to things before rolling commit, and GlobalCache is global and updated within the block itself.

(you can do that in separate PR of course :) )

}

// Otherwise, it is a miss. Check global cache.
if let Some(module) = self.global_module_cache.get_valid(key) {
Contributor:

Why do we reverse the order of checking now? (I was wondering about the order for the previous PR too.)

georgemitenkov (Contributor Author):

Now we always check the local cache first. If the module is not there, we check, as before, 1) the global cache first (if valid), then 2) the per-block cache. In both cases we clone the module into captured reads (the local cache), so the next read always returns the same thing. Does this make sense?

Contributor:

I was asking more about why we checked the global cache first in the previous PR. Is this an orthogonal change, or do we need to reverse the order now?

Contributor:

We still check the global cache first.

What is added here is the check in captured reads, i.e., whether this same transaction has already read the module; if it has, we do not read it again.

@@ -1233,8 +1240,17 @@ where
             Version = Option<TxnIndex>,
         >,
     ) -> Result<(), PanicError> {
         let (id, write_op) = write.unpack();
+        // Enforce read-before-write because storage (DB) relies on this assumption.
+        let _ = base_view.get_state_value(&key).map_err(|err| {
Contributor:

we need some tests for this assumption 🤦‍♂️

georgemitenkov (Contributor Author) commented Nov 15, 2024:

@msmouse Do you have a code pointer for how I can test this with a DB?

Contributor:

I think he adds a panic here in #15283, so we just need to test some publish scenarios like that (with a real DB).

Contributor:

Many tests fail with the panic. I need to debug today to see how to enforce the assumption better.

Nonetheless, I think we need to trigger some module republishing in a smoke test.

Contributor:

@msmouse does it need to be a smoke test? We can have an executor benchmark lib unit test with publishing in 3 lines of code.

Contributor:

It's hard to construct a complex scenario in a smoke test because most of the time a block only has one txn (blocks are produced too fast). It's easier to reproduce with the executor + DB alone. I thought we already had module publishing in smoke tests.

igor-aptos (Contributor) commented Nov 15, 2024:


    #[test]
    fn test_module_publish_txns() {
        AptosVM::set_num_shards_once(1);
        AptosVM::set_concurrency_level_once(4);
        AptosVM::set_processed_transactions_detailed_counters();
        test_generic_benchmark::<AptosVMBlockExecutor>(
            Some(TransactionTypeArg::PublishPackage),
            true,
        );
    }

This doesn't call the published modules afterwards, though.

Contributor:

We need to repeatedly publish the same module, and the module must be in use so that it resides in the cache. Is that simple to implement?

Contributor:

Should be, let me see

Contributor:

here is the republish and call mixed workload - #15292, and it fails.

you can use it to confirm/test this fix.

Labels
CICD:run-e2e-tests, CICD:run-execution-performance-full-test, CICD:run-execution-performance-test