Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

True cost of function calls #7227

Closed
jakmeier opened this issue Jul 21, 2022 · 10 comments
Closed

True cost of function calls #7227

jakmeier opened this issue Jul 21, 2022 · 10 comments
Assignees
Labels
A-params-estimator Area: runtime params estimator T-contract-runtime Team: issues relevant to the contract runtime team

Comments

@jakmeier
Copy link
Contributor

jakmeier commented Jul 21, 2022

Measure the "true" cost of setting up a function call. This is about everything from processing the transaction all the way to persisting results. Excluded is the dynamic part, i.e. what happens inside the guest VM.

The model itself is discussed in #6992. This issue here is about finding the correct gas numbers for each parameter.

@jakmeier jakmeier added T-contract-runtime Team: issues relevant to the contract runtime team A-params-estimator Area: runtime params estimator labels Jul 21, 2022
@jakmeier jakmeier self-assigned this Jul 21, 2022
@jakmeier
Copy link
Contributor Author

jakmeier commented Jul 21, 2022

With all the changes around the estimator in the past two weeks, I was now able to estimate these "true" function call costs without instability or inaccuracy. I ran the tests on our estimator hardware setup on gcloud, with a state that is preloaded with 1M accounts and RocksDB stored on an SSD.

I found that icount measurement are consistently 20%-70% lower lower than time measurements. So I decided to ignore icount measurement and go with time instead, as this reflects better how long a node really takes to process a function call.

Here is the summary of what changes to parameters the estimator suggests:

[Ggas]   [Ggas] Parameter Change Factor Note
action_receipt_creation_send_not_sir 108.060 -> 237.154 2.1947 * This affects all actions
action_receipt_creation_execution 108.060 -> 237.154 2.1947 * This affects all actions
action_function_call_send_not_sir 2319.862 -> 20.676 0.0089  
action_function_call_execution 2319.862 -> 4337.353 1.8697 ** Due to reading 4MiB contract from DB
wasm_contract_loading_base 0.0354460 -> 25.844976214 729.1374 ***
wasm_contract_loading_bytes 0.00021675 -> 0.034621497 159.7301 ***

* Might be reduced with flat storage
** Contract prefetching could reduce this, or storing size of contract. (The problem here is that we have to cover the DB fetching cost in this parameter without knowing the size, so this cost is for a 4MiB contract. If we know the size ahead of time, we could charge less for smaller contracts. Even better, if we prefetch the code ahead of time, we can reduce this to just 20Ggas.)
*** Contract loading ahead of time might solve this

These changes translate to per fn-call minimum overhead gas cost as follows, depending on contract code size.

Fn Overhead [Ggas]  10kB Sweatcoin (202kB) Aurora (1MB) 4MiB
today 4858.044945963 4899.713399463 5078.211792963 5764.992837963
new 5204.397639685 11860.103465959 40371.667918393 150071.266022773

@jakmeier
Copy link
Contributor Author

cc @akhi3030 @bowenwang1996 @matklad @nagisa

I don't think we can move forward with the plan to "ship the new fn base costs". Some re-prioritisation is probably required here.

As the table above shows, we could keep total fn base costs for small contracts at around the same cost as today, but not decrease it. Even worse, we would have to increase costs for large contracts, e.g. Aurora would go from 5 Tgas currently to 40 Tgas afterwards.

There are two main problems.

  1. DB reads amount to roughly 5ms of costs, and that's not even worst-case. This alone already prevents us from reducing the costs, as we currently charge 4.855 Tgas for all steps combined.
  2. For large contracts, the time spent on contract loading is too long. Base cost is almost irrelevant here, the per-byte cost is important. Let's take Aurora as example, the base loading cost is estimated to be at 0.026 Tgas, or 26us. The cost based on size is 35 Tgas, or 35ms.

For number 1, we could prefetch DB reads for account data, access key data, and contract code. We have a potential of 4.1ms we could shave off this way, for all contracts regardless of size. This would make the smallest contracts up to 85% cheaper compared to today.

For number 2, it's important to understand that estimation is based on a contract that adds more and more functions. This is a proxy for what we assume to be a close to worst-case contract for loading.
We could either try to understand this case better and see if we can optimise for that, but we would also have to consider other possible worst-case contracts. But the bottom line is, as long as we know of valid contracts that are so slow to compile, we cannot ship a cheaper gas cost.
Alternatively, we could start loading all contracts ahead of time in a background thread, which would also allow to reduce gas costs. Prefetching the code from the DB would also be a requirement for that.

@akhi3030
Copy link
Collaborator

There is quite a big reduction in action_function_call_send_not_sir. Do we know what happened there? Did we overestimate it originally?

@akhi3030
Copy link
Collaborator

Do we have a design already for prefetching contracts?

@jakmeier
Copy link
Contributor Author

There is quite a big reduction in action_function_call_send_not_sir. Do we know what happened there? Did we overestimate it originally?

Yeah, we know that we have been overcharging this part to cover undercharged costs in function loading. I'd say that was the main motivation behind looking all of these costs in combination and fixing the gas model first, before changing any single parameter.

Do we have a design already for prefetching contracts?

Not that I am aware of. But it wouldn't be substantially different from accounts + access key prefetching, for which we have a Jira issue STR-4.

@matklad
Copy link
Contributor

matklad commented Jul 21, 2022

Contract prefetching could reduce this, or storing size of contract.

we already store the size of the contract, so this should be "easy" to do I think

diff --git a/core/store/src/lib.rs b/core/store/src/lib.rs
index eeeb4cf3e..e9b890f9b 100644
--- a/core/store/src/lib.rs
+++ b/core/store/src/lib.rs
@@ -555,8 +555,12 @@ pub fn get_code(
     code_hash: Option<CryptoHash>,
 ) -> Result<Option<ContractCode>, StorageError> {
     state_update
-        .get(&TrieKey::ContractCode { account_id: account_id.clone() })
-        .map(|opt| opt.map(|code| ContractCode::new(code, code_hash)))
+        .get_ref(&TrieKey::ContractCode { account_id: account_id.clone() })
+        .map(|opt| opt.map(|ptr| {
+            let l = ptr.len();
+            gas_counter.add(l * loading_cost_per_byte)?;
+            ContractCode::new(ptr.deref_value()?, code_hash)
+        }))
 }
 
 /// Removes account, code and all access keys associated to it.

filed https://near.zulipchat.com/#narrow/stream/295306-pagoda.2Fcontract-runtime/topic/When.20to.20charge.20loading.20cost for discussing this

@akhi3030
Copy link
Collaborator

Summary from the zulip thread + VC call between @matklad and @jakmeier 👍

  • @matklad to revert the stabilization (feat: stabilize precharged contract loading gas #7177)
  • After @jakmeier is back, we'll think carefully how we charge costs: there's a bunch of considerations we surfaced in just five minutes. E.g.:
    • it seems logical to say that loading the contract is just storage_read op, which should be charged accordingly
    • except that if we do that, we'll hit a problem where we could load WASM, but run out of gas when loading potentially larger compiled code
    • so perhaps we need an extra parameter here?

@matklad
Copy link
Contributor

matklad commented Jul 22, 2022

I think wasm_contract_loading_bytes is measured incorrectly: it seems we don't do warm-up iterations there. With warm-up, the cost becomes couple of orders of magnitudes less.

$ cargo run -r -- --costs ContractLoadingPerByte --metric time --warmup-iters 1
ContractLoadingPerByte                                     192_248 gas

$ cargo run -r -- --costs ContractLoadingPerByte --metric time --warmup-iters 0
ContractLoadingPerByte                                  55_820_001 gas

$ cargo run -r -- --costs ContractLoadingPerByte --metric time
ContractLoadingPerByte                                  65_831_844 gas

It's interesting that we don't flag this as uncertain, because we measure aggregate time there, rather than average.

@jakmeier
Copy link
Contributor Author

Pretty sure I was running 3 warm-up iterations and then 10 measured iterations,

I suspect the warmup only makes a difference because you are running with caches enabled here. I was running without caches, a.k.a. --features=required. (Also release mode sometimes makes a big difference. LTO isn't that important here but in my reported measurments, that was also used. I didn't use page cache drops, it only makes results less stable in this case.)

But it's still an interesting finding, I would have thought pure loading costs are not affected by caches. I think we need to find out:

  • what exactly are we caching
  • should it be cached or not for a realistic worst-case estimation?

near-bulldozer bot pushed a commit that referenced this issue Jul 25, 2022
This reverts commit f7aca03.

Reverting based on discussion at #7227

The TL;DR is while the current impl of the feature charges correct cost for contract loading at *logically* correct place, we'd want to move the charging a little bit earlier physically, even before we load the contract from the database, so that the cost can be lower without adding an extra protocol feature here.
@jakmeier
Copy link
Contributor Author

It seems the numbers presented here for current "true costs" are mostly accurate. Closing this issue now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-params-estimator Area: runtime params estimator T-contract-runtime Team: issues relevant to the contract runtime team
Projects
None yet
Development

No branches or pull requests

3 participants