
[Tracking issue] State Witness size limit #10259

Open
Tracked by #46
pugachAG opened this issue Nov 28, 2023 · 6 comments
Labels
A-stateless-validation Area: stateless validation

Comments

@pugachAG
Contributor

Background

The current State Witness is implicitly limited by gas. In some cases large contributors to the State Witness size are not charged enough gas, which might result in a State Witness that is too big for the network to distribute to all validators in time.

Proposed solution

MVP

Limiting the State Witness size is not required for the Stateless Validation MVP/prototype.
Also, (1) shows that current mainnet receipts result in a reasonable State Witness size, so this won't be an issue for prototyping.

Short Term

In the short term (before launching Stateless Validation on mainnet) we need to implement a soft limit for the State Witness size on the runtime side (similar to compute costs). See this comment for more details. This would help protect against bringing down the network with receipts that are specifically crafted to produce a large State Witness.
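
As an illustration of how such a soft limit could work, here is a minimal sketch; the types, field names, and the 16 MB value are hypothetical, not nearcore's actual code. It assumes the runtime tracks the recorded witness size and checks it between receipts:

```rust
// Hypothetical sketch of a "soft" witness size limit, analogous to the
// existing compute-cost limit: the check happens between receipts, so the
// last applied receipt may push the size past the threshold.
struct ApplyState {
    recorded_storage_bytes: u64,
    storage_proof_soft_limit: u64, // e.g. 16 MB (illustrative value)
}

impl ApplyState {
    /// Returns true if we may apply another receipt.
    fn can_apply_more(&self) -> bool {
        self.recorded_storage_bytes < self.storage_proof_soft_limit
    }
}

fn main() {
    let mut state = ApplyState {
        recorded_storage_bytes: 0,
        storage_proof_soft_limit: 16 * 1024 * 1024,
    };
    // Simulate receipts that each record 6 MB of PartialState.
    let mut applied = 0;
    while state.can_apply_more() {
        state.recorded_storage_bytes += 6 * 1024 * 1024;
        applied += 1;
    }
    // The third receipt crosses the limit; later receipts would go to the
    // delayed queue.
    assert_eq!(applied, 3);
    println!("applied {} receipts", applied);
}
```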

Long Term

I believe that in the long term we need to adjust our gas costs to reflect contributions to the State Witness size. This means reintroducing TTN costs for reads, charging for contract code size on function calls, etc.

Resources

(1) zulip thread with current witness size analysis
(2) #9378

@wacban
Contributor

wacban commented Jan 17, 2024

Note from the onboarding discussion: another approach is to add the state witness size to the compute costs. It should work well enough for the short term and be fairly close to what we want in the long term.
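
A hedged sketch of that idea (the conversion rate and all names are made up for illustration): each byte recorded into the witness is charged some compute, so the existing per-chunk compute limit indirectly bounds the witness size.

```rust
// Illustrative only: charge compute for every byte recorded into the
// state witness, so the per-chunk compute limit also bounds witness size.
const COMPUTE_PER_WITNESS_BYTE: u64 = 100; // made-up conversion rate

fn witness_compute_cost(recorded_bytes: u64) -> u64 {
    recorded_bytes * COMPUTE_PER_WITNESS_BYTE
}

fn main() {
    let execution_compute = 2_000_u64; // compute spent on execution itself
    let recorded_bytes = 1_024_u64;    // bytes added to the witness
    let total = execution_compute + witness_compute_cost(recorded_bytes);
    assert_eq!(total, 2_000 + 1_024 * 100);
    println!("total compute: {}", total);
}
```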

@jancionear
Contributor

jancionear commented Feb 29, 2024

It seems that there are three kinds of objects that contribute to state witness size:

  1. Incoming receipts and receipt proofs
  2. New transactions
  3. PartialState produced by executing receipts

We can't really do anything about 1) because there's no global congestion control, which means that the queue of incoming and delayed receipts is unbounded, so the size of source_receipt_proofs is unbounded as well :/
We'll have to live with this until global congestion control is implemented.

With 2) the situation is better. We control which transactions get added to a chunk, so we could add a size limit for new transactions. In prepare_transactions there's already a gas limit and a time limit; we can add a similar size limit. Once we've added transactions that take up more than X MB, we stop adding new ones. AFAIU, receipts produced by converting transactions should be rather small, so these receipts shouldn't be a big concern.
There's also local congestion control, which helps a bit: it stops adding new transactions when the number of delayed receipts gets too high. But it doesn't really limit the size, so we need an explicit size limit as well.
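
A rough sketch of what such a size limit in transaction selection could look like; the type, the 4 MB cap, and the function shape are illustrative, not the actual prepare_transactions code:

```rust
// Hypothetical sketch: a size cap during transaction selection, alongside
// the existing gas and time limits (omitted here for brevity).
struct Transaction {
    serialized_size: u64,
}

fn prepare_transactions(pool: &[Transaction], size_limit: u64) -> Vec<&Transaction> {
    let mut total_size = 0u64;
    let mut selected = Vec::new();
    for tx in pool {
        // Stop once adding this transaction would exceed the size limit.
        if total_size + tx.serialized_size > size_limit {
            break;
        }
        total_size += tx.serialized_size;
        selected.push(tx);
    }
    selected
}

fn main() {
    // Ten transactions of 1 MB each, with a 4 MB cap: only four fit.
    let pool: Vec<Transaction> = (0..10)
        .map(|_| Transaction { serialized_size: 1024 * 1024 })
        .collect();
    let selected = prepare_transactions(&pool, 4 * 1024 * 1024);
    assert_eq!(selected.len(), 4);
    println!("selected {} transactions", selected.len());
}
```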

We can limit 3) by executing receipts until the PartialState gets too large. TrieRecorder records how much PartialState was produced when executing a receipt, and we can use this information to limit the total size of PartialState. The easiest way would be to add a size_limit similar to the gas_limit and compute_limit: once PartialState gets too large, stop processing receipts and move them to the delayed queue, next to the existing check:

```rust
if total_compute_usage < compute_limit {
```

I think this would be good enough for normal, non-malicious traffic, but this kind of limit isn't enough by itself. In Jakob's analysis he found that a single receipt can access as many as 36 million trie nodes, which would produce hundreds of megabytes of PartialState. This means that we also need a per-receipt limit: if executing a receipt produces more than X MB of PartialState, then the receipt is invalid and execution fails, like the 300 TGas limit.
This will be a breaking change: some contracts that worked before could break after the limit is introduced, but I think it's necessary to add it; I don't see any way around it.
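
A minimal sketch of such a per-receipt hard limit; the error type and the 20 MB cap are hypothetical, chosen only for illustration:

```rust
// Hypothetical sketch of a per-receipt hard limit: if a single receipt
// records more PartialState than the cap, execution fails with an error,
// much like exceeding the 300 TGas limit.
#[derive(Debug, PartialEq)]
enum ReceiptError {
    StorageProofSizeExceeded { recorded: u64, limit: u64 },
}

fn check_per_receipt_limit(recorded: u64, limit: u64) -> Result<(), ReceiptError> {
    if recorded > limit {
        Err(ReceiptError::StorageProofSizeExceeded { recorded, limit })
    } else {
        Ok(())
    }
}

fn main() {
    let limit = 20 * 1024 * 1024; // e.g. 20 MB per receipt (illustrative)
    // A well-behaved receipt passes; a 36 MB one fails.
    assert!(check_per_receipt_limit(5 * 1024 * 1024, limit).is_ok());
    assert!(check_per_receipt_limit(36 * 1024 * 1024, limit).is_err());
    println!("per-receipt limit check ok");
}
```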

There's also the question of what the size limit itself should be. In Jakob's analysis he proposed 45 MB, but that requires a significant amount of bandwidth: sending a 45 MB ChunkStateWitness to 30 validators would require at least a 10 Gbit/s connection (!). We've already seen validators start having trouble with 16 MB witnesses, so this limit has to be chosen carefully. The limit also can't be too small, because that would make the per-receipt size limit very small.
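
As a sanity check of that bandwidth figure, assuming the witness must reach every validator within roughly one second (the one-second budget is my assumption, on the order of a block time):

```rust
// Back-of-the-envelope bandwidth estimate: witness size in MB, fanned out
// to N validators, delivered within the given number of seconds.
fn required_gbit_per_s(witness_mb: f64, validators: u32, seconds: f64) -> f64 {
    // MB -> Gbit: multiply by 8 bits/byte, divide by 1000.
    witness_mb * 8.0 / 1000.0 * validators as f64 / seconds
}

fn main() {
    // 45 MB * 8 bits * 30 recipients = 10.8 Gbit per witness, so roughly
    // 10.8 Gbit/s if it must go out within one second.
    let gbps = required_gbit_per_s(45.0, 30, 1.0);
    assert!((gbps - 10.8).abs() < 1e-9);
    println!("required bandwidth: {:.1} Gbit/s", gbps);
}
```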

My rough plan of action would be:

  1. Use TrieRecorder to measure how much PartialState each receipt produces. Run some traffic and see what it looks like. Add metrics.
  2. Add a size_limit when applying receipts - add the basic limit which stops processing receipts when the size of PartialState gets too large. This could be enough to run mainnet traffic smoothly.
  3. Add a size limit for new transactions - stop adding transactions when they get too large.
  4. Implement a per-receipt size limit on PartialState. This would require careful analysis: it'd be good to go over the blockchain and see if there are any contracts that require > 20MB of PartialState to run. Those could break after introducing the limit, so we must estimate what the impact would be, warn developers, etc.
  5. Adjust gas costs to reflect how much PartialState is produced by executing a receipt. Accessing trie nodes should be as expensive as the resulting size increase warrants.

@jancionear
Contributor

A quick and hacky size-limit example that stops applying receipts when the size of TrieRecorder goes above 5 MB: jancionear@6dd9d4f

@shreyan-gupta shreyan-gupta self-assigned this Mar 4, 2024
github-merge-queue bot pushed a commit that referenced this issue Mar 7, 2024
This PR adds a new runtime config `state_witness_size_soft_limit`, set to about 16 MB to begin with, along with an implementation to enforce it in the runtime.

This is the first step of #10259

What is state witness size soft limit?

In order to limit the size of the state witness, as a first step, we are adding a limit on the maximum size of the state witness partial trie, or proof. In the runtime, we record all the trie nodes touched during chunk execution and include them in the state witness. With the limit in place, if the size of the state witness exceeds 16 MB, we stop applying further receipts and push the remaining receipts into the delayed queue.

The reason we call this a soft limit is that we stop the execution of receipts only AFTER the size of the state witness has exceeded 16 MB.

We are including this as part of the new protocol version 83.

Future steps
- Introduce limits on other parts of the state witness like new
transactions
- Introduce a hard size limit for individual contract executions
- Monitor size of state witness
- Add metrics in a separate PR
@shreyan-gupta
Contributor

shreyan-gupta commented Mar 8, 2024

Updating the project thread.

I've merged PR #10703, which adds a soft limit for the storage proof size as highlighted in point 3 of @jancionear's comment. The next step I was thinking of pursuing is the hard limit for each contract, as per the research work done by Jakob. Based on that, I had a conversation with Simonas.

Simonas suggested that while this is totally doable, we should definitely consider the consequences of adding this restriction to contracts. Historically we've maintained the stance of keeping contracts backward compatible, and adding this restriction could cause some contracts to fail.

We should probably get some statistics on the size of data touched by contracts and (1) whether there are any existing contracts on mainnet already running that may break and (2) whether there are any historic/dormant contracts that may break.

(1) is easily doable, as we can just add metrics to the mirrored mainnet traffic; Marcelo is the right point of contact for this. (2), on the other hand, is quite a bit of work, but it too has been done in the past. I'm not personally sure whether the work is worth it in our case.

At the end of the day this also boils down to decisions by upper management, and we should definitely keep Bowen in the loop and let him know the proposed changes. That said, we should do our research before going to him. As next steps, I propose we add metrics like P50, P99, P999, and P100 to figure out the size of data touched by contracts and whether any contracts would break (probably not).

Technical side of things

  • runtime/near-vm-runner/src/logic/logic.rs is the file we need to take a look at
  • Within that file, the storage_read function is the one the runtime uses to interact with the trie storage; we can probably explore further to track the size of the storage touched, not just the node count.
  • Later, while implementing the hard limit, we can keep track of this size, return a runtime error (or failed contract execution) if the hard limit is hit, and charge the gas.
  • Simonas mentioned we probably don't have metrics within logic.rs, so we may have to expose the aggregated size as a return value from the VM.
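
A sketch of that last point (all names here are illustrative, not the real near-vm-runner types): the VM could return the aggregated recorded size in its outcome, so the caller can emit metrics or enforce limits outside logic.rs.

```rust
// Hypothetical sketch: the VM outcome carries an aggregated byte count of
// storage touched during execution, accumulated on each trie access.
struct VMOutcome {
    burnt_gas: u64,
    recorded_storage_size: u64, // aggregated bytes touched via storage reads
}

fn run_contract() -> VMOutcome {
    // A real implementation would execute the contract and accumulate
    // recorded_storage_size on each storage_read; stubbed with fixed values.
    VMOutcome { burnt_gas: 1_000, recorded_storage_size: 4096 }
}

fn main() {
    let outcome = run_contract();
    // The runtime (outside logic.rs) can now record metrics or check a
    // hard limit against the returned size.
    assert!(outcome.burnt_gas > 0);
    assert_eq!(outcome.recorded_storage_size, 4096);
    println!("recorded {} bytes", outcome.recorded_storage_size);
}
```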

@walnut-the-cat
Contributor

cc. @jancionear

@tayfunelmas tayfunelmas added the A-stateless-validation Area: stateless validation label Apr 1, 2024
@walnut-the-cat walnut-the-cat changed the title State Witness size limit [Tracking issue] State Witness size optimization Apr 10, 2024
@pugachAG pugachAG changed the title [Tracking issue] State Witness size optimization [Tracking issue] State Witness size limit Aug 12, 2024
7 participants