Pre-checking: reject PVFs where preparation surpassed threshold memory usage #745

mrcnski opened this issue Dec 22, 2022 · 4 comments

mrcnski commented Dec 22, 2022

ISSUE

Overview

This is an attempt at mitigating disputes caused by OOM errors.

As @eskimor described here:

We do retry now on OOM errors, but that does not really resolve the issue. In any case, this should be resolved via changes to pre-checking. E.g. by not accepting a PVF where the preparation consumed more than X amount of memory. Where X is low enough that it should work on any validator which adheres to spec recommendations.

I have two open questions:

Strategy

We are already using tikv-jemalloc for all processes. We should be able to use tikv-jemalloc-ctl to get memory stats in the child worker, as we do here.

I'm thinking we would need another thread that runs in the background and repeatedly polls the memory usage during preparation. At the end, the thread would return the peak observed memory usage.

Question: I'm guessing the allocated stat is what we need here, though it's not well-documented.
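
To make the idea concrete, here is a minimal sketch of such a polling thread, assuming the worker already uses tikv-jemallocator as its global allocator and that the allocated stat is indeed the right one; the stop-channel wiring and the polling interval are placeholders, not the actual implementation:

```rust
// Sketch only: polls jemalloc's "allocated" stat in a background thread and
// reports the peak observed value. Assumes tikv-jemallocator is the global
// allocator; the channel plumbing and 10ms interval are placeholders.
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

use tikv_jemalloc_ctl::{epoch, stats};

/// Spawns a thread that samples allocated bytes until `stop_rx` fires (or the
/// sender is dropped), then returns the peak value observed, in bytes.
fn spawn_memory_tracker(stop_rx: Receiver<()>) -> thread::JoinHandle<u64> {
    thread::spawn(move || {
        let mut peak: u64 = 0;
        loop {
            // Jemalloc stats are cached; advancing the epoch refreshes them.
            let _ = epoch::advance();
            if let Ok(allocated) = stats::allocated::read() {
                peak = peak.max(allocated as u64);
            }
            match stop_rx.recv_timeout(Duration::from_millis(10)) {
                // Preparation finished (or the sender was dropped): report the peak.
                Ok(()) | Err(RecvTimeoutError::Disconnected) => return peak,
                Err(RecvTimeoutError::Timeout) => {}
            }
        }
    })
}
```

The preparation code would create the channel before compiling, send the stop signal once compilation finishes, join the thread, and include the returned peak in the result sent back to the host, which can then compare it against the threshold.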

Memory threshold

Referring to the quote from the overview:

not accepting a PVF where the preparation consumed more than X amount of memory. Where X is low enough that it should work on any validator which adheres to spec recommendations.

Question: What should we use as the value of X? What are the spec recommendations?

mrcnski self-assigned this Dec 22, 2022

mrcnski commented Dec 22, 2022

@eskimor Just to clarify something: this doesn't address OOM due to execution (i.e. AmbiguousWorkerDeath), right?

We do retry now on OOM errors, but that does not really resolve the issue. In any case, this should be resolved via changes to pre-checking.

This kind of implies OOM due to execution, as we only retry on AmbiguousWorkerDeath, but the proposed mitigation is only for OOM due to preparation, unless I'm missing something. At first I thought that this would supersede #767.


mrcnski commented Jan 12, 2023

What are the spec recommendations?

I found them here.

For the value of X in the ticket (threshold memory usage), does it make sense to set it really liberally, and get metrics on real usage in the meantime?


@eskimor After the discussion in the other issue, it seems to me that the strategy as described here is still viable. The extra thread would only be needed during pre-checking, and (I think) we would get more accurate statistics than from max_rss. Do you think I can get started soon, or is more discussion needed?


eskimor commented Jan 13, 2023

I don't think much more discussion is needed. We can definitely start gathering data and start with a high limit. If we limit max_rss during pre-checking to a value low enough that OOM is virtually impossible on any validator adhering to the spec recommendations, then I believe all issues are resolved.

So in summary:

  1. We gather data on actual memory usage.
  2. We limit max_rss during pre-checking to something significantly lower than the memory that should be available - there should be a margin of at least a factor of 10 (a minimal version of this check is sketched below).
  3. If preparation still runs out of memory later, we can then be very sure that it is a transient issue and retrying will very likely resolve it.
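
Concretely, the limiting step could boil down to a check like the following once preparation finishes during pre-checking; the limit value, type and function names below are placeholders, and the real value of X is still the open question above:

```rust
// Illustrative only: the limit value and names are placeholders, not the
// node's actual configuration or error types.
const PREPARATION_MEMORY_LIMIT_BYTES: u64 = 2 * 1024 * 1024 * 1024; // hypothetical X

/// Hypothetical pre-checking outcome based on the measured peak memory usage.
enum PrecheckMemoryOutcome {
    /// Peak usage stayed under the limit; the PVF can be accepted.
    WithinLimit,
    /// Peak usage exceeded the limit; vote against the PVF.
    ExceededLimit { peak_bytes: u64 },
}

fn check_preparation_memory(peak_bytes: u64) -> PrecheckMemoryOutcome {
    if peak_bytes > PREPARATION_MEMORY_LIMIT_BYTES {
        PrecheckMemoryOutcome::ExceededLimit { peak_bytes }
    } else {
        PrecheckMemoryOutcome::WithinLimit
    }
}
```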


eskimor commented Jan 13, 2023

Ok, sorry, the actual measurement of used memory is still not clear.

The two most promising options so far are reading RSS usage (which should reflect physical RAM usage) and tracking actual allocations. Neither is perfect:

Measuring RSS

Measuring RSS is not ideal if nodes have swap enabled, as they would then be allowed to use more "memory" than nodes without swap. The measurement will also be more platform dependent. The risk: nodes have swap enabled and are under memory pressure, so they are indeed swapping, and would hence allow more allocation than the limit is supposed to permit.

I am having trouble believing that this will ever be a problem in practice, though. It would mean that a significant number of validators:

  1. have swap enabled
  2. are under extreme memory pressure

Memory pressure also has to be pretty high, because I expect the preparation process to be quite short-lived, so it is unlikely to get swapped out. Especially if we have a rather low limit, it is even more likely that the limit will be hit before swapping can occur.
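
For reference, reading max_rss from the worker does not even need a polling thread, since the kernel already tracks the peak. A minimal sketch using getrusage via the libc crate, assuming a Unix worker (not the actual implementation):

```rust
// Sketch of reading peak RSS for the current (worker) process via getrusage.
// Note: on Linux ru_maxrss is reported in kilobytes, on macOS in bytes.
fn peak_rss_bytes() -> Option<u64> {
    // SAFETY: getrusage only writes into the struct we pass to it.
    let mut usage: libc::rusage = unsafe { std::mem::zeroed() };
    let ret = unsafe { libc::getrusage(libc::RUSAGE_SELF, &mut usage) };
    if ret == 0 {
        // Assuming Linux here: convert KiB to bytes.
        Some(usage.ru_maxrss as u64 * 1024)
    } else {
        None
    }
}
```

Since ru_maxrss is the lifetime peak of the whole process, a single call at the end of preparation is enough, but it also includes memory used before preparation started.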

Tracking allocations/free

This sounds great and would avoid the swap problem, but it suffers from a different issue: allocations reserve purely virtual memory, and whether those pages become physical memory depends on the actual usage.

The delta is very application dependent. For many applications it will likely be negligible, especially because the granularity is per page: it does not matter that you allocate a vec of 100 bytes and only use ten. The effect only becomes noticeable for memory regions in the ballpark of whole pages, e.g. a vec of 3k of which you only end up using 1k - those cases will make a difference.

Something else to consider: the usage depends on the compiler code and cannot be influenced by an adversary; and even if an attacker managed to make the compiler allocate lots of virtual-only memory, it would only get its own code rejected, so there is no value in that. What would be more troublesome: if honest code hits an edge case that causes the compiler to allocate lots of virtual memory, it would get rejected for no good reason.

Summary and Solution

Tracking RSS might underestimate usage if swap is used. Tracking allocations might overestimate it if lots of memory gets allocated but not fully used.

Overestimation sounds safer: if actual usage by the compiler does not lead to too much virtual-only allocation, this might be fine! But it also might not be; depending on the actual implementation, the overestimate could be too high to be useful.

Moving forward: let's get data! I would implement measurement for at least one of the two options, or, if easy enough, even both, and get metrics for them. Given that neither metric is perfect, having both seems like a good idea to me as a way to sanity-check either measurement.

The one delivering more sensible results can then be used for limiting, but I would still keep the other measurement around. If we ever have problems with the limit, we would then have another data point that could help a great deal in figuring out what is wrong.
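
As a sketch of the "get metrics for them" part, both measurements could be exported as Prometheus gauges; the metric names and registry wiring below are assumptions, not the node's existing telemetry setup:

```rust
// Illustrative only: registers two gauges for the two candidate measurements.
use prometheus::{Gauge, Registry};

fn register_preparation_memory_metrics(
    registry: &Registry,
) -> prometheus::Result<(Gauge, Gauge)> {
    let peak_allocated = Gauge::new(
        "pvf_preparation_peak_allocated_bytes",
        "Peak jemalloc-allocated bytes observed during PVF preparation",
    )?;
    let max_rss = Gauge::new(
        "pvf_preparation_max_rss_bytes",
        "Peak resident set size of the preparation worker (getrusage)",
    )?;
    registry.register(Box::new(peak_allocated.clone()))?;
    registry.register(Box::new(max_rss.clone()))?;
    Ok((peak_allocated, max_rss))
}
```

After each preparation job, the host would set both gauges (e.g. gauge.set(peak_bytes as f64)), so the two measurements can be compared over time before picking which one to enforce.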
