
pvf: Log memory metrics from preparation #6565

Merged · 16 commits · Feb 6, 2023

Conversation

@mrcnski (Contributor) commented Jan 16, 2023

PULL REQUEST

Overview

This is a first step toward mitigating disputes caused by OOM errors. Eventually, we would like to reject PVFs that surpass some memory threshold during compilation (preparation), while still in the pre-checking stage. We are not sure what the threshold should be at this time, so we are just gathering data for now.

In particular, there are three measurements that seem promising:

  • max_rss (resident set size) from getrusage
  • resident memory stat provided by jemalloc
  • allocated memory stat also from jemalloc.

All have pros and cons, described in detail in the related issues; a rough sketch of how they can be read follows below.
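For orientation, here is a rough sketch of reading these three numbers in Rust; it assumes the `libc` and `tikv-jemalloc-ctl` crates and is not the exact code in this PR.

// Sketch only: reading the three candidate metrics. Assumes the `libc` and
// `tikv-jemalloc-ctl` crates; the PR structures this differently.
use tikv_jemalloc_ctl::{epoch, stats};

fn read_memory_metrics() -> Result<(i64, usize, usize), String> {
    // 1. max_rss from getrusage(2). RUSAGE_SELF covers the whole process;
    // the discussion below settles on the Linux-only RUSAGE_THREAD instead.
    let mut rusage: libc::rusage = unsafe { std::mem::zeroed() };
    if unsafe { libc::getrusage(libc::RUSAGE_SELF, &mut rusage) } != 0 {
        return Err("getrusage failed".to_string());
    }
    let max_rss = rusage.ru_maxrss as i64; // KiB on Linux, bytes on macOS

    // 2./3. jemalloc's resident and allocated stats. Many jemalloc stats are
    // cached and only refreshed when the epoch is advanced.
    epoch::advance().map_err(|e| e.to_string())?;
    let resident = stats::resident::read().map_err(|e| e.to_string())?;
    let allocated = stats::allocated::read().map_err(|e| e.to_string())?;

    Ok((max_rss, resident, allocated))
}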

See paritytech/polkadot-sdk#745 for more background, particularly this comment. Also see this issue starting at this comment.

Related issues

Closes #6317
See paritytech/polkadot-sdk#745
More background: https://github.com/paritytech/srlabs_findings/issues/110#issuecomment-1362822432

TODO

  • I need a little bit of help testing this. Is it possible to inspect the Metrics struct to see what has been logged? Mainly I'm interested to know whether this is a good approach, and whether there are existing tests that do something like this. (A testing sketch follows below.)
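One possible way to assert on gathered metrics in a test, sketched directly against the `prometheus` crate; the metric name and value are made up for illustration, and the host's actual Metrics wrapper differs in detail.

// Hypothetical test: register a histogram, record one observation, then
// inspect the registry. Name and value are illustrative assumptions.
use prometheus::{Histogram, HistogramOpts, Registry};

#[test]
fn preparation_memory_metric_is_observed() {
    let registry = Registry::new();
    let histogram = Histogram::with_opts(HistogramOpts::new(
        "pvf_preparation_max_resident", // illustrative name
        "Peak resident memory during PVF preparation (KiB)",
    ))
    .unwrap();
    registry.register(Box::new(histogram.clone())).unwrap();

    // Simulate the host recording a measurement after a preparation job.
    histogram.observe(92_000.0);

    // Gather from the registry and assert that the sample was recorded.
    let family = registry
        .gather()
        .into_iter()
        .find(|f| f.get_name() == "pvf_preparation_max_resident")
        .unwrap();
    assert_eq!(family.get_metric()[0].get_histogram().get_sample_count(), 1);
}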

@mrcnski mrcnski added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit. labels Jan 16, 2023
@mrcnski mrcnski force-pushed the mrcnski/prechecking-threshold-memory branch from c327b91 to 61ebe72 Compare January 16, 2023 23:31
@@ -845,7 +851,7 @@ mod tests {
 	let pulse = pulse_every(Duration::from_millis(100));
 	futures::pin_mut!(pulse);

-	for _ in 0usize..5usize {
+	for _ in 0..5 {
Contributor Author:

Sorry for the unrelated change, but this really triggered me.

//! - `allocated` memory stat also from `tikv-jemalloc-ctl`.
//!
//! Currently we are only logging these, and only on each successful pre-check. In the future, we
//! may use these stats to reject PVFs during pre-checking. See
Member:

Actually, since the goal in this initial phase is gathering data, we might extend this measurement to all preparation jobs for the time being. Reason: otherwise we only get data on newly registered PVFs, meaning that if parachains are stable (no upgrades), it could take a while until we've gathered enough data.

Contributor Author:

Ooh good point.

//
// 2. To have less potential loss of precision when converting to `f64`. (These values are
// originally `usize`, which is 64 bits on 64-bit platforms).
let resident_kb = (tracker_stats.resident / 1000) as f64;
Member:

I don't get the issue with precision loss. Floating point does not really care about the scale, as long as it fits in the exponent.

Also, to double-check: is it really kilo as in 1000, or is it 1024?

Contributor Author:

Oh yeah, it should be 1024. And I'll just remove the comment to avoid confusion.
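The corrected conversion would then presumably read:

// KiB rather than kB: divide by 1024; the precision comment was dropped.
let resident_kb = (tracker_stats.resident / 1024) as f64;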

@mrcnski (Contributor, Author) commented Jan 26, 2023

By using jemalloc for the memory stats, we will cause this user's builds to break again. I don't think we should delay this PR for that reason, but IMO we should start thinking about gating jemalloc behind a feature flag (as proposed on that issue), especially now that we are expanding its usage.

@mrcnski mrcnski requested a review from eskimor January 26, 2023 13:13
@eskimor (Member) commented Jan 26, 2023

  • I need a little bit of help testing this. Is it possible to inspect the Metrics struct to see what has been logged? Mainly I'm interested to know if this is a good approach, and if there are existing tests that do something like this.

You should be able to gather metrics by running it via zombienet.

@eskimor (Member) commented Jan 26, 2023

Closes paritytech/polkadot-sdk#745

That does not seem to be quite right. (Just a step towards)

/// For simplicity, any errors are returned as a string. As this is not a critical component, errors
/// are used for informational purposes (logging) only.
pub fn memory_tracker_loop(finished_rx: Receiver<()>) -> Result<MemoryAllocationStats, String> {
const POLL_INTERVAL: Duration = Duration::from_millis(10);
Contributor:

Do we really need to sample the stats so often?

Contributor Author (@mrcnski, Jan 26, 2023):

Oh, probably not. I was thinking of execution, where jobs take 10-25 ms.

Maybe 500ms for preparation is fine? I suspect that in most cases the memory will just grow, and the final measurement will be the max, so polling too often wouldn't buy much.

I'm wondering now if an attacker could craft a PVF that causes compiler memory to spike very quickly and then go back down. A coarse interval wouldn't catch that. Sounds like a very specific and unlikely attack though, and anyway the max_rss metric would be a useful backup stat for that case.

Member:

That is definitely a concern. It should not be possible to bypass the pre-checking checks.

Contributor Author:

You mean that kind of attack @eskimor? Maybe the interval could be randomized to be less predictable and thus less gameable?

Member:

I am dubious that could work: the sampling interval will likely stay in the milliseconds range, but allocating huge amounts of memory can be accomplished much faster, so an attacker would still be able to trick this. What we could do on top is track the overall amount of memory ever allocated; if that value changed "too much" between two samples, we could also mark the PVF as invalid.

Anyhow, we are talking about PVF preparation here, so this is about crafting a PVF that makes the compiler use huge amounts of memory for only a very short while. Given that the compiler code is not controlled by an attacker, this might not even be possible.

Contributor Author:

if that value changed "too much" during two samples, we could also mark the PVF as invalid.

Interesting idea. Why not just check max_rss at the end -- it should give us the maximum memory spike as well. If that's too high, we reject, even if the memory tracker observed metrics in the normal range.

Given that the compiler code is not controlled by an attacker, this might not even be possible.

Yep, I wondered the same thing above. Certainly for the purposes of gathering metrics right now it's not a concern. And later, maybe we can just rely on max_rss, if we find out it's not a useless stat.
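For concreteness, a polling tracker along the lines discussed above might look like the following sketch, which assumes std mpsc channels and the `tikv-jemalloc-ctl` crate rather than reproducing the code under review.

// Sketch of a polling memory tracker: sample jemalloc stats every
// POLL_INTERVAL, keep running maximums, and stop when the job finishes.
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;
use tikv_jemalloc_ctl::{epoch, stats};

#[derive(Default)]
pub struct MemoryAllocationStats {
    pub resident: u64,
    pub allocated: u64,
}

pub fn memory_tracker_loop(finished_rx: Receiver<()>) -> Result<MemoryAllocationStats, String> {
    const POLL_INTERVAL: Duration = Duration::from_millis(10);
    let mut max = MemoryAllocationStats::default();
    loop {
        // Refresh and sample the stats, keeping the maximum of each.
        epoch::advance().map_err(|e| e.to_string())?;
        let resident = stats::resident::read().map_err(|e| e.to_string())? as u64;
        let allocated = stats::allocated::read().map_err(|e| e.to_string())? as u64;
        max.resident = max.resident.max(resident);
        max.allocated = max.allocated.max(allocated);

        // Wait out the poll interval; return once the preparation job
        // signals completion (or the sender is dropped).
        match finished_rx.recv_timeout(POLL_INTERVAL) {
            Err(RecvTimeoutError::Timeout) => continue,
            Ok(()) | Err(RecvTimeoutError::Disconnected) => return Ok(max),
        }
    }
}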

Contributor:

I think a better way to handle this would be to not poll and instead set up a cgroup with an appropriate limit and subscribe to an event once that limit is breached.

Contributor Author:

Indeed, thanks! I'll extract that suggestion into a follow-up issue. I'll keep the polling for now, for the purposes of gathering metrics, as we don't yet know what the limit should be.
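A sketch of the cgroup idea, assuming cgroups v2 mounted at /sys/fs/cgroup and a pre-created group for the preparation job; the paths and the limit are illustrative assumptions.

// Sketch: cap the group's memory via memory.max, then check memory.events
// for OOM kills after the job. An event-driven version would watch
// memory.events with inotify instead of reading it once.
use std::fs;

fn set_memory_limit(cgroup_dir: &str, limit_bytes: u64) -> std::io::Result<()> {
    // On breach, the kernel reclaims and, failing that, OOM-kills a member.
    fs::write(format!("{cgroup_dir}/memory.max"), limit_bytes.to_string())
}

fn oom_kill_count(cgroup_dir: &str) -> std::io::Result<u64> {
    // memory.events contains lines such as "oom 0" and "oom_kill 0".
    let events = fs::read_to_string(format!("{cgroup_dir}/memory.events"))?;
    Ok(events
        .lines()
        .find_map(|line| line.strip_prefix("oom_kill "))
        .and_then(|n| n.trim().parse().ok())
        .unwrap_or(0))
}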

@eskimor (Member) commented Jan 27, 2023

Let's run this on Versi and then merge.

@mrcnski (Contributor, Author) commented Feb 2, 2023

I found out that max_rss does actually work on macOS, and I was able to test it locally with zombienet and got a metric:

https://pastebin.com/raw/CH41UfHt

However, I had to use RUSAGE_SELF, as RUSAGE_THREAD is not supported there. Anyway, I'll be pushing a change enabling this metric on Linux and macOS but not Windows.

@mrcnski (Contributor, Author) commented Feb 2, 2023

On Versi the metrics are:

  • max_allocated: ~80 MB
  • max_resident: ~90 MB
  • max_rss: ~1 GB (!)

@koute (Contributor) commented Feb 3, 2023

On Versi the metrics are:

  • max_allocated: ~80 MB
  • max_resident: ~90 MB
  • max_rss: ~1 GB (!)

All of these seem kind of low, but I guess if this is only for the PVF precheck then it could make sense. And RSS being significantly higher than metrics returned by jemalloc is perfectly normal.

Would be interesting to do some more detailed profiling of this if/when I find the time.

@mrcnski (Contributor, Author) commented Feb 6, 2023

I found out that max_rss does actually work on MacOS [...] However, I had to use RUSAGE_SELF, as RUSAGE_THREAD is not supported.

Oh, I realized that RUSAGE_SELF can't work, because the same process is responsible for multiple jobs, and I couldn't find any way to "reset" the max_rss between jobs. (Edit: I found this but again, it's Linux-only.) So I'll revert this change again (I guess we just can't support macOS), run a quick zombienet test for sanity, and then merge.
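The resulting platform gating might look roughly like this sketch (illustrative, not the PR's exact code; RUSAGE_THREAD is Linux-only, and RUSAGE_SELF covers the whole multi-job process):

// Sketch: per-thread max_rss on Linux only; elsewhere the metric is skipped.
#[cfg(target_os = "linux")]
fn thread_max_rss() -> Option<i64> {
    let mut rusage: libc::rusage = unsafe { std::mem::zeroed() };
    let ret = unsafe { libc::getrusage(libc::RUSAGE_THREAD, &mut rusage) };
    (ret == 0).then(|| rusage.ru_maxrss as i64)
}

#[cfg(not(target_os = "linux"))]
fn thread_max_rss() -> Option<i64> {
    // No per-thread max_rss available on this platform.
    None
}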

@mrcnski changed the title from "pvf: Log memory metrics from prechecking" to "pvf: Log memory metrics from preparation" on Feb 6, 2023
@mrcnski (Contributor, Author) commented Feb 6, 2023

bot merge
