Ensure pulls are reliable with lower resources (memory, network speed) #2732

Open
hgroover opened this issue Oct 10, 2022 · 10 comments · Fixed by #2766
Labels
difficulty/medium (medium complexity/difficulty issue), enhancement, reward/medium (fixing this will be notably useful), triaged (this issue has been evaluated and is valid)

Comments

@hgroover

First unusual event: ostree pull (using libcurl over https) is reaped by the oom-killer on a very large delta pull.
Kernel version 4.9.217, MIPS.
Total available physical memory is 85 MB (physical DRAM is 128 MB, minus a large CMA chunk, the kernel, and wpa_supplicant).

For the most part, I am able to run ostree pull successfully. However, if the total quantity of deltas is high, ostree is likely to use up all available memory and get reaped by the oom-killer.
This happens less frequently if the thread limits in ostree-repo-private.h are set to lower values, but it is still guaranteed to happen on a large set of deltas. I am currently investigating running under ulimit as a workaround, but I have to ask: is there a better way to avoid this?

In the usual case I get exit code 137 in a script and can retry the ostree pull repeatedly until it succeeds. In the worst case the oom-killer goes on a spree, terminating every process on the system; while it is looking for victims an I/O thread fails, corrupting UBIFS and leaving /ostree in a read-only state. In short, I need to avoid oom-killer deaths. I have very little memory on this system and enough NAND to pull the update (but not enough for a second rootfs).
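The ulimit workaround mentioned above can also be expressed in C with setrlimit(2). The sketch below is illustrative only and not part of ostree; the 64 MiB cap and the remote/branch names are placeholder values. With RLIMIT_AS in place, allocations fail with ENOMEM rather than triggering the oom-killer.

```c
/* Minimal sketch of the ulimit-style workaround, not part of ostree:
 * cap the address space before exec'ing ostree pull so allocations fail
 * cleanly instead of the oom-killer reaping processes. */
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

int
main (void)
{
  struct rlimit rl = { .rlim_cur = 64UL * 1024 * 1024,
                       .rlim_max = 64UL * 1024 * 1024 };
  if (setrlimit (RLIMIT_AS, &rl) != 0)   /* equivalent to `ulimit -v 65536` */
    perror ("setrlimit");

  /* "my-remote" and "my-branch" are placeholders */
  char *args[] = { "ostree", "pull", "my-remote", "my-branch", NULL };
  execvp ("ostree", args);
  perror ("execvp");                     /* only reached if exec fails */
  return 1;
}
```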

@jlebon
Member

jlebon commented Oct 11, 2022

Interesting specs. What values did you try for e.g. _OSTREE_MAX_OUTSTANDING_FETCHER_REQUESTS? Relatedly, I think it could make sense to make that into a config knob. Also, to confirm, is the remote using static deltas? If so, try with --disable-static-deltas too.
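There is no such config knob in ostree today; the comment above is proposing one. As a rough sketch of what it might look like, assuming a hypothetical pull.max-outstanding-fetcher-requests key read through the existing ostree_repo_get_config() API:

```c
/* Hypothetical sketch only; ostree has no such option.  Reads an invented
 * "pull.max-outstanding-fetcher-requests" key from the repo config and
 * falls back to an assumed compile-time default. */
#include <ostree.h>

static guint
get_max_fetcher_requests (OstreeRepo *repo)
{
  GKeyFile *config = ostree_repo_get_config (repo);  /* borrowed reference */
  g_autoptr(GError) local_error = NULL;
  gint val = g_key_file_get_integer (config, "pull",
                                     "max-outstanding-fetcher-requests",
                                     &local_error);
  if (local_error != NULL || val <= 0)
    return 8;  /* stand-in for _OSTREE_MAX_OUTSTANDING_FETCHER_REQUESTS */
  return (guint) val;
}
```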

@hgroover
Author

The three values I've been changing are:
_OSTREE_MAX_OUTSTANDING_FETCHER_REQUESTS 3
_OSTREE_MAX_OUTSTANDING_DELTAPART_REQUESTS 1
_OSTREE_MAX_OUTSTANDING_WRITE_REQUESTS 3

The write-requests limit was changed to 3 in a release soon after 2020.3. I've gone through everything up to and including 2022.2; I think it was 2022.3 that moved to a newer GLib version, and since I'm running a very old uclibc build, updating GLib is going to be a long process.
I've also used 4 for fetcher requests.
What I've tried to do is limit threads, because part of my problem seems to be that on the kernel version I have, the oom-reaper dumps about one candidate thread per second regardless of the VM setting for the oom-killer dump, and if there are enough threads it sometimes creates the conditions for an I/O thread to run out of memory before any process can be reaped - in short, a kind of oom-killer double fault.

I'll try the --disable-static-deltas also. I have lots of logs including the /var/log/messages from the oom reaper. Thanks!

@hgroover
Author

I actually ran into problems with the FETCHER_REQUESTS setting of 3; 4 or 5 seems to work better, while 3 will sometimes cause errors and early termination.
I think my problem with oom-killer deaths is simply that ostree crosses a 10 MB RSS size. I just need to adjust the VM settings. There's actually quite a lot of memory available (as I've killed all nonessential processes).
However, --disable-static-deltas makes a measurable difference as well. Thanks for that. I will follow up once I've done more testing with modified VM settings.

@hgroover
Author

Instrumenting the code has given me some insight into the problem. It's not actually a problem endemic to ostree, but in this case ostree definitely exacerbates it.
In all cases I'm running with --disable-static-deltas (because I have none).
When I get to around 1,100 to 1,300 completed fetch requests (with a total of over 1,500 in the queue) I start running into the oom-killer death situation. Interestingly, when the oom-killer targets ostree, I need to wait before retrying in my script: if I wait at least 5 seconds before running ostree pull again, it is likely to succeed; otherwise I get an oom-killer death straight away.
I've experimentally patched in an operation-ceiling option that allows me to terminate after a certain number of completed fetches, and an anti-flood option that lets me optionally inject usleep() calls between fetches. Ideally there would be a way to sleep only when the number of fetches in flight exceeds a certain amount, but I'm approaching these changes with some caution.
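For illustration, the anti-flood idea could look roughly like the following. This is a sketch with invented names, not the actual experimental patch:

```c
/* Sketch of the anti-flood throttling described above (invented names,
 * not the attached patch): sleep briefly whenever too many fetches are
 * already in flight. */
#include <glib.h>
#include <unistd.h>

typedef struct {
  guint n_outstanding_fetches;  /* assumed to be maintained by the pull code */
  guint flood_threshold;        /* how many in-flight fetches to tolerate */
  guint anti_flood_delay_ms;    /* from a hypothetical --anti-flood=<delay-ms> */
} PullThrottle;

static void
maybe_throttle_fetch (const PullThrottle *t)
{
  if (t->anti_flood_delay_ms > 0
      && t->n_outstanding_fetches > t->flood_threshold)
    usleep ((useconds_t) t->anti_flood_delay_ms * 1000);  /* blocks the calling thread */
}
```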
Attached is --verbose output with some of my modified debug messages showing the number of completed and queued fetches. The exit from the operation-ceiling option currently triggers an assert, which is not intentional, but this is just to show the volume and rate. I expect this is fairly normal for large change sets on systems with more memory, and it's also quite possible this is exacerbating a kernel issue that was addressed after the 4.9.217 kernel I'm running.
Interestingly, running without --verbose I am still likely to run afoul of the oom-killer, which suggests that the rate of I/O requests may also be to blame - hence the introduction of the anti-flood option.
fwupdate-20221016-155642-1-output-dd6ea0b8b3f80a493d2142aa5e6bb8ad9f101054e3eb08b85276d91a5811c51c.log

@hgroover
Author

hgroover commented Oct 20, 2022

I've finished my investigation and concluded that there are a number of conditions required to manifest this particular problem:

  1. You must have a system with very little available memory, as described above
  2. You must have a large changeset: no static deltas, but around 1800 filesystem objects in the breaking case
  3. A significant number of filesystem objects must be large files with sizes greater than 100 KB; two were just above 8 MB.

The oom-killer death is not directly related to anything ostree is doing, but because fetch requests pile up quickly in the queue while larger fetches are taking place, we seem to get enough heap fragmentation that uncommitted pages get pulled in, and the I/O activity exhausts available memory very quickly. The attached patch applies to 2020.3 and works around the problem in such extreme cases by implementing three options:
--large-file-size=<threshold-size-in-bytes> treats files with size >= the threshold specially; between completion of fetch and completion of write, the large file blocks all other fetch queuing and servicing (a rough sketch of this gating follows the patch below)
--operation-ceiling=<count> limits the number of queued entries to the specified count. This triggers an assert when exiting from the main entry point, which results in an exit code of 134 (there is certainly a better way to do this, but the approach worked for my purposes). It indicates that another pull attempt is needed (for use within a script wrapper).
--anti-flood=<delay-ms> injects a usleep delay at critical junctures; otherwise we still tend to get oom-killer deaths when not running with --verbose

It's really a horrendous kluge, but it allows me to run to completion without playing oom-killer roulette. I doubt anyone else has run into this sort of extreme case, and there's probably a more elegant way to handle blocking on large requests. I am limited to 2020.3 due to the signing methods available, but I'm open to suggestions on the best way to make this available. In any case, here is the patch.

0006-hard-pull-limit.txt
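The --large-file-size gating described in the comment above might be sketched as follows. This is illustrative code with invented names, not the attached patch:

```c
/* Rough sketch (not the attached patch) of the --large-file-size gating:
 * while one object at or above the threshold is between fetch-complete
 * and write-complete, refuse to queue further fetches. */
#include <glib.h>

typedef struct {
  guint64  large_file_threshold;   /* from --large-file-size, 0 = disabled */
  gboolean large_file_in_flight;   /* set when a big object starts, cleared on write completion */
} LargeFileGate;

static gboolean
can_enqueue_fetch (const LargeFileGate *gate)
{
  /* Queuing is paused for everything else while the large object drains */
  return !(gate->large_file_threshold > 0 && gate->large_file_in_flight);
}

static void
on_fetch_complete (LargeFileGate *gate, guint64 object_size)
{
  if (gate->large_file_threshold > 0 && object_size >= gate->large_file_threshold)
    gate->large_file_in_flight = TRUE;
}

static void
on_write_complete (LargeFileGate *gate, guint64 object_size)
{
  if (gate->large_file_threshold > 0 && object_size >= gate->large_file_threshold)
    gate->large_file_in_flight = FALSE;
}
```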

@cgwalters
Member

Yeah, we clearly need backpressure here. Your patch is, AFAICS, implementing this by blocking the main thread, which probably works well enough for unattended device updates, but anything linking libostree and doing other work on that thread may be surprised by the blocking. (Specifically, e.g. on a tty we'll stop rendering fetch progress.)

Hmm. This issue is really an extension of the problems with enqueuing too many HTTP requests that caused us to add queuing for the HTTP fetching in c18628e.

Do you have a handy reproduction scenario for this? Does

diff --git a/src/libostree/ostree-repo-pull.c b/src/libostree/ostree-repo-pull.c
index 86b4358a..05fecdb8 100644
--- a/src/libostree/ostree-repo-pull.c
+++ b/src/libostree/ostree-repo-pull.c
@@ -150,6 +150,7 @@ static void enqueue_one_static_delta_superblock_request_s (OtPullData          *
                                                            FetchDeltaSuperData *fetch_data);
 static void enqueue_one_static_delta_part_request_s (OtPullData           *pull_data,
                                                      FetchStaticDeltaData *fetch_data);
+static void ensure_idle_queued (OtPullData *pull_data);
 
 static gboolean scan_one_metadata_object (OtPullData                 *pull_data,
                                           const char                 *checksum,
@@ -385,6 +386,8 @@ check_outstanding_requests_handle_error (OtPullData          *pull_data,
           g_free (checksum);
         }
 
+      /* Finally, if we still have capacity, scan more metadata objects */
+      ensure_idle_queued (pull_data);
     }
 }
 
@@ -461,6 +464,9 @@ ensure_idle_queued (OtPullData *pull_data)
   if (pull_data->idle_src)
     return;
 
+  if (fetcher_queue_is_full (pull_data))
+    return;
+
   idle_src = g_idle_source_new ();
   g_source_set_callback (idle_src, idle_worker, pull_data, NULL);
   g_source_attach (idle_src, pull_data->main_context);

help?

cgwalters added a commit to cgwalters/ostree that referenced this issue Nov 14, 2022
We added backoff/queueing for fetching via HTTP, but we have
another queue in the metadata scanning which can also grow
up to the number of outstanding objects, which can be large.

Capping the scanning operation when we have hit our operation
limit will avoid potentially large amounts of allocations in the
case of e.g. a slow network.

Closes: ostreedev#2732
@cgwalters
Member

OK filed as #2766

@hgroover
Author

Thanks for the feedback. I wanted to follow up to say that I'm going to have a look at the suggested changes.
As for my patch, it's really a hack, and of course when it comes back to the main thread it exits with an assert because I haven't set an error condition when exiting early. I still get oom-killer deaths occasionally, but they're very significantly reduced.
As for a reproduction scenario: this is a MIPS device running a 4.9 kernel with a total of 128 MB DRAM. A significant amount is reserved at boot time for CMA, so what I actually have available at runtime, after unloading video drivers and so on, is around 70 MB.
When I've observed the oom-killer death taking place, we basically had a large number of reads queued up, and at least one or two of them were single filesystem objects over 8 MB in size (for example, the kernel image). We would very quickly queue up more reads with signature verification and writes pending, then run out of memory in pending I/O operations, which leads to filesystem corruption (not in every oom-killer death, just the ones that leave /ostree in an unusable state). By the time the app gets killed, we already have a corrupted filesystem. Since this happens on the pull, the current deployment is usable, but I have no way to recover /ostree and do a successful pull.
I'm currently stuck on 2020.3 due to build issues in buildroot, but I'll try backporting the commit to see if it makes a difference. I can reproduce this with the right combination of updates.

@ericcurtin
Collaborator

ericcurtin commented May 2, 2023

There is an article online that suggests ostree memory usage can be quite high:

https://witekio.com/de/blog-de/ostree-tutorial-system-updates/

Do we have any hard data, e.g. in a blog post, describing what kind of memory is typically required for upgrades on an el9 ostree-based system? If not, should we write one?

@cgwalters
Member

We don't have any hard data. In #2766 I found what I believe is one source of memory use proportional to O(files).

I think we can repurpose this issue for gathering data and metrics, and ideally scaling our queues roughly to available resources (CPU, network speed, I/O bandwidth, memory). We definitely don't want any unbounded queues.
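As one illustration of scaling a queue to available memory (not something ostree does today), the fetch-queue depth could be derived from free RAM at startup. The per-fetch memory estimate and the bounds below are arbitrary assumptions:

```c
/* Illustrative only: ostree does not do this.  Pick a fetch-queue depth
 * from available memory at startup, clamped to sane bounds, using
 * Linux sysinfo(2). */
#include <sys/sysinfo.h>
#include <glib.h>

static guint
choose_fetch_queue_depth (void)
{
  struct sysinfo si;
  if (sysinfo (&si) != 0)
    return 8;  /* assumed fallback, matching today's compile-time default */

  guint64 avail_bytes = (guint64) si.freeram * si.mem_unit;
  /* Assume (hypothetically) ~4 MiB of working memory per outstanding fetch,
   * and spend at most a quarter of free RAM on the pull queue. */
  guint64 depth = (avail_bytes / 4) / (4 * 1024 * 1024);
  return (guint) CLAMP (depth, 1, 16);
}
```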

@cgwalters cgwalters reopened this May 2, 2023
@cgwalters cgwalters changed the title ostree pull dies via oom-killer on low-memory system Ensure pulls are reliable with lower resources (memory, network speed) May 2, 2023
@cgwalters cgwalters added the triaged, enhancement, difficulty/medium, and reward/medium labels May 2, 2023