
Excessive swapping or limited swap space making VMs unresponsive #6659

Open
ddevz opened this issue Jun 1, 2021 · 27 comments
Labels
affects-4.2 This issue affects Qubes OS 4.2. C: other needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.

Comments

@ddevz

ddevz commented Jun 1, 2021

The problem you're addressing (if any)
Two problems:

  • Switching to a VM that has been running for a while and having to wait 30+ seconds for a response, because it started swapping while you weren't using it.
  • Switching to a VM that has been running for a while and finding it totally unresponsive, because it started swapping and then filled swap while you weren't using it.

Describe the solution you'd like
When clicking the qubes "Q" icon, it shows the memory used for each VM from a memory balancing perspective. One could put a memory number there.
Exactly which number would be best to put there is not entirely clear. The memory stats displayed by the "free -h" command could be used to show free memory inside the VM divided by available memory inside the VM.
However, due to memory balancing, this is not the whole story. Perhaps free memory inside the VM (as displayed by the "free" command) divided by the maximum memory the memory balancer is willing to provide it?

Where is the value to a user, and who might that user be?
Not having to wait 30+ seconds for responses, and not losing data you hadn't yet gotten off a disposable VM because that VM is entirely unresponsive.

Describe alternatives you've considered
Normally one would put a system monitor applet in the task bar that shows memory consumption. However, this is unlikely to be a usable solution for Qubes, both because there would end up being too many monitors in the task bar and for security reasons.

Additional context

Relevant documentation you've consulted

Related, non-duplicate issues

@ddevz ddevz added P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: enhancement Type: enhancement. A new feature that does not yet exist or improvement of existing functionality. labels Jun 1, 2021
@andrewdavidwong
Member

When clicking the qubes "Q" icon, it shows the memory used for each VM from a memory balancing perspective. One could put a memory number there.

How is this different from the memory values the Qube Manager already shows for each VM?

@andrewdavidwong andrewdavidwong added C: other help wanted This issue will probably not get done in a timely fashion without help from community contributors. ux User experience labels Jun 1, 2021
@andrewdavidwong andrewdavidwong added this to the TBD milestone Jun 1, 2021
@andrewdavidwong
Member

I have personally never experienced this problem, nor can I recall hearing others report it. Is it because Qubes is installed on an HDD rather than an SSD?

In any case, implementing a tool for users to manually monitor the problem sounds much less promising than finding the underlying root cause and fixing it.

@andrewdavidwong andrewdavidwong changed the title Easy way to determine how close VMs are to swapping Excessive swapping making VMs unresponsive Jun 1, 2021
@andrewdavidwong andrewdavidwong added T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. and removed T: enhancement Type: enhancement. A new feature that does not yet exist or improvement of existing functionality. help wanted This issue will probably not get done in a timely fashion without help from community contributors. ux User experience labels Jun 1, 2021
@andrewdavidwong andrewdavidwong modified the milestones: TBD, Release 4.0 updates Jun 1, 2021
@ddevz
Author

ddevz commented Jun 1, 2021

The underlying cause is opening "too many" Firefox tabs in a VM because you don't know how many is "too many" (i.e., how much free memory is left inside the VM). The same goes for "too many" applications running in a VM, but Firefox (and especially Tor Browser) is more likely to bloat and become unresponsive while you are away.

Increasing the amount of memory does not help, because it just increases the number of tabs (or applications) you can have open; the user still doesn't know when they need to stop.

@ninavizz
Member

ninavizz commented Jun 1, 2021

@ddevz We actually did propose some of what you're suggesting in two prototypes for a new App Menu custom-built for Qubes, tracked in #5677. In the survey to gauge user sentiment regarding feature options (#6573), this feature measured very well, with over 2/3 of users responding positively to it. It's not prioritized for an initial release, though, as our goal for that release is just to get the fully-new widget onto users' machines for feedback from use.

The data for what those prototypes propose only mirrors what the existing Domains widget (in 4.0, it's the Q menu on the right part of the screen, in the tray area) provides. Do those existing metrics provide the insight you're looking for (but obviously presented more contextually, so you have more natural visibility into it)?

@ddevz
Author

ddevz commented Jun 1, 2021

The data for what those prototypes propose only mirrors what the existing Domains widget (in 4.0, it's the Q menu on the right part of the screen, in the tray area) provides. Do those existing metrics provide the insight you're looking for (but obviously presented more contextually, so you have more natural visibility into it)?

If I understand your question properly, then no, those metrics do not provide sufficient insight. The 4.0 "Q" widget on the right shows how much memory the memory balancer has allocated to each VM. This is valuable information, as it's part of answering questions like "how many more VMs can I create before I can't start VMs anymore (due to being out of memory)?" and "which VMs do I need to kill to free up enough memory to start the VM I want to start?"
In this issue, I'm trying to answer questions like "how many more tabs can I open in this specific VM before the machine starts swapping or runs out of swap?"
So the metrics one would need are:

  1. The max memory that the memory balancer is willing to give that VM. An example of this metric: instead of the "Q" widget saying "400 MB" for a VM, it could say "400 of 3983 MB" or "400/3983 MB".
    Notes:
  • This information is also useful for people trying to manage the available Xen memory.
  • I suspect this information would be easy to get to the GUI.
  • This information does not appear to be available inside the virtual machine, as running the "free" command inside the virtual machine shows the "total memory" changing over time as the memory balancer does its balancing.
  2. The amount of memory actually used inside that VM (or something that could be used to compute it, such as the "free" and "total" numbers). (Note: I suspect this information would be harder to get to the GUI.)
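Approximating metric 1 from dom0 could be sketched roughly as follows. This is only an illustration, not existing Qubes tooling: it assumes dom0's qvm-ls, qvm-prefs, and xl commands, and only covers the "current of max" allocation; metric 2 would additionally require querying free inside each qube (e.g. via qvm-run).

```shell
# Hedged sketch (dom0 only): print "current of max" memory for each running qube.
if command -v qvm-ls >/dev/null 2>&1; then
    for vm in $(qvm-ls --running --raw-list); do
        # "xl list <vm>": header line, then a row whose 3rd column is
        # the memory currently assigned to the domain, in MB
        cur=$(xl list "$vm" 2>/dev/null | awk 'NR==2 {print $3}')
        # qvm-prefs maxmem: the most the memory balancer may give the qube
        max=$(qvm-prefs "$vm" maxmem 2>/dev/null)
        [ -n "$cur" ] && echo "$vm: ${cur} of ${max} MB"
    done
fi
```

This would print something like "work: 400 of 4000 MB" per qube, which is exactly the "400 of 3983 MB" presentation suggested above.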

@brendanhoar

brendanhoar commented Jun 1, 2021

I have personally never experienced this problem, nor can I recall hearing others report it. Is it because Qubes is installed on an HDD rather than an SSD?

I've run into this quite often on SSD in VMs set up primarily for research that tends to require a lot of open tabs. If I also have a few other VMs open, memory pressure can lead to that VM being stuck in perma-swap and unresponsive. It would be nice, particularly for browser heavy VMs based on Linux templates, to be able to control the (max) size of the volatile volume and the swap size from the qubes settings, and have the VM boot process utilize those values.

[Above is responsive to your question...but stopping there to return to topic.]

@andrewdavidwong
Member

It would be nice, particularly for browser heavy VMs based on Linux templates, to be able to control the (max) size of the volatile volume and the swap size from the qubes settings, and have the VM boot process utilize those values.

If I'm understanding correctly, it sounds like we have two different (non-mutually-exclusive) proposals that both aim to address the same underlying problem: the "memory monitoring" approach and the "swap config" approach.

@ddevz
Author

ddevz commented Jun 1, 2021

I do not see how "swap config" would solve the problem. Can you elaborate on how one might use swap config to resolve the issue? I'm totally speculating here... maybe you're thinking of setting swap to zero, so the process dies instead of the VM becoming unresponsive?

@brendanhoar

brendanhoar commented Jun 1, 2021

I do not see how "swap config" would solve the problem. Can you elaborate on how one might use swap config to resolve the issue? I'm totally speculating here... maybe you're thinking of setting swap to zero, so the process dies instead of the VM becoming unresponsive?

It appears that in the Qubes standard Linux templates, all swap is hard-coded to "up to 1 GB" per VM on the volatile volume.

This is different from the traditional settings for Linux installs; e.g., from the Fedora 28 guide, I see:
[image: table of recommended swap sizes from the Fedora installation guide]

So, in the above, if I have a VM set up with 8 GB of RAM (whether 8/8 without memory sharing or 0.8/8 with), the traditional suggested swap size, were it not a VM, is between 4 GB and 12 GB of swap. Not 1 GB.

I'll stop digging there, because what is really needed is user-behavior-focused testing of the interactions between Xen, memory sharing, and swap for image/memory-hungry applications such as Firefox, to reach the right balance of settings, especially under system-wide memory pressure.

The lack of user-exposed local swap values in the VM settings makes experimentation more difficult. I suppose we could perform some initial testing with swap files on the private volumes in the short term (with caveats).

Also, my theory, which may be wrong, is that memory pressure combined with limited swap space is causing the issue. This is different from your theory, which is that swapping itself is causing the issue.

B

@ddevz
Author

ddevz commented Jun 1, 2021

Also, my theory, which may be wrong, is that memory pressure combined with limited swap space is causing the issue. This is different from your theory, which is that swapping itself is causing the issue.

I understand now! My theory is that swapping itself is causing the "have to wait 30+ seconds to get a response" problem, and that running out of swap space is causing the "totally unresponsive (i.e., can't use the terminal anymore and have to kill the VM)" problem.

If I am correct, then your idea of expanding the swap space could fix the "totally unresponsive" problem, but not the "have to wait 30+ seconds" problem.
(And note that when having to wait 30+ seconds, a user won't be able to stand opening many more tabs or launching many more applications.)

If running out of swap space is causing both problem 1 and problem 2, then expanding the swap space would fix neither, as the user would continue to open tabs and launch applications until the same thing happened.

However, being able to expand the swap space does sound important for users with limited overall system memory (i.e., the total memory that Xen gets to use), when those users want to run VMs that will go beyond what their system memory can handle. (I added extra memory to my system and had forgotten about this case :) )

So to me it sounds like we are solving separate (but related) problems that lead to the same symptoms.

@marmarek
Member

marmarek commented Jun 1, 2021

Adding more swap to a VM that runs Firefox also avoids the "have to wait 30+ seconds" problem in most cases. This helps for Firefox specifically, because a lot of the memory it allocates is never released (aka memory leaks). While ideally applications wouldn't leak that much memory (which is hard in something as complex as a web browser...), adding more swap helps to mitigate the issue, or at least significantly delay it.

Each VM has a 10 GB volatile volume (/dev/xvdc). By default, only 1 GB is assigned for swap, and the rest is unused. It is reserved for a copy-on-write layer over a read-only root filesystem, but nowadays the VM sees a read-write root filesystem (the copy-on-write layer is applied at the dom0 level). We might use it again, though, as part of #1293 and #904. Until then, there is /dev/xvdc3, which is the unused space - you can easily do sudo mkswap /dev/xvdc3 && sudo swapon /dev/xvdc3 to use it for swap. Maybe add it to /rw/config/rc.local. If not for the plan to start using most of the volatile volume again, we could have an option in VM settings for that.
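As a concrete (hypothetical) illustration of the rc.local suggestion above, assuming the /dev/xvdc3 layout described there:

```shell
# /rw/config/rc.local -- runs at VM startup (must be executable).
# Enable the unused part of the volatile volume as extra swap,
# but only if the partition exists and is not already active.
if [ -b /dev/xvdc3 ] && ! swapon --show=NAME --noheadings | grep -q '^/dev/xvdc3$'; then
    mkswap /dev/xvdc3
    swapon /dev/xvdc3
fi
```

This only remains valid as long as /dev/xvdc3 stays unused by Qubes itself, which, per the comment above, may change.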

A word of caution: using a big swap does help for Firefox, which leaks a lot of memory. But it helps only because most of that memory is never accessed again. If an application really uses a lot of memory, then having more swap will actually make the situation worse - if more swap is used (and accessed frequently), it will make the application even slower, resulting in unresponsiveness lasting much longer than 30 seconds.

@DemiMarie

Adding more swap to a VM that runs Firefox also avoids the "have to wait 30+ seconds" problem in most cases. This helps for Firefox specifically, because a lot of the memory it allocates is never released (aka memory leaks). While ideally applications wouldn't leak that much memory (which is hard in something as complex as a web browser...), adding more swap helps to mitigate the issue, or at least significantly delay it.

I thought Firefox had fixed its leaks many years ago 😞.

@andrewdavidwong andrewdavidwong changed the title Excessive swapping making VMs unresponsive Excessive swapping or limited swap space making VMs unresponsive Jun 2, 2021
@jamke

jamke commented Jun 5, 2021

I thought Firefox had fixed its leaks many years ago 😞.

I believe Firefox did fix its memory leaks (at least the majority).
Maybe because they partly migrated to a proper language (Rust) that prevents the majority of memory-related issues, including missed de-allocations.

As of today, Firefox consumes far less memory than Chromium/Chrome with the same number of tabs loaded, etc., on a normal PC with GNU/Linux or Windows.

Maybe the problem is actually that Firefox allocates too much memory because it sees that a lot of memory is available (to be faster and utilize this extra free memory), and does not free it aggressively, as it may be reused by itself.

@ddevz
Author

ddevz commented Jun 6, 2021

I had been assuming that the memory leaks were in the JavaScript of the various pages you have open, as it seems highly dependent on which sites you have open.

@andrewdavidwong andrewdavidwong added the eol-4.0 Closed because Qubes 4.0 has reached end-of-life (EOL) label Aug 5, 2023
@github-actions

github-actions bot commented Aug 5, 2023

This issue is being closed because:

If anyone believes that this issue should be reopened and reassigned to an active milestone, please leave a brief comment.
(For example, if a bug still affects Qubes OS 4.1, then the comment "Affects 4.1" will suffice.)

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Aug 5, 2023
@andrewdavidwong andrewdavidwong removed the needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. label Aug 5, 2023
@horizon2021

This is still a problem, and it happens quite a lot: the VM freezes to death if you do not check in time for a restart of Firefox and free space. swapoff does not help, as the VM can still become unresponsive over time. Please advise.

@DemiMarie DemiMarie reopened this Sep 26, 2024
@horizon2021

This is still a problem, and it happens quite a lot: the VM freezes to death if you do not check in time for a restart of Firefox and free space. swapoff does not help, as the VM can still become unresponsive over time. Please advise.

Am using a relatively recent install of the 4.2.2 image. The PC is limited to 16 GB RAM; with all the rest of the active VMs running, I tried to make it work with 2.1 GB of RAM. The VMs that get stuck all have 30-50 tabs open without Java active. Turning off swap helped insofar as you would notice running into the bottleneck by the response time and could end Firefox beforehand, but that does not always work -> when it happens to disposable VMs, you lose your work, since there is no option to restart. With all the VMs, I usually uncheck "include in memory balancing", which is probably a factor. With the same use case on a live system (Porteus) with 8 GB RAM / no swap, it appears to never be an issue, but I cannot assign that much.

@marmarek
Member

With all the VM's I usually uncheck: "include in memory balancing"

Do you also adjust their memory size ("initial memory")? The default initial memory is 400 MB, which is definitely not enough to run Firefox or similar, and without memory balancing, the VM will not get more RAM.
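For reference, these settings can also be inspected and changed from the dom0 command line. A sketch of a CLI fragment; "work-dvm" is a hypothetical qube name:

```shell
# dom0 only -- guard so this is a no-op elsewhere
if command -v qvm-prefs >/dev/null 2>&1; then
    qvm-prefs work-dvm memory       # initial memory in MB (default 400)
    qvm-prefs work-dvm maxmem       # memory-balancing ceiling in MB (default 4000)
    qvm-prefs work-dvm memory 2000  # give a browser-heavy qube a larger initial allocation
fi
```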

@horizon2021

With all the VM's I usually uncheck: "include in memory balancing"

Do you also adjust their memory size ("initial memory")? The default initial memory is 400 MB, which is definitely not enough to run Firefox or similar, and without memory balancing, the VM will not get more RAM.

Yes. The VMs that are the problem for me are disposable VMs; in the last attempt, I set "initial memory" to 2100 MB in "default-dvm" in the hope that it would do. Unchecking "include in memory balancing" is kind of a habit .. only really thinking about it now .. I would have to see how it turns out, but since RAM is rather limited for me and I use 10+ VMs of different RAM sizes at the same time, chances are that the swap space will likely still fill up over time, which was the original issue.

@marmarek
Member

Well, swap will fill up if the VM has less RAM than the applications running inside try to use. Yes, web browsers are extremely memory-hungry apps. Check top to see what is using how much memory.

@horizon2021

I've run into this quite often on SSD in VMs set up primarily for research that tends to require a lot of open tabs. If I also have a few other VMs open, memory pressure can lead to that VM being stuck in perma-swap and unresponsive. It would be nice, particularly for browser heavy VMs based on Linux templates, to be able to control the (max) size of the volatile volume and the swap size from the qubes settings, and have the VM boot process utilize those values.

Also, my theory, which may be wrong, is that memory pressure combined with limited swap space is causing the issue. This is different from your theory, which is that swapping itself is causing the issue.

One of the questions was whether it can be helped with adjustable swap space, since others have encountered it (older Qubes versions but the same issue), all with - presumably - "include in memory balancing" enabled. I feel that 1 GB of swap may indeed be too little on some VMs (Firefox use), and you would run into it over a long enough period of time when you are not able to give the VM 4-8 GB of RAM -> but with more swap it could do.

Using your solution from Jun 2, 2021 - will that always be possible in the future? /dev/xvdc is 12 GiB now, but just executing it seems to work fine, and it shows in top/htop.

@ddevz
Author

ddevz commented Sep 26, 2024

On the chance that you might know how to do bash scripting: it's possible to put a script in cron in the qube that you are worried will run out of memory, set it to run every 5 minutes, and have it notify you when memory is getting low by using something like:
notify-send --expire-time=360000 'RUNNING OUT OF MEMORY IN VM'

You can get the free memory in the qube with something like:
free -m | grep '^Mem' | awk '{print $7}'

@andrewdavidwong
Member

@horizon2021: Please try resetting all initial and max memory settings to the defaults (i.e., 400 MB / 4000 MB) and enabling "include in memory balancing" on all qubes where it's possible. (In other words, put all memory-related settings back to the defaults.) Then reboot the whole system and try using it for a while to see if the problem persists.

@andrewdavidwong andrewdavidwong added affects-4.2 This issue affects Qubes OS 4.2. and removed eol-4.0 Closed because Qubes 4.0 has reached end-of-life (EOL) labels Sep 27, 2024
@andrewdavidwong andrewdavidwong removed this from the Release 4.0 updates milestone Sep 27, 2024
@andrewdavidwong andrewdavidwong added the needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. label Sep 27, 2024
@horizon2021

On the chance that you might know how to do bash scripting:

Thanks. With the tutorials out there, I could probably get this done; right now (whatever the security concerns are), just creating a new swap file seems like the easiest solution to me.

sudo swapoff -a
sudo fallocate -l 4G /swapfile
sudo chmod 0600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

@horizon2021

@horizon2021: Please try resetting all initial and max memory settings to the defaults (i.e., 400 MB / 4000 MB) and enabling "include in memory balancing" on all qubes where it's possible. (In other words, put all memory-related settings back to the defaults.) Then reboot the whole system and try using it for a while to see if the problem persists.

Will do a reinstall with the current version and get back to you. Please give it a while.

@andrewdavidwong
Member

andrewdavidwong commented Sep 28, 2024

Will do a reinstall with the current version and get back to you. Please give it a while.

FWIW, I don't think a complete reinstall is necessary for the settings I mentioned. In fact, if you back up all of your qubes first and restore them into the new installation, they may still have the old memory settings, so you may have to change them manually anyway.

@horizon2021

Alright, just a reboot. It is 11 active VMs in total: two HVMs with 400 MB RAM (USB, network), the rest all set to the default 400 - 4000. I recreated all VMs except sys-usb and the one I use Telegram Desktop in. There are three disposable VMs with more or less intense (tabs, Java) Firefox use.

All booted up fine, but there was a noticeable delay in some disposable VMs in the beginning that was unexpected; switching tabs and writing something down in an editor all took some delay (this usually does not happen with "fresh" VMs that have only just been started). It did, however, go away after a while, and all seemed fine.

At one point I killed one disposable VM and started a new one; that one was really slow, due to the RAM limitation. With this default memory setting, you can make memory available from elsewhere by closing applications in another VM, which helped here.

At last I ran into the situation with the swap filling up to its limit in one VM again, however without a total crash - only unresponsive for maybe 20-30 seconds. It was the VM that runs the Telegram app. It seemed to do OK until starting a video; this is the earliest copy+paste I could do after the problem occurred:

top - 23:14:28 up 22:40, 2 users, load average: 1.31, 2.05, 1.18
Tasks: 174 total, 3 running, 171 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.0 us, 3.4 sy, 0.0 ni, 62.9 id, 26.4 wa, 1.4 hi, 0.2 si, 1.8 st
MiB Mem : 684.7 total, 17.8 free, 658.7 used, 51.9 buff/cache
MiB Swap: 1024.0 total, 0.2 free, 1023.8 used. 26.0 avail Mem

Memory usage due to Telegram seems to vary a good bit; I saw this application run fine below 1000 MB RAM with very little swap for a while, but it can also go up more. Right now:

MiB Mem : 3230.9 total, 1387.6 free, 1419.4 used, 488.3 buff/cache
MiB Swap: 1024.0 total, 333.8 free, 690.2 used. 1811.5 avail Mem

Also, sys-whonix seems to take more than I would expect:

MiB Mem : 1560.2 total, 269.9 free, 1092.7 used, 229.1 buff/cache
MiB Swap: 1024.0 total, 887.0 free, 137.0 used. 467.5 avail Mem

So the issue, as far as I initially reported it, can be helped by just using the defaults (none became totally unresponsive). Memory limitation may still be problematic, since (as the point has been made above) you do not immediately know when you are running into the system limit.
