
Starting up and shutting down VMs recently became extremely slow #4927

Closed

andrewdavidwong opened this issue Mar 29, 2019 · 12 comments

Labels
C: core · P: major (Priority: major. Between "default" and "critical" in severity.) · r4.0-dom0-stable · r4.1-dom0-cur-test · T: bug (Type: bug report. A problem or defect resulting in unintended behavior in something that exists.)

Comments

andrewdavidwong (Member) commented Mar 29, 2019

Qubes OS version
R4.0

Affected component(s) or functionality
Unknown

Brief summary
Starting up and shutting down VMs used to take only a few seconds. Now it takes over 30 seconds.

To Reproduce
This happens every time I start up or shut down a VM. Attempting to shut down many running VMs (qvm-shutdown --wait --all) can take 10 minutes, whereas before it used to take less than a minute.

While the VMs are running, all operations inside of them are very fast and responsive. Dom0 is also very fast and responsive. Only starting up and shutting down VMs seems to be affected.

Expected behavior
VM startup and shutdown in <5s (which was the case until recently).

Actual behavior
VM startup and shutdown in >30s.

Additional context
This started pretty recently. I suspect that it was caused by a recent update. Here are the packages that have recently been updated in dom0 on my system (the version given is the version I currently have installed):

2019-03-16:

python2-qubesadmin-4.0.25
python3-qubesadmin-4.0.25
qubes-core-admin-client-4.0.25

2019-03-19:

qubes-core-dom0-4.0.41
qubes-core-dom0-linux-4.0.18
qubes-core-dom0-linux-kernel-install-4.0.18
qubes-gpg-split-dom0-2.0.35
qubes-input-proxy-1.0.14
qubes-mgmt-salt-base-topd-4.0.1
qubes-usb-proxy-dom0-1.0.20
garcon-1000:0.5.0 (reinstalled)
xfwm4-1000:4.12.4 (reinstalled)

Solutions you've tried
I have tried:

  • Enabling TRIM at all levels according to the docs.
  • Increasing minimal qube memory to 400 MiB then restarting qmemman.
  • Increasing minimal qube memory to 2000 MiB then restarting qmemman.
  • Increasing dom0 memory to 500 MiB.
  • Increasing the VM's initial memory to 2000 MiB.
  • Switching the default domU kernel from 4.14.103-1 back to 4.14.74-1.
  • Switching the dom0 kernel from 4.14.103-1 back to 4.14.74-1.

None of these had any noticeable effect on startup or shutdown time (still slow).

Relevant documentation you've consulted
N/A

Related, non-duplicate issues
I originally thought this was #2963. Please see the collapsed comments there for previous discussion.

andrewdavidwong added the T: bug, C: core, and P: major labels Mar 29, 2019
andrewdavidwong added this to the Release 4.0 updates milestone Mar 29, 2019
andrewdavidwong (Member, Author) commented:

@marmarek, I'm not sure if setting "minimal qube memory" in the Global Settings GUI is actually working correctly. Is there any way to verify this on the command line? qubes-prefs doesn't seem to show it.

marmarek (Member) commented:

It's set in /etc/qubes/qmemman.conf.
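For reference, a sketch of what that file typically contains (key names follow the qmemman config format; the values shown are illustrative defaults, not necessarily what is on any given system):

```ini
; /etc/qubes/qmemman.conf (illustrative sketch)
[global]
vm-min-mem = 200MiB
dom0-mem-boost = 350MiB
cache-margin-factor = 1.3
```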

marmarek (Member) commented:

How many files do you have in /etc/lvm/archive? If more than 1-2k, try removing them and see if that helps.
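A quick way to check (a sketch; the helper name is mine, and /etc/lvm/archive is LVM's default archive_dir):

```shell
#!/bin/sh
# count_vg_archives DIR — count the *.vg metadata archive files in DIR.
# Every lvm invocation scans this directory, so a huge count slows down
# all lvm calls (and therefore every VM start/stop).
count_vg_archives() {
    find "$1" -maxdepth 1 -type f -name '*.vg' 2>/dev/null | wc -l
}
```

Usage: `count_vg_archives /etc/lvm/archive`.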

t4777sd commented Mar 29, 2019

Can the /etc/lvm/archive directory just be deleted? I am not really experiencing this issue, but I have over 5000 items in /etc/lvm/archive.

I saw in /etc/lvm/lvm.conf that there are two parameters, "retain_min" and "retain_days", currently set to 10 and 30.

Is there an lvm command we can use to automatically trim /etc/lvm/archive according to those settings in lvm.conf?

marmarek (Member) commented:

I think it is done according to those limits already. The thing is, it's totally possible that lvm will produce 5k files in 30 days.

andrewdavidwong (Member, Author) commented Mar 30, 2019

How many files do you have in /etc/lvm/archive? If more than 1-2k, try removing them and see if that helps.

That was it! I had well over 120k files in there. Removing all files older than a day (which still leaves 3k files) resulted in an immediate and drastic performance improvement. Individual VMs now start up in ~9s and shut down in ~7s, and shutting down several running VMs finishes in ~30s.

It looks like there's no way to set a maximum retention time in lvm.conf, only a minimum retention time (retain_days is also a minimum):

https://serverfault.com/questions/653261/lvm-archive-and-backup-files-not-purging

So, I've created a daily cron job to delete files older than one day, using this command:

find /etc/lvm/archive/ -type f -mtime +1 -name '*.vg' -delete
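The same prune can be written as a small function so the threshold can be exercised against a scratch directory before pointing it at /etc/lvm/archive (the function name is mine; the find expression is the one above):

```shell
#!/bin/sh
# prune_vg_archives DIR DAYS — delete *.vg metadata archive files in DIR
# whose mtime is more than DAYS days old (find's -mtime +N semantics:
# age in whole 24-hour periods must exceed N).
prune_vg_archives() {
    find "$1" -type f -mtime "+$2" -name '*.vg' -delete
}
```

From a daily cron job this would be called as `prune_vg_archives /etc/lvm/archive 1`.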

Thanks, @marmarek!

I'll leave this open in case you want to take action on it: it might affect other users in the future once their archives build up, and there may be further performance improvements to be had.

tasket commented Mar 30, 2019

Long ago I wrote a script to take care of this... forgot it was even an issue:

#!/bin/sh
# Prune old log files (must run as root)

[ "$(id -u)" = 0 ] || exit 1

days=4

journalctl --vacuum-time="${days}days"

for dir in /var/log; do
  find "$dir" -mindepth 1 -mtime +"$days" -delete
done

find /etc/lvm/archive -mtime +"$days" -delete

rm -f /var/log/qubes/qmemman.log
touch /var/log/qubes/qmemman.log

find /var/lib/qubes/backup -mindepth 1 -mtime +11 -delete

marmarek (Member) commented Apr 2, 2019

I think this may be the same as #2963

andrewdavidwong (Member, Author) commented:

I think this may be the same as #2963

I originally thought so too (as you can see from our collapsed comments there), but then I decided it must be different: #2963 was opened long ago without any corroborating reports or other activity, whereas my slowdown ramped up quickly and was severe enough that anyone else experiencing it would surely have reported it. Anyway, I don't know whether they're the same; #2963 doesn't really contain enough information for me to tell.

qubesos-bot commented:

Automated announcement from builder-github

The package qubes-core-dom0-linux-4.1.0-1.fc29 has been pushed to the r4.1 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

marmarek added a commit to QubesOS/qubes-core-admin-linux that referenced this issue Sep 10, 2019
Those files may easily accumulate in large quantities, to the point where just listing the /etc/lvm/archive directory takes a long time. This affects every lvm command call, and therefore every VM start/stop. The archive files are rarely useful, as Qubes does multiple LVM operations at each VM startup, so older data goes out of date very quickly.

Automatically remove files in /etc/lvm/archive older than one day.

Fixes QubesOS/qubes-issues#4927
Fixes QubesOS/qubes-issues#2963

(cherry picked from commit 2ec29a4)
qubesos-bot commented:

Automated announcement from builder-github

The package qubes-core-dom0-linux-4.0.19-1.fc25 has been pushed to the r4.0 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

qubesos-bot commented:

Automated announcement from builder-github

The package qubes-core-dom0-linux-4.0.19-1.fc25 has been pushed to the r4.0 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update
