Description
LVM volumes are not always mounted after a reboot once systemd-239-78.0.3 or later is applied.
I constructed several test cases to demonstrate the issue using an Oracle-provided AMI, ami-076b18946a12c27d6, on AWS.
Here is a sample CloudFormation template used to demonstrate the issue: non-working-standard.yml.txt
User data:
yum install -y lvm2
yum update -y systemd
systemctl disable multipathd
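# detect the NVMe device name of the 1G data volume by its size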
nvme=$(lsblk -o NAME,SIZE | awk '/ 1G/ {print $1}')
pvcreate /dev/$nvme
vgcreate testvg /dev/$nvme
lvcreate -l 100%FREE -n u01 testvg
mkfs.xfs -f /dev/testvg/u01
echo '/dev/testvg/u01 /u01 xfs defaults 0 0' >> /etc/fstab
mkdir -p /u01
mount /u01
Once the template is deployed, confirm that cloud-init completed without errors and /u01 is mounted. Then reboot the EC2 instance, e.g. via reboot.
When it comes back, /u01 is not mounted anymore:
[ec2-user@ip-10-100-101-225 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 799M 0 799M 0% /dev
tmpfs 818M 0 818M 0% /dev/shm
tmpfs 818M 17M 802M 2% /run
tmpfs 818M 0 818M 0% /sys/fs/cgroup
/dev/nvme0n1p1 32G 2.4G 30G 8% /
tmpfs 164M 0 164M 0% /run/user/1000
/var/log/messages contains:
systemd[1]: dev-testvg-u01.device: Job dev-testvg-u01.device/start timed out.
systemd[1]: Timed out waiting for device dev-testvg-u01.device.
systemd[1]: Dependency failed for /u01.
systemd[1]: Dependency failed for Remote File Systems.
systemd[1]: remote-fs.target: Job remote-fs.target/start failed with result 'dependency'.
systemd[1]: u01.mount: Job u01.mount/start failed with result 'dependency'.
systemd[1]: dev-testvg-u01.device: Job dev-testvg-u01.device/start failed with result 'timeout'.
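For reference, after such a failed boot the state of the involved units and the PV's udev properties can be inspected with standard commands (the unit instance name below assumes the 1G PV is /dev/nvme1n1 with major:minor 259:1, as in the lsblk output shown later):
systemctl status dev-testvg-u01.device u01.mount
systemctl status 'lvm2-pvscan@259:1.service'
udevadm info /dev/nvme1n1 | grep -i systemd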
I created several CloudFormation templates: test-cases.zip
- non-working-standard: the deployment where systemd is updated to the currently available latest version 239-78.0.4 and multipathd is disabled. /u01 is not mounted on reboot.
- non-working-systemd: the deployment demonstrating that /u01 is not mounted on reboot if systemd is updated to 239-78.0.3, the version that introduced this problem.
- working-fstab-generator-reload-targets-disabled: the deployment where systemd-fstab-generator-reload-targets.service is disabled. It is the service that Oracle introduced in systemd-239-78.0.3; there is no such service upstream. /u01 is mounted after reboot.
- working-multipathd-enabled: the deployment where multipathd.service is enabled. /u01 is mounted after reboot.
- working-systemd: the deployment that uses systemd-239-78.0.1, the version shipped with the AMI, which does not have the issue. /u01 is mounted on reboot.
For each of the deployments above, I ran the following commands:
after deployment:
date
sudo cloud-init status
df -h
rpm -q systemd
systemctl status multipathd
systemctl status systemd-fstab-generator-reload-targets
sudo reboot
after reboot:
date
uptime
df -h
journalctl -b -o short-precise > /tmp/journalctl.txt
sudo cp /var/log/messages /tmp/messages.txt
sudo chmod o+r /tmp/messages.txt
The logs of the command executions are in the commands.txt files inside the archive, along with journalctl.txt and messages.txt.
Thus, the issue happens when all of the following conditions are true:
- systemd >= 239-78.0.3
- multipathd is disabled
- there is a mount on top of LVM
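These conditions can be checked quickly on a running instance, for example:
rpm -q systemd
systemctl is-enabled multipathd
lsblk -o NAME,TYPE,MOUNTPOINT | grep lvm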
The following workarounds are known to prevent the issue, so that an LVM volume /u01 is mounted after reboot:
- use systemd < 239-78.0.3
- enable multipathd
- disable systemd-fstab-generator-reload-targets
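For illustration, the last two workarounds could be applied on an affected instance roughly like this (a sketch only; pick one of the two):
# workaround: keep multipathd enabled and running
sudo systemctl enable --now multipathd
# alternative workaround: turn off the service added in systemd-239-78.0.3
sudo systemctl disable systemd-fstab-generator-reload-targets.service
# (or mask the unit if disabling it is not sufficient)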
I have been able to reproduce this issue only on AWS, with different instance types (AMD- and Intel-based). I was not able to reproduce the issue on Azure with either NVMe or non-NVMe based VM sizes.
What is really happening here is that lvm2-pvscan@.service is sometimes not invoked after applying systemd-239-78.0.3, so LVM auto-activation is not performed. If I reboot the EC2 instance and find that an LVM volume is not mounted, I can manually activate the problem volume groups via vgchange -a y, or run sudo /usr/sbin/lvm pvscan --cache --activate ay 259:1 for the problem device (the command used by lvm2-pvscan@.service), as demonstrated below:
[ec2-user@ip-10-100-101-125 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 799M 0 799M 0% /dev
tmpfs 818M 0 818M 0% /dev/shm
tmpfs 818M 17M 802M 2% /run
tmpfs 818M 0 818M 0% /sys/fs/cgroup
/dev/nvme0n1p1 32G 2.4G 30G 8% /
tmpfs 164M 0 164M 0% /run/user/1000
[ec2-user@ip-10-100-101-125 ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 32G 0 disk
└─nvme0n1p1 259:2 0 32G 0 part /
nvme1n1 259:1 0 1G 0 disk
[ec2-user@ip-10-100-101-125 ~]$ sudo /usr/sbin/lvm pvscan --cache --activate ay 259:1
pvscan[905] PV /dev/nvme1n1 online, VG testvg is complete.
pvscan[905] VG testvg run autoactivation.
1 logical volume(s) in volume group "testvg" now active
[ec2-user@ip-10-100-101-125 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 799M 0 799M 0% /dev
tmpfs 818M 0 818M 0% /dev/shm
tmpfs 818M 17M 802M 3% /run
tmpfs 818M 0 818M 0% /sys/fs/cgroup
/dev/nvme0n1p1 32G 2.4G 30G 8% /
tmpfs 164M 0 164M 0% /run/user/1000
/dev/mapper/testvg-u01 1016M 40M 977M 4% /u01
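Alternatively, the whole volume group can be activated with the vgchange command mentioned above; a minimal sketch (testvg is the volume group created by the user data):
sudo vgchange -a y testvg
mountpoint -q /u01 || sudo mount /u01   # mount it manually if it does not get mounted automatically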