Add NFS crashed, XFS and deprecate ZFS mover.
Work continues; work around bugs found via use of a different filesystem for cache.
TheLinuxGuy committed Nov 23, 2022
1 parent 0ccccef commit d1b7ce1
Showing 3 changed files with 72 additions and 3 deletions.
18 changes: 16 additions & 2 deletions README.md
@@ -25,7 +25,8 @@ The following open-source projects seem to be able to help reach my goals. It re

- [SnapRAID](https://www.snapraid.it). Provides data parity, backups, checksumming of existing backups.
- [Claims to be better than UNRAID's](https://www.snapraid.it/compare) own parity system with the ability to 'fix silent errors' and 'verify file integrity' among others.
- [BTRFS Filesystem](https://btrfs.wiki.kernel.org/index.php/Main_Page) similar to ZFS in that it provides the ability to 'send/receive' data streams (à la `zfs send`) with the added benefit that I can run individual `disk scrubs` to detect hardware issues that require me to restore from snapraid parity.
- [BTRFS Filesystem](https://btrfs.wiki.kernel.org/index.php/Main_Page) similar to ZFS in that it provides the ability to 'send/receive' data streams (à la `zfs send`) with the added benefit that I can run individual `disk scrubs` to detect hardware issues that require me to restore from snapraid parity. **In my observation, btrfs performance is poor compared to the XFS filesystem on Linux.** *Since we use btrfs only for the 'data' disks in the slow mergerfs pool, we are not sensitive to speed.*
- **XFS Filesystem for NVME cache on mdadm array**. After finding bugs and instability in my ZFS+NFS+mergerfs implementation, my cache disks are now formatted as XFS in RAID1. I did not use native btrfs raid1 here because btrfs performance was poor (50% throughput penalty). XFS was able to match ZFS raw speeds (without ARC) at ~900MB/s.
- [MergerFS](https://github.com/trapexit/mergerfs). FUSE filesystem that allows me to 'stitch together' multiple hard drives with different mountpoints and takes care of directing I/O operations based on a set of rules/criteria/policies.
- [snapraid-btrfs](https://github.com/automorphism88/snapraid-btrfs). Automation and helper script for BTRFS-based snapraid configurations. Using BTRFS snapshots as the data source for running 'snapraid sync' allows me to continue using my system 24/7 without data corruption risks or downtime when I want to build my parity/snapraid backups.
- [snapraid-btrfs-runner](https://github.com/fmoledina/snapraid-btrfs-runner). Helper script that runs `snapraid-btrfs` sending its output to the console, a log file and via email.
@@ -38,14 +39,27 @@ The following open-source projects seem to be able to help reach my goals. It re
apt-get install zfsutils-linux cockpit-pcp btrfs-progs libbtrfsutil1 btrfs-compsize duc smartmontools
```

## ZFS cache pool setup
## ~~ZFS cache pool setup~~
**WARNING! DEPRECATED** NFS+ZFS is unstable with this setup. Follow XFS+mdadm below.

RAID1 of two SSD disks. We'll write all stuff here then purge to 'cold-storage' slower disks via cron.

```
zpool create -o ashift=12 cache mirror /dev/sdb /dev/nvme0n1
```

## XFS RAID1 mirror mdadm

See [mergerfs](mergerfs.md) for details on ZFS instability. For our cache pool we will use the XFS filesystem. Set up the NVME cache as follows:

```
mdadm --create --verbose /dev/md0 --bitmap=none --level=mirror --raid-devices=2 /dev/nvme0n1 /dev/sdb
mkfs.xfs -f -L cache /dev/md0
mdadm --detail /dev/md0
```

Remember to add a mountpoint to start at boot.
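
One way to handle the boot-time mount, sketched under assumptions: the mount point `/mnt/cache` and the `nofail` option are placeholders and are not taken from this repository, and `/etc/mdadm/mdadm.conf` is the Debian/Ubuntu default path. The helper `fstab_line` is hypothetical and only formats the entry; the label `cache` is the one set by `mkfs.xfs -L cache` above.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab entry for an XFS filesystem
# identified by its label (the label is set with `mkfs.xfs -L`).
fstab_line() {
    label="$1"       # filesystem label, e.g. "cache"
    mountpoint="$2"  # assumed mount point, e.g. /mnt/cache
    # nofail lets the system finish booting even if the array is missing.
    printf 'LABEL=%s %s xfs defaults,nofail 0 0\n' "$label" "$mountpoint"
}

fstab_line cache /mnt/cache

# To apply as root, something along these lines (review before running):
#   mkdir -p /mnt/cache
#   fstab_line cache /mnt/cache >> /etc/fstab
#   mdadm --detail --scan >> /etc/mdadm/mdadm.conf   # record the array
#   update-initramfs -u                              # Debian/Ubuntu only
#   mount -a
```

Mounting by label rather than by `/dev/md0` avoids depending on the md device number staying the same across reboots.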

## BTRFS (disk setup guide)

### BTRFS Commands TL;DR
@@ -1,7 +1,8 @@
#!/usr/bin/python3
# TheLinuxGuy ZFS cache pool mergerfs tiered cache mover.
# File age time-based mover depending on goal % cache utilization.
# TODO: move logs to standalone log file to stop adding crap to syslog
# This script works but is abandoned after NFS+ZFS+mergerfs instability.
# !! THIS SCRIPT IS ZFS POOL SPECIFIC !! DO NOT USE ON XFS cache setup.
import argparse
import subprocess
import syslog
54 changes: 54 additions & 0 deletions mergerfs.md
@@ -1,5 +1,7 @@
# MergerFS

**WARNING: Using ZFS + NFS (non-zfs native export) + mergerfs causes [ZFS mount instability and crashes](https://github.com/trapexit/mergerfs/discussions/1098).**

MergerFS is used to "merge" all physically distinct disk partitions (/mnt/disk*) into a single logical volume mount.

### Policies
@@ -44,6 +46,58 @@ To attempt to mirror what unraid provides with their share "cache" we are going

Recall that I chose to use ZFS and RAID1 mirror for this purpose to provide assurances that my data would not be lost before it gets moved onto parity-protected-snapraid-slow-storage-disks.

## NFS instability

`/mnt/cached` is my mergerfs pool and ZFS mountpoint on my local system. The `mergerfs` process seems to crash at some point due to NFS. I haven't yet found the root cause and have tried upgrading everything: the kernel, ZFS, nfs-kernel-server, libfuse and the OS (Ubuntu 20.04 to 20.10).

The crashes seem to be more pronounced with NFSv4. NFSv3 is more stable, but it is a stateless protocol and I would much prefer NFSv4-only shares. I have disabled v4 and forced v3 for the time being to try to make my implementation stable.

Observed behavior (on local NAS):
```
# ls -lah /mnt/cached
ls: cannot access '/mnt/cached': Input/output error
```
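
A failure like this can be detected from a script. The sketch below is an assumption on my part and not something from this repository: `mount_is_healthy` is a hypothetical helper that only checks whether the directory can be listed, which is the operation that fails with `Input/output error` in the output above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the directory can be listed.
# Listing a dead FUSE mountpoint fails (e.g. "Input/output error").
mount_is_healthy() {
    ls -A "$1" >/dev/null 2>&1
}

# /mnt/cached is the mergerfs pool path used in this document.
if mount_is_healthy /mnt/cached; then
    echo "mergerfs pool OK"
else
    echo "mergerfs pool unreachable" >&2
fi
```

Run from cron, a check like this could at least log when the pool went away, which helps correlate crashes with NFS activity.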

Recovery steps


### Debugging with strace

```
root@nas:/home/gfm# strace -fvTtt -s 256 -p PIDHERE -o /tmp/mergerfs.strace.txt
strace: Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf: Operation not permitted
strace: attach: ptrace(PTRACE_SEIZE, 2081428): Operation not permitted
root@nas:/home/gfm# echo "0"|sudo tee /proc/sys/kernel/yama/ptrace_scope
0
```

If that doesn't work, change the setting in `/etc/sysctl.d/10-ptrace.conf` to 0 and reboot.

According to the mergerfs developer, strace isn't helpful here. The proper way to debug mergerfs is with gdb.

### gdb debugging mergerfs
```
# If it's crashing then strace is pretty useless. Need a stack trace from gdb.
gdb path/to/mergerfs

# inside gdb:
run -f -o options branches mountpoint

# when it crashes:
thread apply all bt
```
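
The same session can probably be run non-interactively, so the backtrace can be redirected to a file. This is a sketch and not the developer's recommendation: `gdb_backtrace_cmd` is a hypothetical helper that only prints the command line, and the binary path, options, branches and mountpoint are placeholders to fill in. `-batch`, `-ex` and `--args` are standard gdb options.

```shell
#!/bin/sh
# Hypothetical helper: print a gdb command line that runs mergerfs in the
# foreground (-f) and dumps every thread's backtrace once it stops.
gdb_backtrace_cmd() {
    binary="$1"; options="$2"; branches="$3"; mountpoint="$4"
    printf "gdb -batch -ex run -ex 'thread apply all bt' --args %s -f -o %s %s %s\n" \
        "$binary" "$options" "$branches" "$mountpoint"
}

# Placeholder values mirroring the session above; review before running.
gdb_backtrace_cmd path/to/mergerfs options branches mountpoint
```

Printing the command instead of executing it keeps the helper harmless; the printed line can be copied into a root shell with its output redirected to a log file.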
### Remove ZFS from the equation by using XFS RAID 1
```
mdadm --create --verbose /dev/md0 --bitmap=none --level=mirror --raid-devices=2 /dev/nvme0n1 /dev/sdb
mkfs.xfs -f -L cache /dev/md0
mdadm --detail /dev/md0
```
### NFS tweaks that were added
```