Description
• Huge risks for 2-disk RAID 1 users who are upgrading their drives because of storage concerns. READ ALL TEXT.
• PROVEN IN A PRODUCTION ENVIRONMENT.
So I have been using BTRFS for multiple years now in a production environment, have stress-tested the filesystem pretty well, and am aware of its capabilities (at least most of them, as far as I understand).
• WHY BTRFS IS AWESOME - Why it became my FS of choice
I have been so comfortable with it that I have often hot-swapped drives and rebuilt larger multi-drive RAID 10 arrays. Sometimes I yank multiple drives out at a time, stopping just short of what the RAID level can tolerate, to see what it can handle. Never a problem.
I was even able to recover a multi-disk RAID 10 failure after a shorted drive overloaded a PERC controller on an R720XD and took down not one but most of the drives in the array, shorting the RAID controller, the motherboard and other components that are not nice to see damaged. In other words, it became my filesystem of choice after it recovered from one of the worst failures possible, one that had nothing to do with BTRFS. It took swapping the superblock, finding the filesystem root, a few device scans and eventually a recovery mount, but I was amazed it was even possible. BTRFS is now confirmed as my FS of choice.
• NOW, THE PROBLEM OF DICTATORSHIP.
So now that you guys have the good news, here's the bad news. It's an easy fix, it needs to be changed now, and it will have a big impact on the number of people using BTRFS. I never had a problem with BTRFS until the other day, and it came with a 2-disk RAID 1 setup. I was attempting to replace a single disk in a 2-disk array, which I found out was nearly impossible without multiple balances, or without just yanking the drive and following up with a rebuild.
So why is BTRFS the best filesystem for developers? ITS ONLINE ABILITIES. It just works: it does what your commands tell it to do. Until you give it the simple command of removing a disk from a minimal RAID 1 setup (BYE BYE ONLINE ABILITIES). You are presented with a message saying you are not allowed to go below 2 disks in a RAID 1 setup. So I thought about it for a second: does this make sense? NO IT DOES NOT. As we know, RAID 1 is a simple mirror of its partner disk. There is no reason that, as soon as you issue a command to remove the disk, it should argue with you about something that is easily achievable. This leads me to believe the developers are attempting to add a fail-safe to protect users against themselves. Instead, a silent rebalance of the data could be done before the remove process, giving priority to the last remaining disk so that it holds both metadata and data, followed by removal of the BTRFS device. After all, this is what BTRFS device remove and replace do. IT JUST WORKS, and I have used them many, many times. Everything I do with BTRFS just works. >>>---Unless there is a specific reason for it not to work---<<< But in this case, there is no reason a user should be told they can't do something that is possible.
What do most people do when replacing a drive in a small disk array, and why are they doing it? They want to replace the disks because they need to upgrade, or because one failed. And what if BTRFS is running on a 2-bay-ONLY NAS? The user is now looking at a forced reboot or breaking the array. If they do that before a balance, they could be in big trouble.
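For reference, the manual route described above (convert the profiles first, then remove the disk) can be sketched as a small Python helper that only builds the command lines and runs nothing. The helper name, mount point and device name are hypothetical placeholders. The balance convert filters and btrfs device remove are real btrfs-progs commands, but whether extra flags are needed on your version (for example -f when lowering metadata redundancy) is an assumption to check against the btrfs-balance man page.

```python
from typing import List


def plan_raid1_to_single_disk(mountpoint: str, device_to_remove: str,
                              metadata_profile: str = "dup") -> List[List[str]]:
    """Build (but do not run) the 'convert, then remove' command sequence.

    Hypothetical helper: mountpoint and device_to_remove are placeholders.
    """
    if metadata_profile not in ("dup", "single"):
        raise ValueError("metadata_profile must be 'dup' or 'single'")
    return [
        # Step 1: rewrite data chunks as 'single' and metadata as DUP (or single),
        # so no chunk profile requires two devices any more.
        ["btrfs", "balance", "start", "-dconvert=single",
         f"-mconvert={metadata_profile}", mountpoint],
        # Step 2: remove the device; btrfs relocates its remaining chunks
        # onto the disk that stays in the filesystem.
        ["btrfs", "device", "remove", device_to_remove, mountpoint],
    ]


if __name__ == "__main__":
    for cmd in plan_raid1_to_single_disk("/mnt/pool", "/dev/sdb"):
        print(" ".join(cmd))
```

Step 1 is exactly where the remaining disk can fill up, so free space is worth checking before running it.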
• AUTOMATIC BALANCE PAUSE AT 1% REMAINING DISK and 2-DISK DICTATORSHIP
So before telling users they can't remove a disk for their own protection (most likely they are doing it because they need to upgrade the space on their 2-disk array), warn them that converting to a single or DUP profile with no extra command-line options will lead to a READ-ONLY FILESYSTEM ONCE THE DISK IS FULL. Making these 2 changes immediately will have a large impact on BTRFS growth and its outlook. A large portion of end users run 2-disk RAID setups, since most people using RAID 1 are on a 2-disk system anyway. The entire advantage of BTRFS is its online abilities, which ZFS cannot even match, and which make it the best filesystem for hot-swapping drives, changing profiles, rebalancing disks and transparent file compression.
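The warning asked for here boils down to simple arithmetic: after the conversion, everything has to fit on the one disk that stays, and DUP keeps two copies on that same disk. A minimal Python sketch, assuming sizes in GB and a hypothetical 1% headroom (borrowed from the 1% figure in this post). Real usage numbers would come from something like btrfs filesystem usage, and the sketch ignores the extra slack real chunk allocation needs.

```python
from typing import Optional

# Copies each profile keeps on the single remaining disk.
COPIES = {"single": 1, "dup": 2}


def conversion_warning(disk_gb: float, data_gb: float, metadata_gb: float,
                       data_profile: str, metadata_profile: str,
                       headroom: float = 0.01) -> Optional[str]:
    """Return a warning if the converted filesystem would not fit on one disk.

    Hypothetical pre-flight check; ignores chunk-allocation slack.
    """
    needed = data_gb * COPIES[data_profile] + metadata_gb * COPIES[metadata_profile]
    usable = disk_gb * (1 - headroom)
    if needed > usable:
        return (f"converting to data={data_profile}, metadata={metadata_profile} "
                f"needs {needed:g} GB but only {usable:g} GB is usable: "
                f"the filesystem is likely to go read-only when full")
    return None


if __name__ == "__main__":
    # 4 TB disk, 3 TB of data, 20 GB of metadata.
    print(conversion_warning(4000, 3000, 20, "single", "dup"))  # fits: prints None
    print(conversion_warning(4000, 3000, 20, "dup", "dup"))     # does not fit: prints a warning
```

With 3 TB of data on a pair of 4 TB disks, converting data to single fits, while DUP needs 6 TB of raw space on one disk, which is exactly the situation of someone upgrading because they are running out of space.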
• PUT ZFS IN THE GRAVE WITH LZ4-BASED RAM DISK CACHING AND ADD LZ4 TO THE COMPRESSION ALGORITHMS FOR STANDARD FS USAGE.
The only reason I would choose ZFS over BTRFS is its ARC caching, which from my understanding is like strapping a RAM disk to the entire array as a cache without actually adding a RAM disk. In specific workloads BTRFS lags badly BEHIND ZFS, and I have been tempted to make the switch. The online capabilities, and what I have seen BTRFS survive, are what make me stick with it.
• AUTOMATIC BALANCE PAUSE AT 1% REMAINING DISK - This is more important than telling users it's too risky to remove one of the last disks in a RAID 1, and it is where most of the BTRFS negativity comes from. In fact, I believe most users running BTRFS at an end-user level in small RAID setups will hit this issue at least once, and it will leave them with nightmares and hating BTRFS. The current behavior restricts users over a small risk, then allows a complete catastrophe when they are forced to rebalance to a new RAID profile BECAUSE they are TOLD they CAN'T remove one of the last RAID devices. IF THEY ARE DOING THIS TO UPGRADE SPACE, THEY WILL HIT A READ-ONLY FILESYSTEM WHEN IT FILLS UP.
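BTRFS already has a manual btrfs balance pause command; what this post proposes is triggering a pause automatically. A hypothetical Python simulation of that guard: each relocated chunk is modelled as a net cost in unallocated space (positive when moving to a profile with more copies, such as DUP), and the balance stops before unallocated space would drop below 1% of the device. The function name and the cost model are simplifications for illustration, not how the kernel actually accounts space.

```python
from typing import List, Tuple


def balance_with_auto_pause(chunk_costs_gb: List[float], unallocated_gb: float,
                            device_gb: float,
                            threshold: float = 0.01) -> Tuple[int, float, str]:
    """Simulate a balance that pauses before free space drops below the threshold.

    chunk_costs_gb: net unallocated space consumed by relocating each chunk
    (hypothetical model). Returns (chunks relocated, unallocated left, state).
    """
    floor = device_gb * threshold
    relocated = 0
    for cost in chunk_costs_gb:
        if unallocated_gb - cost < floor:
            # Stop before this chunk would push free space under the floor,
            # leaving room to delete data or add a disk instead of going read-only.
            return relocated, unallocated_gb, "paused"
        unallocated_gb -= cost
        relocated += 1
    return relocated, unallocated_gb, "finished"


if __name__ == "__main__":
    # 1000 GB device, 25 GB unallocated, three chunks costing 10 GB each.
    print(balance_with_auto_pause([10, 10, 10], unallocated_gb=25, device_gb=1000))
    # → (1, 15, 'paused')
```

Pausing rather than cancelling keeps the progress already made, so the user can free space and resume instead of starting over.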
• THE END USER AND THE DANGER OF DUP: ROFS WHEN FULL
This is a nightmare scenario that I have witnessed first-hand, and it needs to be resolved. End users also frequently run this software on 2-disk-only RAID 1s. If they are upgrading a 2-disk RAID 1 because of space concerns, they are already in a RED ALERT danger zone, ESPECIALLY IF THEY CHOOSE DUP. Chances are they will then try to do things the safe way by balancing to a new RAID profile and then removing the disk. If they have not read all of the man pages and/or BTRFS documentation, they will then be presented with a locked, full, read-only system.
Google "BTRFS balance read only fs". I suspect most users making these posts were redirected from a simple denial of a RAID 1 disk removal to an attempted balance to DUP, or maybe even single. Users are normally warned about small issues before a command runs, and given preventive steps to ensure data integrity. Here they are redirected from a small denial, made for safety, to a possible ROFS when full, because they attempted to rebalance a 2-disk RAID that was most likely running out of space already, just so they could remove a disk. See where this is going? This is a very likely scenario, and judging by the search results it is happening all the time. There needs to be a balance pause at 1% remaining space, because right now telling users it is not safe to remove one of the 2 disks in a RAID 1 points them in an even WORSE direction. This is EXTREMELY contradictory. BTRFS is an amazing filesystem, but I think this needs to be resolved.
• GO BTRFS! I LOVE YOU GUYS. (RAID1C3 AND RAID1C4 ARE AWESOME)
Thank you for all your hard work on BTRFS; it is truly the filesystem of the future. I felt I had to strongly express my opinion about the denial of command execution for 2-disk RAID 1. Don't take it to heart: I love BTRFS and I couldn't do what you guys do. I have my personal server sitting on some bcache RAM disks until you finally add custom RAM-backed caching. This is just honest insight. Don't hate me... or come to my house when I am sleeping at night. My wife is dangerous.