I’m frustrated because this feature was mentioned by Schwartz when it was still in beta. I thought a new era of home computing was about to start. It didn’t, and instead we got The Cloud, which feels like decentralization but is in fact massive centralization (organizational, rather than geographical).
Some of us think people should be hosting stuff from home, accessible from their mobile devices. But the first, and to me one of the biggest, hurdles is managing storage. That requires a storage appliance that is simpler than using a laptop and doesn't demand the skills of an IT professional.
Drobo tried to make a storage appliance, but once you got to the fine print it had the same set of problems that ZFS still does.
All professional storage solutions are built on an assumption of hardware symmetry: I have n identical drives (though ideally not from the same manufacturing batch) which I will smear files out across.
Consumers will never have drive symmetry. That’s a huge expenditure that few can justify and fewer can afford. My Synology didn’t like most of my old drives, so by the time I had a working array I’d spent practically the price of a laptop on it, for a weirdly shaped computer I couldn’t actually use directly. I’m a developer, I can afford it. None of my friends can. Mom definitely can’t.
A consumer solution needs to assume drive asymmetry. The day it is first plugged in, it will contain a couple of new drives plus every hard drive the consumer can scrounge up from junk drawers, save two: their current backup drive and an extra copy. Once the array (with one open slot) is built and verified, one of the backups can go into the array for additional space and speed.
From then on, the owner will likely buy one or two new drives every year, at whatever price point they’re willing to pay, and swap out the smallest or slowest drive in the array. That means the array will always contain 2-3 different generations of hard drives, never the same speed and never the same capacity. And they expect that if a rebuild fails, some of their data will still be retrievable, without a professional data recovery company.
That rules out all RAID levels except RAID 0, which is nuts. An algorithm that can handle this scenario is consistent hashing. Weighted consistent hashing can handle disparate resources by assigning more buckets to faster or larger machines, and it can grow and shrink (in a drive array, the two are sequential or simultaneous).
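To make that concrete, here is a minimal sketch of a weighted consistent hash ring. The drive names and capacities are invented for illustration, and a real appliance would still need replication, rebalancing, and failure handling on top of this placement step.

    # Minimal sketch of weighted consistent hashing: each drive gets virtual
    # nodes on a hash ring in proportion to its size, so bigger drives absorb
    # a bigger share of the blocks. Drive names/sizes here are hypothetical.
    import bisect
    import hashlib

    def _hash(key: str) -> int:
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    class WeightedRing:
        def __init__(self, drives: dict[str, int], points_per_tb: int = 100):
            # One virtual node per "point"; a 12 TB drive gets 6x the points
            # of a 2 TB drive, hence roughly 6x the keyspace.
            self._ring = sorted(
                (_hash(f"{name}:{i}"), name)
                for name, tb in drives.items()
                for i in range(tb * points_per_tb)
            )
            self._keys = [h for h, _ in self._ring]

        def locate(self, block_id: str) -> str:
            # Walk clockwise to the first virtual node at or after the block's hash.
            i = bisect.bisect(self._keys, _hash(block_id)) % len(self._ring)
            return self._ring[i][1]

    # A mismatched consumer array: an old 2 TB, a 6 TB, and a new 12 TB drive.
    ring = WeightedRing({"old-2tb": 2, "mid-6tb": 6, "new-12tb": 12})
    print(ring.locate("photos/2024/img_0001.raw"))

The point is that adding or removing a drive only remaps roughly that drive's share of the blocks; everything else stays put, which is what makes piecemeal upgrades tolerable.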
The purchasing patterns of small and older businesses start to resemble those of consumers. They can’t afford a shiny new array all at once; it’s scrounging and piecemeal. So this isn’t strictly about chasing consumers.
I thought ZFS was on a similar path, but the delays in sprouting these features make me wonder.
I can afford it, but I have a hard time justifying the cost, not to mention the scrapped (working) hardware and the inconvenience of swapping to a whole new array.
I started using snapraid [1] several years ago, after finding zfs couldn't expand. Often when I went to add space the "sweet spot" disk size (best $/TB) was 2-3x the size of the previous biggest disk I ran. This was very economical compared to replacing the whole array every couple years.
It works by having "data" and "parity" drives. Data drives are totally normal filesystems, joined with unionfs; in fact you can mount them independently and access whatever files are on them. Parity drives each just hold a big file that snapraid updates nightly.
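For intuition only (this is not snapraid's actual code), single parity works roughly like RAID 5: the parity file holds the XOR of the blocks at the same offset on each data drive, so any one missing block can be rebuilt from the survivors plus parity.

    # Conceptual sketch of single-parity protection and recovery; the data is
    # made up and this is not snapraid's implementation.
    from functools import reduce

    def xor_blocks(blocks: list[bytes]) -> bytes:
        # XOR corresponding bytes across equally sized blocks.
        return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*blocks))

    data_drives = [b"files on drive 1", b"files on drive 2", b"files on drive 3"]
    parity = xor_blocks(data_drives)

    # Drive 2 dies: rebuild its block from the surviving drives plus parity.
    rebuilt = xor_blocks([data_drives[0], data_drives[2], parity])
    assert rebuilt == data_drives[1]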
The big downside is it's not realtime redundant: you can lose a day's worth of data from a (data) drive failure. For my use case this is acceptable.
A huge upside is rebuilds are fairly painless. Rebuilding a parity drive has zero downtime, just degraded performance. Rebuilding a data drive leaves it offline, but the rest work fine (I think the individual files are actually accessible as they're restored though). In the worst case you can mount each data drive independently on any system and recover its contents.
I've been running the "same" array for a decade, but at this point every disk has been swapped out at least once (for a larger one), and it's been in at least two different host systems.
I gave snapraid a serious look a few months back and decided it might not be for me, because balancing writes across the "array" member disks appeared to be manual.
I didn't want to point applications at 100 TB of "free" space only for writes to start blocking after 8 TB.
Am I mistaken about that?
I use "existing path, least free space". Once a path is created, it keeps using it for new files in that path. If it runs out of space, it creates that same path on another drive. If the path exists on both drives for some reason, my rationale is this keeps most of the related files (same path) together on the same drive.
I see there's some newer "most shared path" options I don't remember that might even make more sense for me, so maybe that's something I'll change next time I need to touch it.
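To make the selection rule concrete, here is a simplified sketch of how an "existing path, least free space" create policy might pick a drive. It is not mergerfs's actual code, and the branch roots and paths are made up.

    # Simplified sketch of an "existing path, least free space" create policy;
    # not mergerfs's actual code. Branch roots and paths are hypothetical.
    import os
    import shutil

    def pick_branch(branches: list[str], rel_dir: str, need_bytes: int = 0) -> str:
        # Prefer branches where the directory already exists and has room...
        existing = [b for b in branches
                    if os.path.isdir(os.path.join(b, rel_dir))
                    and shutil.disk_usage(b).free > need_bytes]
        # ...otherwise fall back to any branch with room; the path gets created there.
        candidates = existing or [b for b in branches
                                  if shutil.disk_usage(b).free > need_bytes]
        # Among the candidates, take the one with the least free space, which
        # tends to keep files sharing a path packed onto the same drive.
        return min(candidates, key=lambda b: shutil.disk_usage(b).free)

    # Example (hypothetical mounts):
    # pick_branch(["/mnt/disk1", "/mnt/disk2"], "photos/2024", need_bytes=10**9)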
[1] https://github.com/trapexit/mergerfs
[2] https://github.com/trapexit/mergerfs#policy-descriptions