This week, I lost over one potential work day to HFS+. And it wasn’t the first time I’ve lost time to HFS+. I want to make arrangements to avoid losing time to HFS+ in the future.
WindowServer becomes unusable. (In the past, I’ve had similar problems caused by the Firefox version of CoolIris, but this time there was no way to tell what the culprit was.)
I don’t have sshd running. (Clearly, a big mistake.)
I eventually do Cmd-Ctrl-Power, which doesn’t turn electricity off immediately but triggers a slow shutdown. The other magic keystrokes don’t work.
The computer no longer boots, at least not in reasonable time, and there is no feedback on whether it is running fsck or has hung. Apparently the slow shutdown didn’t do the right thing with disks.
I boot from DVD. Disk Utility cannot repair the boot volume.
Disk Utility says the Time Machine volume (on a dedicated disk) is broken, too! (Isn’t it great how the kind of situation that breaks the boot volume is almost guaranteed to break the Time Machine volume as well?)
It becomes obvious that running fsck on the Time Machine volume will take the whole day, and restoring the whole system from a broken Time Machine volume is not a good idea.
I go fetch my off-site backup from a bank vault.
I wait.
I restore the system from the off-site backup and merge changes manually from the Time Machine volume that eventually was repaired to a mountable state.
I want to avoid this kind of thing. Clearly, Time Machine is not the solution. It prevented data loss this time, but it sure wasted a lot of time.
Problem | Solution |
---|---|
The house burns down | Off-site backup |
The equipment is stolen | Encrypted disk, off-site backup |
A hard drive wears out | RAID |
The OS kernel shuts down improperly | NAS-managed file system |
File system gets corrupted spontaneously | A better file system, preferably managed by a kernel dedicated to that task |
User deletes a file, regrets | Frequent file system snapshots |
Application (iCal) corrupts its data repository | Frequent file system snapshots |
Power spike | UPS |
Power goes out unexpectedly | UPS that can signal to the NAS the need to shut down |
Need to travel | Copy of travel-relevant data on laptop |
Laptop is stolen | Encrypted home directory, backup of travel-relevant data in the cloud |
So here’s what I want based on the above:
I want a NAS that exposes its data using NFS (v3 for Leopard and Intrepid) and contains a RAID for disk redundancy. Bonus points if the RAID can grow in-place when the disks are replaced with bigger ones one-by-one. The internal file system should be ZFS for automated in-filesystem snapshotting (no userland backupd on the client, please!). Snapshots should be exposed over NFSv4. I also want future versions of Mac OS X and Ubuntu to support an exploration UI for ZFS snapshots exposed over NFSv4. Bonus points if the snapshots are also exposed as dated share points over NFSv3 or CIFS for legacy clients.
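To make the “frequent snapshots” wish a bit more concrete, here is a minimal Python sketch of how dated snapshot names and a thinning policy could work. The naming scheme, the dataset name `tank/home` and the retention periods are all my assumptions for illustration. The sketch only decides which names to drop; actually creating and removing snapshots would be left to `zfs snapshot` and `zfs destroy` on the NAS.

```python
from datetime import datetime, timedelta

# Hypothetical naming scheme for automatic snapshots (an assumption).
SNAPSHOT_FORMAT = "auto-%Y-%m-%d-%H%M"


def snapshot_name(dataset, when):
    """Build a dated name such as 'tank/home@auto-2009-01-15-1100'."""
    return f"{dataset}@{when.strftime(SNAPSHOT_FORMAT)}"


def snapshots_to_destroy(names, now,
                         keep_all_for=timedelta(days=1),
                         keep_daily_for=timedelta(days=30)):
    """Return the snapshot names of ONE dataset that the policy would drop.

    Assumed policy: keep everything younger than keep_all_for, keep the
    earliest snapshot of each calendar day up to keep_daily_for, and drop
    everything else.
    """
    # Parse the timestamp out of each name and process oldest first.
    stamped = sorted(
        (datetime.strptime(name.split("@", 1)[1], SNAPSHOT_FORMAT), name)
        for name in names
    )
    doomed = []
    days_kept = set()
    for stamp, name in stamped:
        age = now - stamp
        if age <= keep_all_for:
            continue  # recent: keep every snapshot
        if age <= keep_daily_for and stamp.date() not in days_kept:
            days_kept.add(stamp.date())
            continue  # first snapshot of that day: keep as the daily one
        doomed.append(name)
    return doomed
```

With dated names like these, exposing each snapshot as a dated share point for legacy clients would mostly be a matter of mapping names to share names.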
There should be an option for encrypting the whole thing with a key that is loaded from a USB stick into the RAM of the NAS at boot time so that the disk can’t be read if the NAS shuts down and is rebooted without the USB stick. (I’m a bit torn on whether I really want this. The file system needing a repair that encryption would get in the way of seems more likely than the NAS being stolen or confiscated.)
The NAS should be able to make a backup to a USB drive or to another NAS from time to time. It should also be able to do Obnam-style incremental locally GPG-encrypted backups to a remote sftp server. Further, restore should work in such a way that you can initialize the restore from the USB drive / other NAS and then get only the delta from the sftp server.
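The “seed locally, fetch only the delta” restore can be sketched with per-file checksums. This is a file-level simplification of my own, not a description of how Obnam stores data, and the function names are hypothetical. It leaves out encryption and the sftp transfer entirely and only shows how the NAS could decide what still needs to come over the network.

```python
import hashlib


def manifest(files):
    """Map each path to the SHA-256 hex digest of its content.

    `files` is a dict of path -> bytes, standing in for a real directory walk.
    """
    return {path: hashlib.sha256(data).hexdigest()
            for path, data in files.items()}


def restore_plan(seed, remote):
    """Compare the seed manifest (USB drive / other NAS) with the remote one.

    Returns (to_fetch, to_delete): paths that are new or changed on the
    remote side and must be downloaded, and paths that exist only in the
    seed and should be removed to match the newer remote state.
    """
    to_fetch = sorted(p for p, digest in remote.items()
                      if seed.get(p) != digest)
    to_delete = sorted(p for p in seed if p not in remote)
    return to_fetch, to_delete
```

If the USB copy is a week old, only the files touched during that week would end up in `to_fetch`; everything else is restored at local disk speed.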
The NAS should integrate with a UPS so that when the UPS notices that external power drops, it can tell the NAS to shut down properly.
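The decision the NAS has to make here is small. A rough sketch, assuming the UPS reports whether it is running on battery and how many seconds of runtime it estimates are left (the `UpsStatus` type, its field names and the thresholds are made up for illustration; a real setup would get these values from whatever UPS monitoring daemon the appliance ships with):

```python
from dataclasses import dataclass


@dataclass
class UpsStatus:
    """Hypothetical snapshot of what the UPS reports."""
    on_battery: bool      # True when external power has dropped
    runtime_left_s: int   # estimated seconds of battery runtime left


def should_shut_down(status, shutdown_time_s=120, safety_margin_s=60):
    """Decide whether to start a clean shutdown now.

    Shut down only when running on battery AND the remaining runtime is no
    longer than the time a clean shutdown needs plus a safety margin, so
    short power blips do not take the NAS offline.
    """
    needed = shutdown_time_s + safety_margin_s
    return status.on_battery and status.runtime_left_s <= needed
```

The point of the margin is that the file system gets unmounted cleanly while there is still power left, which is exactly what didn’t happen to my boot volume.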
It should be easy to toggle a user account between office mode (home directory on NFS) and travel mode (thin home directory on an encrypted local sparse disk image).
The NAS should be a low-maintenance appliance in a small and quiet box taking little electricity.
The appliance should cost less than €1000. (I think this is realistic. The various non-client pieces I mentioned above already exist as unintegrated Open Source programs, and suitable hardware, less the UPS and with software that is not quite as good, is already available for under €500.)
A Netgear ReadyNAS is an NFSv3 RAIDed UPS-compatible NAS appliance. However, the ReadyNAS products are Ext3-based, not ZFS-based. I gather that the models that support snapshots are limited to one snapshot at a time so that you can take a backup from a consistent snapshot. Encrypting the data with a boot-time key from a USB stick doesn’t appear on the feature list. While rsync-based backups are supported, Obnam-style encryption is not.
(Thanks to Mark Pilgrim for making me aware of ReadyNAS via his blog.)
A DroboShare-based solution would be more expensive and less featureful than a ReadyNAS, it seems.
Hoping to find something ZFS-based, I went to see the offerings Sun Microsystems has. Unfortunately, Sun doesn’t have products for home offices. Their offerings are rack-mounted and cost a lot.
On the client side, browsing ZFS snapshots isn’t available, AFAIK, and toggling e.g. a Mac user account between NFS and FileVault isn’t streamlined (AFAIK).
Anyway, given what’s available, I ordered a ReadyNAS Duo and intend to give up on Time Machine. Time Machine seemed cool for about a year.
I find it insulting that a 1 TB Time Capsule (“server grade” disk) and a ReadyNAS Duo with two 1 TB Seagate “desktop grade” disks are in the same price range (when bought from Germany), yet the Apple offering doesn’t have disk redundancy and exports a sparse disk image instead of an NFS share, leaving the filesystem management (and HFS+ corruption opportunities!) to the client kernel. Boo! That’s uncool, Apple!