My data storage and backup system | Dafacto

The personal website of Matt Henderson.

My data storage and backup system

17 January 2010

Inspired by Steven Frank’s description of his own system, I thought I’d take a moment to document how I store and back up our important data.

Backup Objectives.

  1. I want to get back up and running quickly if a computer’s startup drive dies.
  2. I want to be able to easily restore historical versions of important lost or modified files.
  3. I want to have important files stored ‘offsite,’ in case the house burns down.
  4. I want my backup system to be automated. If any part of it depends on me remembering to do something, then it will fail. (I’ve confirmed this, through personal experience, as a universal natural law.)
  5. I’d like some redundancy.

Software/Services.

My backup system relies on the following software and services:

  • SuperDuper will mirror one bootable drive to another. It can be scheduled to run automatically.
  • Time Machine. Time Machine is Apple’s archiving backup utility, which runs automatically.
  • ChronoSync is a general-purpose backup tool for OS X that can be scheduled to run automatically. Critical to my system is its ability to create an archive of changed or deleted files.
  • Backblaze is an online service that, for $50 per year, will back up an entire computer, along with any disks attached to it. The service provides unlimited storage. You can restore files online, or have Backblaze FedEx you DVDs or USB drives.
  • Dropbox is an online service that creates a local folder on your Mac, the contents of which are sync’d to Dropbox’s servers (Amazon S3). When installed using the same username/password on two or more machines, Dropbox can be used to keep a shared folder in sync.

Context.

My home computer is a Mac Pro, which serves the following functions:

  • iTunes server (to other computers on our home network, and an Apple TV)
  • Master repository for our important files
  • My wife’s work computer

It has four internal drives installed:

  • Everest. The startup drive. Apart from applications and preferences, the only user data it contains is my wife’s account and work files.
  • EverestMirror. A bootable mirror of Everest.
  • Pumori. This is a 2TB Hitachi drive that contains the master repository of all our important files — iTunes music and videos, home photos and videos, non-current archives of personal and business documents, purchased software installers, etc.
  • Time Machine. This is a 1TB drive, used as a Time Machine target for Everest.

In addition, the Mac Pro has a Drobo attached, whose file system looks like this:

  • /Archives
  • /Backups
    • /Pumori
    • /Dropbox

The Setup.

  • To achieve quick recovery in case of startup drive failure, I have SuperDuper automatically mirror Everest to EverestMirror each night. Each morning, a Growl notification confirms that the mirror completed successfully. (Objectives 1 & 4)
  • All of our current personal and work files are kept in Dropbox. This keeps them synchronized between my wife’s account on the Mac Pro and my personal MacBook, and also provides an online backup of them. (Objectives 2, 3, 4 & 5)
  • As a first line of recovering a lost or modified file from my wife’s working area, I have Time Machine archiving Everest to the “Time Machine” drive. (Objectives 2 & 4)
  • As a second line of recovering a lost or modified file from part of my wife’s working area (/Users/Wife/Dropbox), or from Pumori, I have ChronoSync perform an archiving backup to the Drobo (in /Backups/Pumori and /Backups/Dropbox) each night. To avoid filling up the Drobo, I have ChronoSync keep a maximum of five archived copies of any given file. I’ve also configured ChronoSync to clean up the archives by deleting archived files more than 180 days old, while always preserving at least one archived version of every file, regardless of how old it is. Finally, ChronoSync is configured to email me if anything goes wrong during its nightly backup. (Objectives 2, 4 & 5)
  • For offsite backup, I have Backblaze back up Everest and Pumori. Given the large amount of data on Pumori, I decided to exclude iTunes “Movies” and “TV Shows”. They’re already backed up on the Drobo, and I figure that in the very worst case, if the house burned down, I could simply repurchase them as I wanted to watch them again. Music, on the other hand, is something I wouldn’t want to repurchase, so Backblaze does keep it backed up (approximately 45 GB). (Objectives 3 & 4)
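The ChronoSync retention rules in the list above (at most five copies, nothing older than 180 days, but never zero copies) interact in a way that's easy to misread, so here is a minimal Python sketch of how I understand them, applied to the archived versions of a single file. It's a model of the policy, not ChronoSync's actual code: the function name `prune_archive` and its parameters are hypothetical, and it assumes the "at least one version" rule keeps the newest archived copy.

```python
from datetime import datetime, timedelta


def prune_archive(versions, now, max_versions=5, max_age_days=180):
    """Split one file's archived-version timestamps into (keep, delete).

    Hypothetical model of the retention rules described above; this is
    not how ChronoSync is actually implemented.
    """
    newest_first = sorted(versions, reverse=True)
    cutoff = now - timedelta(days=max_age_days)

    # Rule 1: cap the number of archived copies, keeping the newest ones.
    keep = newest_first[:max_versions]
    # Rule 2: drop copies older than the age limit. Because the list is
    # sorted newest-first, the survivors are still a prefix of it.
    keep = [v for v in keep if v >= cutoff]
    # Rule 3: never leave a file with zero archived versions; assume the
    # newest copy is the one that is preserved.
    if not keep and newest_first:
        keep = [newest_first[0]]

    # keep is always a prefix of newest_first, so everything after it goes.
    delete = newest_first[len(keep):]
    return keep, delete


if __name__ == "__main__":
    now = datetime(2010, 1, 17)
    versions = [now - timedelta(days=d) for d in (1, 10, 30, 60, 90, 120, 200)]
    keep, delete = prune_archive(versions, now)
    print(len(keep), "kept,", len(delete), "deleted")
```

With the defaults, a file with archived copies from 1, 10, 30, 60, 90, 120 and 200 days ago keeps the five newest and loses the last two, while a file whose only copies are 300 and 400 days old still keeps the 300-day-old one. Because each rule keeps a prefix of the newest-first list, the order in which the count and age limits are applied doesn't change the result.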

Final Notes.

  • I also use the Drobo to archive files that are large, non-critical and not backed up, such as original DVD rips of movies (the VIDEO_TS folders).
  • For probably a decade, I've used File Buddy to search my archives for files when needed. It's very configurable and fast.
  • Backblaze is great. Since first purchasing it, I’ve bought licenses for my own MacBook, our office server, and my mother’s computer. (More precisely, I've added those computers to my single Backblaze account, so that I can manage and restore from them all in a single web interface.)
  • Dropbox is also great. We’ve been using it extensively for a couple of years now, and it has never failed us, which is quite a feat considering the large number of files it manages and the crazy folder reorganizations we've performed!
  • Both Backblaze and Dropbox also keep archived versions of changed or deleted files, but I prefer the convenience and configurability of my own solution. It’s nice to know those other versions are there, though.
  • As a curiosity, I suspect both Dropbox and Backblaze make use of “Content Addressable Storage,” a technique in which each piece of data is identified by a signature (a hash) computed from its contents, so identical data only needs to be stored once. For example, I’ve dropped a huge Adobe installer DMG file into Dropbox and watched it appear sync’d with Dropbox’s servers almost instantly. This suggests that some other Dropbox customer had already uploaded that particular file, and my own Dropbox client recognized the file’s “signature” and realized it didn’t need to upload it to their servers again. Very clever. Along with multiplying Dropbox’s profits (since, for a given file, they pay their storage provider only once while collecting payments from each of the N customers who store it), this also benefits users, as reorganizing folders doesn’t require lengthy re-uploads of your files.
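To make the content-addressable idea concrete, here's a toy Python sketch of the deduplicating upload I suspect is happening. Everything in it is an assumption for illustration: the `BlobStore` and `sync_file` names are hypothetical, SHA-256 stands in for whatever “signature” the real services use, and an in-memory dictionary stands in for their servers.

```python
import hashlib


class BlobStore:
    """Hypothetical server side: blobs keyed by the SHA-256 hex digest of their bytes."""

    def __init__(self):
        self._blobs = {}  # digest -> bytes, shared by every customer

    def has(self, digest: str) -> bool:
        return digest in self._blobs

    def upload(self, digest: str, data: bytes) -> None:
        # Reject data whose content does not match the claimed address.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("digest does not match data")
        self._blobs[digest] = data


def sync_file(store: BlobStore, data: bytes):
    """Hypothetical client side: compute the signature locally, ask the
    server whether it already has it, and upload the bytes only if not.

    Returns (digest, uploaded); uploaded is False when identical content
    was already stored and the transfer was skipped.
    """
    digest = hashlib.sha256(data).hexdigest()
    if store.has(digest):
        return digest, False
    store.upload(digest, data)
    return digest, True


if __name__ == "__main__":
    store = BlobStore()
    installer = b"pretend this is a large installer DMG"
    print(sync_file(store, installer)[1])  # first customer: uploaded
    print(sync_file(store, installer)[1])  # second customer: skipped
```

Syncing the same installer bytes a second time (from another customer, or after moving the file to a different folder) produces the same digest, so `sync_file` returns `uploaded = False` and nothing is transferred. Only genuinely new content is uploaded.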

Update 2016-06-02

Since writing this article, I’ve made some changes:

  1. My entire backup strategy is now based on CrashPlan, as documented in this article.
  2. For file synchronization and sharing, we now use BitTorrent Sync, a peer-to-peer system which is free and doesn't involve any third-party servers, as documented in this article.
