Startup disk recovery and repair — lessons learned.

Yesterday, the SSD startup drive in my OS X MacBook became extensively corrupted, such that the computer would no longer boot from it. The process of recovering and repairing the drive revealed a number of important lessons related to recovery preparedness.

Startup drive corruption.

Last night, as iTunes was downloading the iOS 4 update for my iPhone on my OS X MacBook, the disk utility Drive Genius (which continually monitors the health of my drives) began raising alerts about my startup solid-state drive. Paraphrasing, the error messages said:

Drive Genius has discovered HFS errors on your startup drive. The unix utility fsck will be run on the next mount of this drive.

I immediately restarted the computer, in order to allow fsck to run, and was comforted to see its progress bar appear on the startup screen. But then, unfortunately, after proceeding just a bit, the progress bar would reset itself, and try again. After three failed repair attempts, fsck gave up, and the computer simply shut down.

Booting from my mirror backup.

My backup procedure involves, in addition to Time Machine, the nightly maintenance of a bootable external mirror drive, from which I can start up and run disk utilities in precisely this kind of situation.

As Murphy would have it, my mirror drive had been acting flaky over the past few days, and was actually two days out of sync. Although that in itself would be a bit of a problem should I have to recover data from that drive, my bigger worry was being unable to boot from it. (Should that happen, my lone remaining option would be to boot from a Snow Leopard disk, and rebuild my startup drive from the Time Machine — a very lengthy process.)

Fortunately, I was able to boot from the mirror drive, and I made an immediate note to replace it ASAP.

Subsequently, I spent probably 20 minutes acknowledging connection requests from Little Snitch, allowing the multitude of startup applications, menu items, and system preferences to phone home for various reasons (checking for updates, etc.). I was also reminded how excruciatingly slow a USB drive is, compared to an SSD. I found myself really wishing I’d created a special account for such purposes, with no startup apps.

I then noticed Dropbox and Backblaze beginning to do some heavy processing. Uh-oh. Because my backup drive was two days out of sync, these cloud-syncing utilities became confused by what they were seeing as they compared my local filesystem to the synced versions in the cloud. This would eventually cause some small problems for me, as described later.

Repairing the startup drive.

The startup drive had invalid node problems so severe that neither Apple’s Disk Utility nor Drive Genius could repair them. In fact, Disk Utility went so far as to suggest immediate reformatting of the drive.

Fortunately, I also own a copy of Disk Warrior 4, and decided to give it a go. Its progress bar advanced quickly for a while, but then appeared to stop, frozen in position for at least five minutes. I was about to quit the application, but a Google search revealed that in cases of severe disk problems, Disk Warrior can take a long time (up to 12 hours in some cases) to repair the drive. I also found several reports from people who, like me, were just about to give up at the frozen progress bar, but decided to keep waiting and were rewarded for their patience.

So, I allowed Disk Warrior to carry on, and went off to watch a movie. A few hours later, I checked back, and sure enough, Disk Warrior had fixed the problems, and was waiting for me to confirm its rebuild of the startup drive’s directory.

Up and running again.

Once the drive was repaired, I rebooted from the SSD, quickly refreshed my mirror backup, created a second backup on another drive, and let Time Machine run. Confident my data was all safely backed up, I went to bed.

Outdated data problems.

This morning, as I began working with the computer, I started discovering various data problems:

  1. Things.app data was out of date.
  2. Yojimbo’s library was out of date.

These applications store their data in Dropbox, which synchronizes to the cloud. When I booted from my mirror drive yesterday (which was two days out of date), Dropbox got confused and replaced the current files in the cloud with the out-of-date ones from the mirror.

I was able to recover from this situation, though, because Dropbox was smart enough to keep copies of the “conflicted” data. For example, right next to Things’s “Database.xml” file was “Database (Conflicted Copy from 2010-06-21).xml”. I switched to that file, and I was back on track in Things.app. (Same with Yojimbo.)

To identify similarly affected apps, I used File Buddy to search for all files with “conflicted” in their name. Turns out, though, only Things and Yojimbo had been affected.
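The same search can also be scripted. What follows is a minimal sketch in Python, not the File Buddy search described above. The function name find_conflicted and the ~/Dropbox folder location are assumptions made for illustration. It walks a folder and lists every file whose name contains “conflicted”, ignoring case.

```python
import os


def find_conflicted(root, marker="conflicted"):
    """Return sorted paths under root whose file name contains marker.

    The comparison ignores case, so "Conflicted Copy" also matches.
    """
    matches = []
    # os.walk silently yields nothing if root does not exist.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if marker.lower() in name.lower():
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # Assumed location of the Dropbox folder; adjust as needed.
    for path in find_conflicted(os.path.expanduser("~/Dropbox")):
        print(path)
```

Each path printed sits next to the file it conflicts with, so the two versions can be compared before deciding which one to keep.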

Lessons learned.

I learned several important lessons from this experience:

  1. Never procrastinate replacing a backup drive that’s beginning to fail. Remember that Murphy is watching us at all times. This time, I got lucky.
  2. In anticipation of having to boot from your mirror drive, it’s a good idea to maintain an account on your computer that you’ll log in to during recovery activity. Ideally, this account will have a minimum number of startup applications, and will have a minimum number of cloud-syncing applications (like Dropbox) running. (Pro tip: If you use Yojimbo to store your serial numbers, you’ll want to keep a fresh copy of its data in ~/Library/Application Support/Yojimbo in that account as well, so that you’ll have access to your apps’ registration numbers.)
  3. As commonly reported, Disk Warrior seems to be the disk repair utility to have around. And it’s important to remember that Disk Warrior can take a long time to fix a bad drive. I’ve heard that it’s very good about telling you when it decides to give up — so, as long as it’s still running, have faith.
  4. When your startup drive dies, it’s really nice to be able to boot and recover quickly from a mirror drive. Given how cheap disk space is, it’s probably worthwhile to maintain two bootable mirror drives, just in case one proves faulty at just the worst moment.

 

19 thoughts on “Startup disk recovery and repair — lessons learned.”

  1. Great to see you recovered! Some remarks:

    The beauty of Dropbox is that it is also a versioning system. Even if you didn’t have the “conflicted copy”, you could revert to a previous version of any file using the Dropbox web interface.

    To keep an eye on Dropbox, I have created a simple saved Spotlight search which is always accessible in the sidebar of my Finder windows. Simply search for “Filename contains conflicted copy” on “This Mac” and save the search to the sidebar. Very handy!

    Finally, here’s some fsck voodoo I had to learn the hard way during a failed Tiger install back in 2005. Might still help some today, I guess: http://bit.ly/9AZYh6

  2. Can you not boot from the OS X install CD and run DiskWarrior from there, thereby avoiding the data issues? I believe you can start a terminal from the install CD.

    1. DiskWarrior comes with a bootable CD, but I couldn’t find it when I needed it. I’m not sure what OS version that CD runs, and whether that would introduce any issues repairing a drive with a later OS version (I’m guessing it wouldn’t.)

  3. Spurred by your post I checked my disk with Disk Utility from the install CD. A good way to avoid problems. I tried to launch Finder from the terminal and it would not let me. The terminal on the install CD is rather limited.

    The underlying disk format (HFS+) has not changed in a long time so an old OS version should be no problem. I have actually repaired minor problems with Disk Utility from an old OS a number of times without problem.

  4. Hi Simon, in my case, Disk Utility was unable to repair the drive, and so it would require booting from the DiskWarrior CD. I have since found it, though, collecting dust in a closet. 🙂

    1. I know of at least three utilities for creating bootable disks: (1) Carbon Copy Cloner, (2) Chronosync and (3) SuperDuper. In my experience, SuperDuper is, by far, the most reliable of the lot — and so that’s the one I use. It runs automatically each night, updating my bootable mirror drive.

  5. In the past, I’ve made a bootable volume w/ Apple Software Restore. Unfortunately, you have to boot from another volume (e.g. an install DVD) in order to do this, so it’s not really feasible for daily use.

    1. @Michael, I think I still prefer having a dedicated account, since there are some items that I actually would like started up (like LaunchBar, iStat Menus, etc.) Also, having a dedicated account allows me to have the Dock preconfigured to contain all the utilities I might need quick access to (Terminal, Activity Monitor, Disk Utility, etc.)

  6. Matt, it is okay that you described how you resolved the problem, but were you able to find out what caused it? I am asking since one of my Macs, a 1st gen MacBook Air with an SSD, corrupts the disk every time it is shut down after something is updated via system updates. Then I have to boot from a USB drive and rebuild with Disk Warrior, but sometimes so many files are broken and put into the recovered files folder that I have to completely reinstall the system and restore from a Time Machine backup…

    I will try Drive Genius, but I am just curious whether you could find out what happened (or has it maybe happened again)?

    Sincerely, Balázs

    1. Balázs, I don’t know why it happened, but it consistently happened when intense disk activity was ongoing — for example, performing a SuperDuper copy while making a network transfer of many files. Hope this helps. Matt

  7. Thanks for this informative article. I’m about to embark on a recovery process myself so this has been helpful.

    One question, did you find any anomalies with Backblaze during your recovery? Did it just overwrite your backup with the 2-day-old data with no chance for recovery from their existing backup? If so, I’m assuming it synchronized back to normal after you brought back the more recent files from Dropbox?

    One issue I’m worried about with Backblaze is the potential need for it to “re-backup” the entire drive (just under 1TB) if the old one is going to get rebuilt with Disk Warrior (or flat out reformatting if the need arises). I have the added complexity that the drive I’m recovering is on an external Drobo…which I guess can also be considered a good thing since my system is remaining untouched.

    Appreciate any additional insight you may have.

    -Ken

  8. Hi Ken,

    Both Dropbox and Backblaze managed to eventually get caught-up — so no problems there.

    Both Backblaze and Dropbox use something called Content Addressable Storage (CAS). When a file is about to be backed up (uploaded), both utilities will compute a hash (a digital signature), and then check if that hash already exists in their storage. If it does, then they will NOT upload the data, but simply make a pointer to that existing data. So, even though Backblaze may (in your example) need to “re-backup” everything, it actually won’t physically upload the data again, so the “backup” process will happen very rapidly (i.e. the time it needs to compute all the hashes, and confirm that they already exist in their storage.)

    A difference between Backblaze and Dropbox in this regard is that Backblaze encrypts your data to a personal key, meaning that the scope of hash checking is your own data. Dropbox, on the other hand, compares hashes globally. That’s why, for example, when I put the Adobe CS4 installer in Dropbox, it instantly appeared uploaded, since some other person out there in the world had already uploaded it.
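As a rough illustration of the idea, here is a toy content-addressable store in Python. It is a sketch under stated assumptions, not how Dropbox or Backblaze actually implement their storage: the class name ContentStore, the use of SHA-256, and the scope parameter (a stand-in for the effect of a personal encryption key) are all made up for this example.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: data is keyed by the hash of its bytes."""

    def __init__(self):
        self.blobs = {}     # (scope, digest) -> bytes, stored once per key
        self.pointers = {}  # (owner, filename) -> (scope, digest)
        self.uploads = 0    # how many times data really had to be transferred

    def backup(self, owner, filename, data, scope=""):
        """Record a file; return True only if the data had to be uploaded.

        An empty scope means hashes are compared across all users.
        A per-user scope limits the comparison to that user's own data.
        """
        key = (scope, hashlib.sha256(data).hexdigest())
        uploaded = key not in self.blobs
        if uploaded:
            self.blobs[key] = data
            self.uploads += 1
        # Either way, the file entry is just a pointer to the stored data.
        self.pointers[(owner, filename)] = key
        return uploaded


if __name__ == "__main__":
    store = ContentStore()
    installer = b"pretend these are the bytes of a large installer"
    print(store.backup("alice", "installer.dmg", installer))  # new data
    print(store.backup("bob", "installer.dmg", installer))    # global match
    print(store.backup("carol", "installer.dmg", installer, scope="carol"))
    print(store.backup("carol", "copy.dmg", installer, scope="carol"))
```

With an empty scope, the second user's identical file costs only a pointer, which matches the instant "upload" described above. With a per-user scope, the same bytes are stored again for a different user, but a repeat backup of that user's own unchanged data still transfers nothing.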

  9. Ken,

    I forgot to mention, since writing this article, I’ve started to use a Mac OS X application called Arq to back up my data to my own personal storage area at Amazon S3. So far, it has worked great. I like the thought of having full control over where my data is stored. Of course, while not terribly expensive (about $30/month for the amount of data I have), Amazon S3 is more expensive than Backblaze’s annual $50.

  10. Matt, thanks for the reply. I had no idea that Dropbox actually stores the same file only once no matter how many people upload it. That’s pretty efficient and cool technologically but makes me wonder about the security of the files in terms of access. It seems they take security pretty seriously so I’m not overly concerned, but it just makes me wonder.

    Right now, I’m happy with the “simplicity” and relatively low cost of Backblaze, but I’ve wanted to try S3 for a while now, so thanks for the tip on Arq. I’ll have to check that out.

    My rebuild seems to have worked flawlessly, and so far, Backblaze doesn’t appear to be trying to re-backup everything. I’ve noticed it sometimes takes a couple days before it “sees” new files though so I’m just keeping my fingers crossed. Based on your input, it sounds like it will be faster the second time around anyway if it tries again.

    Thanks again for the input.

  11. That Dropbox is using CAS doesn’t cause me any concern about security, since the file wouldn’t be deleted as long as any user account has a pointer to it. What is really in their interest is that they are charging 1000s of users for storage, for which they only pay once. Nice business model. 🙂
