Revising the Backup Strategery

Yes, I used the non-word "Strategery." Ever since George the 43rd mis-introduced that word to the world, I can't stop using it. Makes me laugh every time. But that's not what today's post is about. It's about backups and how to avoid a long night at the office.

Most of you know I'm now the de facto IT guy at Upper Room. And you probably know we built a Mac Pro running Leopard Server (if you missed it, you can read about it here, here and here). We're running an 8-drive, 8 TB RAID 6 for shared storage and Time Machine backup. I felt pretty safe using that arrangement. But I also wanted to protect the server boot volumes--mainly because if the server goes down, we can't get to the shared storage. While building it was fun, rebuilding it would not be. So this was my solution to keep the boot drives backed up.

My Server Boot (System) Drives

As you can see, I created the main boot, or system, volume as a RAID 1. That means all the data that's written to that single logical volume (it shows up as one drive on the desktop) is actually written to two 250 GB drives. Should one fail, I can immediately fail over to the other one with no data loss and nearly no downtime. I named that volume Tobias.

Lindsey is what I call the hot backup (Arrested Development fans are laughing right about now...). Lindsey is an identical 250 GB drive, but is partitioned into a 200 GB volume and a 30 or so GB volume (named Kitty) for log storage. I did that so a runaway process wouldn't fill my system volume up with logs and break something. Lindsey is cloned every night from Tobias (originally using SuperDuper, now Carbon Copy Cloner).

Michael is our off-site backup. I have that mounted in a drive carrier that fits into an eSATA enclosure. I clone Tobias to Michael on Tuesdays and Thursdays. At least, that was the plan. That all changed last week.

Last week, I was mucking around trying to get Software Update Server up and running. I was having some problems, which I diagnosed as a permissions/sharing issue (not sure if that's right or not; it's still not working). Anyway, at some point, I propagated permissions to Tobias. This took a while. Afterwards, things didn't seem right. Everyone was at lunch, so I decided to reboot. It didn't come back up.

After a while, I decided to try again, this time switching over to Tobias 2. Same thing, no boot up. I switched to Lindsey. No boot. Something had corrupted the boot records of all three internal system drives. It was going to be a long day.

I pulled the boot drive from my Mac Pro editor and put it in the server (one big reason I went with a Mac Pro server, not an Xserve...) and started up. It came right up. Tobias & Lindsey mounted. Odd. They'd mount, but not boot. I ran some repair tools. Everything checked out fine. Still not booting. I decided to go home for dinner and bring Michael back to save the company. Fingers crossed.

I cloned Michael to Lindsey (I figured Lindsey was shot, so what's the harm). After moving 30 GB of files, I told it to restart using Lindsey. Eureka! It came right up! I cloned Michael to Tobias and we were back in business. Total downtime: Under 8 hours. Not too bad.

Here's what I learned. I'm well prepared for a physical disk failure. A logical disk failure, however, could really leave me up a creek. The boot sector corruption that plagued Tobias had been transferred to Lindsey by the nightly clone. Had I cloned Tobias to Michael after the corruption, I'd have been out of luck. I needed a way to snapshot the system drive so I could "roll back the clock." Here's what I do now.

Once a week, I create a 30 GB disk image on Michael, named with the current date. Once that's mounted, I use Carbon Copy Cloner to clone Tobias to the disk image. Since Tobias only carries about 27 GB of data, I will be able to keep quite a few copies of it, as disk images, on Michael before I run out of room on that 1 TB drive. As more files are added to Tobias, I'll increase the disk image size. Eventually, I'll have to trash some, but I'll have at least 6 months' worth of snapshots.
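
If you wanted to script the bookkeeping for those weekly snapshots, a minimal Python sketch might look like the one below. It only works out the date-based image name and how many 30 GB images fit on a 1 TB drive, and it builds (without running) an hdiutil command for an empty image. The function names, the /Volumes/Michael mount path and the choice of hdiutil options are assumptions for illustration, not a description of the actual setup. The clone itself would still be done in Carbon Copy Cloner.

```python
from datetime import date


def snapshot_name(day: date) -> str:
    """Name the image after the date it was taken, e.g. YYYY-MM-DD.dmg."""
    return f"{day.isoformat()}.dmg"


def snapshots_that_fit(drive_gb: int, image_gb: int) -> int:
    """How many fixed-size images fit on the drive (ignores any other files)."""
    return drive_gb // image_gb


def hdiutil_create_args(day: date, size_gb: int, dest_dir: str) -> list[str]:
    """Build, but do not run, an hdiutil command that creates an empty image."""
    return [
        "hdiutil", "create",
        "-size", f"{size_gb}g",         # image capacity
        "-fs", "HFS+J",                 # journaled HFS+ (assumed filesystem choice)
        "-volname", day.isoformat(),    # volume name matches the date
        f"{dest_dir}/{snapshot_name(day)}",
    ]


if __name__ == "__main__":
    count = snapshots_that_fit(1000, 30)  # 1 TB drive, 30 GB images
    print(count, "weekly snapshots, roughly", count * 7 // 30, "months")
    # /Volumes/Michael is a hypothetical mount point for the off-site drive.
    print(" ".join(hdiutil_create_args(date.today(), 30, "/Volumes/Michael")))
```

At one image per week, the count this prints is comfortably past the 26 weeks needed for six months, as long as the images stay at 30 GB.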

Lindsey still gets cloned twice a week, just in case. Chances are that drive will never get used, but it cost $52, and costs nothing to back up. I may eventually create four 50 GB images on there and write a series of tasks in CCC to rotate through backups. That would give me one more layer of protection and redundancy.
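
The rotation idea is simple enough to sketch in a few lines of Python. This is only an illustration of the round-robin logic, assuming four image slots with made-up names and a run counter that something else keeps track of. It is not how CCC tasks are actually configured.

```python
# Hypothetical names for the four 50 GB images on Lindsey.
SLOTS = ["Lindsey-1", "Lindsey-2", "Lindsey-3", "Lindsey-4"]


def slot_for_run(run_number: int, slots: list[str] = SLOTS) -> str:
    """Pick the image to overwrite on this run, cycling through the slots."""
    return slots[run_number % len(slots)]


if __name__ == "__main__":
    # Show which slot each of the first six backup runs would overwrite.
    for run in range(6):
        print(run, "->", slot_for_run(run))
```

With two clones a week, four slots would reach back about two weeks before the oldest image gets overwritten.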

The upshot for me is this: Drives are cheap and storage is abundant. Keep as many copies of important data as you can. Should this ever happen again, I'll be back up and running in under 2 hours (it takes an hour or so just to copy the data), which isn't too shabby.

Oh, and in case you're wondering, our shared volume, George, is cloned twice weekly to an external 1.5 TB drive called Oscar. Who says geeks don't have a sense of humor?