Apr 07 2009

Testing Your Backups

Successful backups and recovery can save an organization from headaches when disaster strikes.

Everyone in the industry knows how important backups are. They can save an organization money and time, and can even save the network or server admin's job. But once you have backed up your important files, you also need to verify that you can actually restore them. If a glitch in the database software forces you to restore the entire accounting database, you will be thankful that you ran regularly scheduled test restores and already know your backups are neither corrupt nor malfunctioning. Too many organizations assume their backups are working correctly and only learn otherwise when disaster hits.

First, let me go over my backup system. I back up all of my data to a server that has 20 1TB drives in a RAID 6 configuration. If you set one drive aside as a hot spare, the remaining 19 drives give you around 17TB of usable space (RAID 6 gives up two drives' worth of capacity to parity), and you can afford to lose two drives at once without losing data. I use rsync scripts to perform the daily backups of all of my servers into a folder for that day of the week. These backups happen during off-peak hours. Each week, rsync looks for changes between the previous week's backup and the current week's data and syncs the two. If you delete a file from your main file server, rsync knows to delete it from the backup server as well; this is rsync's --delete option. Once the data is on my backup server, I send it offsite in two different ways. I rsync the data to a server at a hosting facility where I rent rack space, and I also copy the data to removable media, which goes to my bank vault.
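The daily job described above can be sketched in Python as a thin wrapper around rsync. This is only a sketch under stated assumptions, not my actual script: the function names, the host name fileserver and the paths are hypothetical, and it assumes rsync is installed and the backup server can reach the source over SSH. The only rsync options used are -a (archive mode) and --delete.

```python
import datetime
import subprocess

# Folder names for each weekday; index 0 is Monday, matching date.weekday().
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def build_rsync_command(source, backup_root, day=None):
    """Return the rsync argument list for one server's daily backup."""
    if day is None:
        day = datetime.date.today()
    # One destination folder per weekday, e.g. /backups/fileserver/Tuesday/
    dest = f"{backup_root.rstrip('/')}/{DAY_NAMES[day.weekday()]}/"
    # -a preserves permissions, timestamps and symlinks.
    # --delete removes files from the destination that are gone from the source.
    return ["rsync", "-a", "--delete", source, dest]

def run_backup(source, backup_root):
    """Run the backup; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_rsync_command(source, backup_root), check=True)

if __name__ == "__main__":
    # Hypothetical source and destination; the trailing slash on the source
    # tells rsync to copy the directory's contents rather than the directory.
    print(build_rsync_command("fileserver:/srv/data/", "/backups/fileserver"))
```

Because each weekday folder is overwritten a week later, a file deleted by mistake on the file server survives on the backup server for at most seven days with this layout.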

Here are a few suggestions for testing your backups. First, set up a rotation among your servers so you know when each one is due for a test restore. Pick a particular database or file server and schedule a monthly time to run the restore for it. I try to do some sort of restore for every system at least once a month. Second, you do not have to restore all 2TB of your main file server. Pick a few folders at random from throughout the file server to restore. Typically, if a backup goes bad, everything inside the backup job goes bad. Third, do not throw away that old out-of-warranty hardware. Old servers make perfect machines on which to test your restores. Virtual machines under VMware are another option, if you have that available to you.
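Picking folders at random is easy to leave to a script so that you do not keep testing the same familiar directories. The following is a minimal sketch, assuming the backup is a plain directory tree you can walk; the function name and its parameters are hypothetical.

```python
import os
import random

def pick_folders_to_restore(backup_root, count=3, seed=None):
    """Pick up to `count` directories at random from anywhere under backup_root."""
    folders = []
    for dirpath, dirnames, _filenames in os.walk(backup_root):
        for name in dirnames:
            folders.append(os.path.join(dirpath, name))
    # Sort first so that a given seed always produces the same picks.
    folders.sort()
    rng = random.Random(seed)
    # Never ask for more folders than actually exist.
    return rng.sample(folders, min(count, len(folders)))

if __name__ == "__main__":
    # Hypothetical backup location.
    for folder in pick_folders_to_restore("/backups/fileserver/Monday"):
        print(folder)
```

Walking a multi-terabyte tree takes a while, so on a large file server you might limit the walk to the top two or three directory levels instead.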

Another great tip is to test all of your backups, not just the one you use most often. I use three different forms of backup: one onsite and two offsite. If disaster hits and my onsite backup is destroyed, then I had better be sure that my offsite backups are verified and working. I always run a test restore from my removable media before I send it to the bank vault.
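One way to decide whether a test restore "worked" is to compare checksums of the restored files against a reference copy. This is a sketch of that idea rather than the procedure I use; it assumes both copies are mounted as ordinary directories, and the function names are hypothetical.

```python
import hashlib
import os

def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to save memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_restore_problems(reference_dir, restored_dir):
    """Return (relative_path, reason) for files missing or changed in restored_dir."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(reference_dir):
        for name in filenames:
            reference_path = os.path.join(dirpath, name)
            relative = os.path.relpath(reference_path, reference_dir)
            restored_path = os.path.join(restored_dir, relative)
            if not os.path.isfile(restored_path):
                problems.append((relative, "missing"))
            elif file_digest(reference_path) != file_digest(restored_path):
                problems.append((relative, "content differs"))
    return sorted(problems)
```

An empty list means every reference file was restored byte for byte. Files that changed on the live server after the backup ran will also show up as differences, so compare against the backup copy or a quiet folder where possible.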

I typically keep Linux and Windows base images loaded on my test servers. This lets you test both MySQL and Microsoft SQL Server, along with other programs that are OS-specific. The hardware really does not matter, because most people back up only the data and not all of the OS files. As long as your data is backed up, you shouldn’t mind rebuilding a server from scratch, because the OS and application installs are resources you already have and are easy to duplicate. On my main backup server, I have a scheduled job that runs during the day and copies data from the backup server to my test backup servers, depending on which week it is. I am a big rsync fan, so that is my tool of choice for the copy. Once the copy finishes, you can attach the restored database to your test SQL server or try to open some of the files you restored.
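The "depending on which week it is" part of that scheduled job can be as simple as cycling through a list by week number. Here is a minimal sketch, assuming a fixed list of data sets to rotate through; the data set names and the function name are hypothetical.

```python
import datetime

def dataset_for_week(datasets, day=None):
    """Choose the data set to copy to the test server for the week containing `day`."""
    if day is None:
        day = datetime.date.today()
    week_number = day.isocalendar()[1]  # ISO week number, 1 to 53
    # Cycle through the list so every data set comes up in turn.
    return datasets[week_number % len(datasets)]

if __name__ == "__main__":
    # Hypothetical data sets held on the backup server.
    rotation = ["accounting-db", "file-server", "mail", "web"]
    print(dataset_for_week(rotation))
```

With four data sets, each one is test-restored about once a month. The cycle can skip or repeat an entry at the turn of the year, because ISO years have 52 or 53 weeks.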

I cannot stress enough how important this job is. I believe it is just as important as the backup itself, and I strongly urge anyone who is not testing their backups to start. No one wants to think about something bad happening. But when it does, a little work now can save a lot of work later, or possibly even your job.