Built-In Dedupe

Data deduplication adds value to backup software by relieving bloat.

June 2010 E-newsletter

With only about 1,250 students, you might expect the Upper Dauphin Area School District in rural central Pennsylvania to have fairly limited backup needs. But as a pilot site for the Schools Interoperability Framework, an information-sharing specification created to standardize the exchange of educational data among schools, Upper Dauphin runs dozens of servers.

“We have many servers, and backing them up was expensive, time-consuming and didn't address the problem of offsite storage,” says Bryan Campbell, technology director for the district. “It wasn't in our best interest to implement a traditional backup option.” Instead, Campbell deployed Barracuda Networks' Backup Service, a subscription-based backup solution that combines a local backup appliance with cloud-based offsite storage.

Barracuda's backup system is one of a growing number of software packages that speed backup and recovery with data deduplication. Today, in addition to Barracuda, nearly all major backup software packages offer deduplication, including Acronis Backup & Recovery 10; BakBone NetVault: Backup and NetVault: SmartDisk; CA ARCserve; and EMC Avamar.

Eliminating Redundancy

Deduplication is a relatively new technology, but the principle is fairly simple. In every organization, there are pieces of data that are repeated dozens, hundreds, even thousands of times across all the files stored on a network. These could include whole files – such as a memo sent to everyone in the organization and saved to every hard drive on every computer – but much of the replication occurs within files; for instance, a signature block appended to every outgoing e-mail or a logo embedded in every PowerPoint.

Rather than save these scraps of data over and over again, deduplication scans every file for redundancy and replaces repeated data with a pointer to the original. “It's like a bouncer at a club,” says Mike Fisch, senior contributing analyst at The Clipper Group. “To get in, you have to be original.”
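The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Barracuda, EMC or any other vendor implements it: the `ChunkStore` class, the fixed 4 KB chunk size and the sample memos are all hypothetical, and commercial products often use more sophisticated variable-size chunking. Each unique chunk is stored once under its SHA-256 hash, and every file becomes a list of hashes that act as the pointers.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


class ChunkStore:
    """Toy deduplicating store: each unique chunk is kept once, keyed by its hash."""

    def __init__(self):
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes

    def put(self, data):
        """Store data and return its recipe: the list of chunk digests (pointers)."""
        recipe = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only the first copy of a chunk is stored; repeats just reuse the digest.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Rebuild the original data by following the pointers."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self):
        """Physical bytes actually held by the store."""
        return sum(len(chunk) for chunk in self.chunks.values())


# Two hypothetical memos that embed the same 4 KB "logo" followed by different text.
logo = bytes(range(256)) * 16
memo_a = logo + b"A" * CHUNK_SIZE
memo_b = logo + b"B" * CHUNK_SIZE

store = ChunkStore()
recipe_a = store.put(memo_a)
recipe_b = store.put(memo_b)

print(len(memo_a) + len(memo_b), store.stored_bytes())
```

Here the two memos total 16,384 bytes, but the store holds only 12,288 because the shared logo chunk is kept once and both recipes point to it.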

Deduplication offers a number of benefits when integrated with a backup strategy. First, it reduces the size of individual backups by eliminating redundant data. It also reduces the storage capacity required for subsequent backups because today's backup image likely shares much, if not most, of its data with yesterday's. Over time, a deduplicated backup store can hold many times more data than the disk space it actually occupies. “You can easily get to 20 times, or even 50 times [the amount of data],” says Fisch. “So you can back up a lot more data to disk.”
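A rough simulation shows how such ratios build up as backups accumulate. Everything in it is an assumption made for illustration: a server image of 100 chunks of 4 KB each, one chunk changing per night, 30 nightly full backups and a hypothetical `dedupe_ratio` helper. Real change rates and ratios vary widely with the data.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size for this sketch


def dedupe_ratio(backups, chunk_size=CHUNK_SIZE):
    """Return logical bytes backed up divided by unique bytes actually stored."""
    seen = set()
    logical = 0
    physical = 0
    for image in backups:
        for start in range(0, len(image), chunk_size):
            chunk = image[start:start + chunk_size]
            logical += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:  # first time this chunk has been seen
                seen.add(digest)
                physical += len(chunk)
    return logical / physical


def make_chunk(tag):
    """Build a distinct, deterministic 4 KB chunk from a label (32 bytes x 128)."""
    return hashlib.sha256(tag.encode()).digest() * 128


# Hypothetical server image: 100 chunks, with one chunk rewritten each night.
current = [make_chunk(f"block-{n}") for n in range(100)]
backups = []
for night in range(30):
    if night > 0:
        current[night] = make_chunk(f"changed-on-night-{night}")
    backups.append(b"".join(current))

ratio = dedupe_ratio(backups)
print(f"{ratio:.1f}:1")
```

Under these assumptions, 3,000 chunks are backed up but only 129 unique chunks are stored, a ratio of roughly 23 to 1. Keeping more nights of a slowly changing image pushes the ratio higher.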

Because deduplicated backups are smaller than traditional backups, they run more quickly and transfer more easily over the network or to offsite storage. That means backups consume less bandwidth and server overhead, and recovery takes less time. “I used to have to run backups at night or over the weekend because they used too many server resources,” Campbell recalls. “If there were server problems, or if the tape was full, the backup would stall. Now I do multiple backups throughout the day, without the overhead.”

The Clackamas Education Service District in Oregon discovered the same benefits after deploying EMC's Avamar backup software. Unlike tiny Upper Dauphin, CESD serves 10 school districts across Clackamas County, comprising 103 schools and more than 57,000 students.

Gary Scheel, network operations coordinator for CESD, had trouble creating a standard for offsite backup in all of those schools. “In one case, a district wasn't doing any backups at all due to the amount of data they had and the cost of implementing a solution,” he says.

6.6 months
Average time in which a data deduplication system pays for itself in reduced storage needs, improved IT productivity and shorter backup and restore times
Source: IDC, 2010

Deduplication lets Scheel shrink the storage needed for backup, saving money and time as well as network resources. “Deduplication and compression allows us to perform backups over the network in a reasonable time frame,” he says, “as well as to minimize the amount of storage space needed.”

Deduplication is catching on for good reason – it saves time, money and hassle, things that overburdened school IT workers can greatly appreciate. Instead of worrying about whether there's a new tape in the drive, or whether there's enough time to run a backup without limiting everyone's ability to function, Upper Dauphin's Campbell can focus on helping people. “It's about the data, not the system,” he says.

Make the Most of Data Deduplication

1. Keep backups longer: The more backups you have, the more likely you are to find the redundancies that make deduplication work. Subsequent backups will get smaller and smaller – incidentally freeing up the space you need to keep more backups.

2. Know your data: Video, photographs, scanned documents and audio tend not to yield much gain. If these file types make up much of your backups, consider bypassing deduplication to save overhead.

3. Don't encrypt or compress before dedupe: Encryption and compression eliminate many of the patterns that deduplication looks for. Apply them after the data has been deduplicated.

4. Dedupe as widely as possible: The more data deduplication has to work with, the more opportunities for finding redundant data, so include as many computers, servers and virtual devices as possible in your backups.

5. Let the stats worry about themselves: While it is satisfying to see that you are saving 80 percent, 90 percent or even more of your disk space by using deduplication, don't fixate on improving your ratio. Configure your software in the way that works best for you, whatever the statistics say.
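The third tip is easy to check with a small experiment. The sketch below is a simplified illustration, not a vendor's method: the `shared_chunks` helper, the 4 KB fixed chunks and the sample files are all assumptions. It builds two files that differ only in their first 4 KB, then counts how many chunks they have in common before and after each file is compressed with Python's standard `zlib` module.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed chunk size for this sketch


def chunk_hashes(data, chunk_size=CHUNK_SIZE):
    """Return the set of SHA-256 digests of the fixed-size chunks in data."""
    return {
        hashlib.sha256(data[start:start + chunk_size]).digest()
        for start in range(0, len(data), chunk_size)
    }


def shared_chunks(a, b):
    """Count distinct chunks that appear in both a and b (the dedupe opportunity)."""
    return len(chunk_hashes(a) & chunk_hashes(b))


# Hypothetical shared body: 100 chunks of repetitive, text-like records.
lines = (
    f"record {n:06d}: attendance and grades for student {n}\n".encode()
    for n in range(10000)
)
body = b"".join(lines)[:100 * CHUNK_SIZE]

# Two files that differ only in their first 4 KB.
file_a = b"A" * CHUNK_SIZE + body
file_b = bytes(range(256)) * 16 + body

raw_shared = shared_chunks(file_a, file_b)
compressed_shared = shared_chunks(zlib.compress(file_a), zlib.compress(file_b))

print("shared chunks, raw files:       ", raw_shared)
print("shared chunks, compressed files:", compressed_shared)
```

Uncompressed, the two files share all 100 body chunks. Once each file is compressed separately, a small difference near the start typically changes the rest of the output, so the compressed copies should share few chunks, if any. Encryption has a similar effect, which is why both steps belong after deduplication.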

May 05 2010
