Data deduplication can end server sprawl and help IT shops more efficiently manage e-mail and disaster recovery.
Virtual servers, with all the efficiency and easy implementation they offer, turned out to be too much of a good thing for the University of Wisconsin-Madison. The university turned to data deduplication technology when it found the surging number of virtual machines (VMs) and servers was taxing its existing VMware infrastructure.
Like the University of Wisconsin, many schools have found that data deduplication technology can help manage virtual server environments by streamlining data storage and freeing up valuable space.
“We had 210 VMs – 77 were Windows and 133 were Linux,” says Steve Wilcox, enterprise storage team lead for the university's Division of Information Technology.
To handle the increasing server load from its VMware implementation, Wilcox says, the school had to upgrade or replace the six aging EMC Clariion CX 700s that it used for storage.
Deploying VMware brought mounting data storage costs, as space was taken up with redundant copies of servers. Ultimately, the university opted to replace the EMC gear with two NetApp FAS3170 appliances. NetApp incorporates deduplication as part of its core architecture, allowing for broad application across multiple data types, including primary, backup and archival.
“In the VMware world, [deduplication] has saved us a tremendous amount of disk space,” Wilcox says. “We've saved 80 percent of disk space for Linux VMs and 65 percent for Windows.”
Deduplication ratios can vary widely, from 2:1 to 500:1, depending on data streams, volumes and lifecycles.
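Those ratios come from counting how many stored blocks are actually unique. As a minimal sketch of the general idea behind block-level deduplication (real appliances such as NetApp's and Data Domain's use more sophisticated, often variable-size chunking; the function name and block size here are illustrative assumptions):

```python
import hashlib

def dedup_stats(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and count unique ones.
    Identical blocks are stored once; duplicates become references."""
    seen = set()
    total = 0
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        # Hash each block; matching hashes mean the block is already stored
        seen.add(hashlib.sha256(block).hexdigest())
        total += 1
    return total, len(seen)

# Ten identical 4 KB blocks -- think ten clones of the same VM image --
# dedupe down to a single stored block
data = (b"A" * 4096) * 10
total, unique = dedup_stats(data)
print(f"{total} blocks, {unique} unique -> {total // unique}:1 ratio")
```

This is why VM farms dedupe so well: clones built from the same OS image share nearly all their blocks, so most of each new VM costs almost no additional storage.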
Source: Storage Networking Industry Association
With the increasing number of virtual servers, disaster recovery was another concern for Wilcox. Without a backup and with multiple servers on a single box, any loss in service would be magnified.
“Data deduplication solved those problems,” he says. The storage appliances (one NetApp appliance is on campus and another off campus for disaster recovery) allowed for more centralized and efficient recovery capabilities.
Although data deduplication is used primarily for data backup, as the University of Wisconsin example shows, the technology can also be applied to other areas.
“It's 99.9 percent used for archiving data, where it finds and eliminates repetitive data,” says Noemi Greyzdorf, research manager at IDC. “But dedupe can also help handle virtual-machine sprawl.
“Data deduplication technology doesn't really drive virtual environments, but virtual environments can produce redundant data,” she adds. Eliminating some of the redundant data load is where the technology can help the higher education IT manager, she says.
The University of West Florida (UWF) in Pensacola, Fla., is a case in point. UWF runs VMware to support 190 servers and 50 virtual desktop workstations, says Carl Howell, senior systems administrator.
The university conducts a lot of research for its geographic information systems program, generating a wealth of data about terrain for mapping oil resources. Such data-intensive applications were housed on VMware servers and backed up on servers in Atlanta, says Howell.
NetApp data deduplication equipment allows UWF to save a tremendous amount of disk storage on virtual desktop applications – as much as 90 percent, says Howell. Storage requirements dropped from 1.2 terabytes to nearly 90 gigabytes, he says, once the dedupe equipment was installed.
Baylor University in Waco, Texas, found data deduplication capabilities especially useful when dealing with a rambling Microsoft Exchange application. Before deploying dedupe, the university was using tape to back up MS Exchange, says Tommy Roberson, manager of server operations at the school.
The setup not only consumed tape at an alarming rate; it also offered no quick or easy way to restore a mailbox. An entire mailbox restore could be ruined by a single mistake during the rather lengthy process, says Roberson. One of the most frustrating parts of the tape storage system was the amount of time it took to recover individual e-mails upon request.
Baylor University's Tommy Roberson says deduplication helped wean the college off the tape drive system it used to back up Microsoft Exchange.
Photo Credit: Matt Lankes
“To restore 500 bytes of information, we had to restore 100 gigabytes to retrieve the message,” Roberson says. On top of that, the process was maddeningly slow, requiring 24 to 48 hours to retrieve a message, he says. Because the university received at least one or two such requests a week, an inordinate amount of time was spent on the task.
Tape consumption was climbing, too. “We had something like 80 tapes for Exchange storage,” Roberson says. At first, Baylor planned only to upgrade its tape library to ease its increasingly strained storage capacity. “We didn't look for data dedupe. We priced Exchange backup software,” he says. But data deduplication technology offered more capability than a simple tape library upgrade would have, he says.
Baylor decided to install two Data Domain 565 boxes last October to back up its Exchange program. The school also uses EMC Replication Manager software. Between the new data dedupe equipment and the software, recovering an e-mail now takes a fraction of the time that it used to. “What took 48 hours now takes 10 minutes,” Roberson says.
Efficient e-mail backup was also on Coppin State University's agenda in its search for better data storage.
Coppin State, located in Baltimore, has about 4,000 students attending day, evening and weekend classes. In 2002, it began a comprehensive IT infrastructure renovation that included a program to connect on-campus and outlying buildings to its IT and e-mail infrastructure. With a larger pool of e-mail came storage issues, says Ahmed El-Haggan, CIO and vice president of information technology. Increasingly, the university's faculty was using e-mail to share large documents and collaborate on projects, he says.
“There's a hunger on campus” to share detailed information, such as teaching modules and other data-intensive files, says El-Haggan.
“Collaboration among faculty can result in hundreds of copies of the same large document” being stored on e-mail backups. “We just can't keep increasing storage,” he says. Data compression and dedupe play a valuable role in handling the data crunch, he adds, as does educating staff and students about how best to use their allotted storage capacity.
Print servers are another place where data deduplication can help sort things out quickly and without upheaval.
Baylor University backs up its print server once a week using VMware, sending a snapshot of the data on the server to its Data Domain deduplication equipment for storage, says Tommy Roberson, manager of server operations.
Baylor can store 15 weeks of printer backups with the dedupe equipment, an investment that paid off when a print server failed, he says.
When the server's on-board data couldn't be restored, Roberson simply imported a backup copy stored on the Data Domain 565 boxes. The server was up and running within 30 minutes. “What other solution can do that?” he asks.
Although problems implementing data deduplication technology are relatively easy to troubleshoot, there are a few things to keep in mind, say those who have installed it.
“We didn't have any real issues with the process” of installing the Data Domain gear, says Tommy Roberson, manager of server operations at Baylor University. Roberson says his biggest problem was encryption between his Data Domain boxes, which don't automatically encrypt data when it's transferred. Roberson had to install an encrypted telecom line separate from the dedupe gear.
“Some environments don't dedupe well,” says Steve Wilcox, enterprise storage team lead for the Division of Information Technology at the University of Wisconsin-Madison. File types that don't dedupe well include compressed, video and audio files, which contain little redundant data to begin with.