Necessity really is the mother of invention. At least that's the case at the University of Northern Iowa in Cedar Falls, where the need to store backups off-site finally convinced the IT department to move away from tape.
"We used to be decentralized, back up everything to tape, and then take those tapes offsite to a bank vault. With reductions in our operations staff, we didn't have the people to do that anymore," says Seth Bokelman, senior systems administrator at UNI.
Centralizing backups solved the IT staffing issue at the 13,000-student university, but it introduced other challenges — namely, heavy traffic loads as data was replicated between two main sites across 90 miles. "We needed to support 26 terabytes of backups using existing connectivity," he says.
The solution was deduplication with ExaGrid Systems' EX13000E backup appliance. Deduplication eliminates storage of redundant data at the block or file level, which shortens backup windows and reduces disk space and bandwidth requirements. The technology is critical for colleges and universities that are working hard to manage decreased infrastructure budgets while retaining more student, faculty and staff data.
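For readers who want to see the idea in miniature, the following Python sketch illustrates block-level deduplication in general terms. It is not how the ExaGrid appliance or any other vendor's product works internally; the fixed 4KB block size, the SHA-256 fingerprint and the function names `dedupe_blocks` and `restore` are all assumptions chosen for illustration.

```python
import hashlib


def dedupe_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep one copy of each unique block.

    Returns (store, recipe): store maps a block fingerprint to the block's
    bytes, and recipe lists the fingerprints in their original order.
    The block size and hash choice are illustrative assumptions.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        # Only the first occurrence of a block is stored.
        store.setdefault(fingerprint, block)
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data by following the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


# Hypothetical backup stream: three identical blocks followed by one different block.
backup = (b"A" * 4096) * 3 + b"B" * 4096
store, recipe = dedupe_blocks(backup)
stored_bytes = sum(len(block) for block in store.values())
```

In this toy example, four blocks arrive but only two unique blocks are kept, and `restore` still reproduces the original stream. Data with few repeated blocks gains little from the same process.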
"Consider deduplication when gaining more storage space is your main objective, or if you need backups to take less time," says Greg Schulz, founder and senior adviser to the Server and StorageIO Group.
EMC, Fujitsu, IBM, NetApp and Quantum all offer deduplication in their backup storage systems.
Bokelman says UNI deployed two ExaGrid appliances at the primary site and one at a disaster recovery site in Ames, Iowa. The ExaGrid configuration has let the IT staff store 27TB of data on only about 3TB of disk space. Bokelman credits selective deduplication for the dramatic results.
"There are some applications and data, such as video, that are inefficient to store, so you have to be careful what you use deduplication resources for," he says. For instance, the IT department uses dedupe only for student systems, such as assignment data, official documents, IS data and other formats that are likely to spawn multiple copies.
Bokelman warns his peers not to consider deduplication as a less expensive alternative to tape because, in some cases, it can cost up to three times more. "Instead, it lets us do things we couldn't do with tape, such as automatically replicate to a distant site," he says. Other benefits include the ability to run more concurrent backup and restore jobs, elimination of tape management and rotation, and increased reliability from no longer having to move tape around.
Justify the Expense
Eric Hawley, CIO of Utah State University, agrees that the benefits of deduplication can justify the expense. The 29,000-student institution, based in Logan, was able to reassign a full-time IT staffer in part because deduplication relieves the data center staff of tape management responsibilities.
In 2008, the university was maxing out its 30TB storage area network. Adding another SAN would have increased complexity and required more personnel. To simplify overall management, the IT department switched to EMC's Data Domain DD670 Deduplication Storage System to minimize used disk space. While the university currently has just under 484TB of stored data, only 54TB of disk space is actually being used.
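The storage figures quoted by both universities can be turned into a rough deduplication ratio. The short Python sketch below does that arithmetic; the function names are hypothetical, and because the article's numbers are approximate ("about 3TB," "just under 484TB"), the results should be read as estimates.

```python
def dedupe_ratio(logical_tb: float, physical_tb: float) -> float:
    """Logical (pre-deduplication) data divided by disk space actually used."""
    return logical_tb / physical_tb


def space_saved_pct(logical_tb: float, physical_tb: float) -> float:
    """Percentage of disk space avoided compared with storing every copy."""
    return (1 - physical_tb / logical_tb) * 100


# Approximate figures quoted in the article.
uni_ratio = dedupe_ratio(27, 3)         # UNI: 27TB stored on about 3TB
usu_ratio = dedupe_ratio(484, 54)       # USU: just under 484TB stored on 54TB
usu_saved = space_saved_pct(484, 54)
```

By this estimate, both institutions land at roughly 9:1, or close to 89 percent less disk space than the undeduplicated data would need.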
Like Bokelman, Hawley has been choosy about the data that gets deduped. "We look for what's going to give us the biggest bang for our buck," he says. Video files typically are not deduped, nor is institutional data such as scanned documents, which don't have enough identical copies to make the potential process overhead worth it. More likely candidates are e-mail, central file stores, enterprise resource planning backup sets and registrar data.
Deduplication has helped USU keep more data on disk before it's transferred to tape. Previously, the university could retain only a week of backups. Now, certain backup sets never have to be copied off because there is plenty of room on the disk.
The biggest advantage: "It's a lot easier to manage a single Data Domain box than it would be to manage two or three SANs," Hawley says.
At Baldwin-Wallace College in Berea, Ohio, deduplication has proved so helpful that the college uses it for both primary and backup storage. The college performs centralized backups between buildings situated across campus from one another. Using NetApp's FAS 2050 appliances, the IT department can store a month of backups. "If we turn dedupe off, that span narrows to a week or two," says Greg Flanik, the college's CIO.
Flanik says deduplication is worth the investment when institutions can realize a 40 to 50 percent improvement in available disk space across the board. "If that doesn't happen, we don't dedupe that application," he says. For example, IT staff found, via vetting, that some admissions databases weren't good candidates for deduplication.
Baldwin-Wallace also was able to pare down its backup window. "Without deduplication, we'd bleed into peak usage and infringe on network performance," he says. "Shortening that alone cost-justifies deduplication."