RYAN TALLEY at the Georgia Tech Research Institute says UPS and virtualization technologies help protect the institute’s Electronic Systems Laboratory.

Oct 08 2008

Universities' Disaster Recovery Powers Up

Virtualization and intelligent power management top the list of priorities.

Every college and university plays a game of chance when it comes to the weather forecast. A snowstorm might portend a blizzard, or a spate of rainy days might bring floods and power outages, all of which could shut down campus IT systems. When you have thousands of people relying on an extensive network, contingency planning is essential.

Uninterruptible power supply (UPS) systems and virtualization technologies both offer advantages in the fight to keep IT facilities from being overwhelmed in a disaster. They add durability and flexibility to the mix of techniques, practices and technology that IT administrators draw on to get through tough situations.

The Georgia Tech Research Institute, Bryant University and the University of Iowa recently assessed and revised their disaster recovery plans for a variety of reasons, including record flooding and modernization. All three used some form of new power supply or virtualization technology.

Combine UPS and Virtualization

The Georgia Tech Research Institute (GTRI), the applied research wing of the Georgia Institute of Technology in Atlanta, wanted an enterprise-class UPS system to protect the equipment in its Electronic Systems Laboratory. It had several small UPS boxes attached to some of the servers in its data center, but other servers had no power backup at all.

The lab implemented a $50,000 American Power Conversion Symmetra UPS system in its data center, located on the fourth floor of a six-story building. A 10-ton Liebert cooling system installed on the data center’s raised floor works with spot coolers to supplement the building’s air-conditioning. VMware ESX server-clustering technology virtualized the servers so a hardware failure on one box doesn’t interrupt service or result in data loss. In addition, the lab maintains offsite backup, redundant onsite backup and multilevel RAID configurations for data storage and OS boot drives.

Bryant University reduced power consumption 20 percent and increased server utilization 80 percent with modular data center technology.

Insufficient power posed the greatest challenge to the new system’s implementation. Electricians pulled an additional 480-volt/175-amp power drop up from the basement and then used a transformer to step it down to the 208-volt/350-amp supply required by the new equipment, at additional expense.
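
The figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes three-phase feeds and an ideal, lossless transformer, neither of which the lab's account specifies, and the function names are hypothetical.

```python
import math

def apparent_power_kva(volts, amps, three_phase=True):
    """Apparent power in kVA; the sqrt(3) factor applies only to three-phase feeds (assumed here)."""
    factor = math.sqrt(3) if three_phase else 1.0
    return factor * volts * amps / 1000.0

def ideal_secondary_amps(primary_volts, primary_amps, secondary_volts):
    """Current available after an ideal (lossless) step-down transformer."""
    return primary_volts * primary_amps / secondary_volts

feed_kva = apparent_power_kva(480, 175)          # the new drop from the basement
load_kva = apparent_power_kva(208, 350)          # what the new equipment requires
max_amps = ideal_secondary_amps(480, 175, 208)   # upper bound at 208 volts

print(f"Feed capacity: {feed_kva:.1f} kVA")
print(f"Equipment requirement: {load_kva:.1f} kVA")
print(f"Ideal current available at 208 V: {max_amps:.0f} A")
```

Under these assumptions the 480-volt drop could supply roughly 404 amps at 208 volts, so the 350 amps the equipment needs fits with some margin left for transformer losses.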

Power outages at GTRI are rare, but when they occur, management software alerts the IT staff by e-mail or text messages delivered to their cell phones.

Georgia Tech’s Office of Information Technology defines IT policy in general for the campus, but the GTRI lab has considerable autonomy. “We keep abreast of what’s going on outside the lab, and often take the lead by exploring and implementing new technologies that the whole institute later adopts,” explains Ryan Talley, head of the IT support group for the Electronic Systems Laboratory.

Go Virtual

Virtualization offers a solution for widely distributed systems that contain legacy equipment and disparate elements. “Virtualization can reduce your total server footprint considerably and leverage capacity wherever it exists,” advises Roberta Witty, research vice president at Gartner. That was certainly Bryant University’s experience.

Bryant’s data center capabilities spread across three nonstandard sites that had evolved gradually over the years. The Smithfield, R.I., university’s IT systems hosted 84 separate servers, some running undocumented applications written by long-departed students. The setup was highly inefficient, with an average server utilization of less than 10 percent. The university sought a way to consolidate the servers and reinforce power backup.

“We wanted a purpose-built, enterprise-class, standards-based data center that would be able to support what was coming — such as VoIP, life safety systems, facility access control systems and video surveillance,” says Art Gloster, vice president for information services at Bryant.

The university’s old systems ran during power failures only as long as backup battery power lasted — typically, less than 30 minutes. The servers had to be brought down a few times a year to allow for electrical-system maintenance. Bryant wanted a 24x7 data center that allowed online maintenance and incremental infrastructure upgrades, including battery backup and a generator that could sustain the computer and cooling equipment until regular power was restored.

Last year Bryant implemented new IBM scalable modular data center technology bolstered by APC’s new InfraStruXure architecture for data center uninterruptible power supply (UPS) protection. “The vendors were learning with us,” recalls Gloster. “We were the first organization in this part of the country to install the APC InfraStruXure architecture.”

Using IBM BladeCenter servers and SAN units, the university consolidated its 84 servers down to 35. APC’s more efficient, closely coupled cooling systems replaced traditional perimeter cooling systems.

Currently, if the cooler in one row is working too hard while the cooler in another row is underutilized, manual intervention is required to fix the problem. Eventually, the new system will be able to automatically move virtual machines from one row to another when necessary.
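
The kind of automatic rebalancing described above might look something like the following sketch. It is not Bryant's or any vendor's implementation: the row names, the per-VM load figures and the `plan_migration` function are all hypothetical, and cooler load is modeled simply as the sum of the loads of the virtual machines in a row.

```python
def plan_migration(rows, max_gap=0.25):
    """Suggest one VM move from the hottest row to the coolest row.

    rows maps a row name to a dict of {vm_name: estimated cooling load}.
    Returns (vm, source_row, target_row), or None if the rows are
    balanced to within max_gap or no suitable VM exists.
    """
    load = {name: sum(vms.values()) for name, vms in rows.items()}
    hot = max(load, key=load.get)
    cool = min(load, key=load.get)
    gap = load[hot] - load[cool]
    if gap <= max_gap:
        return None

    # Moving a VM with load x changes the gap to (gap - 2x). Only consider
    # VMs smaller than the gap, so the move cannot widen the imbalance.
    candidates = {vm: x for vm, x in rows[hot].items() if x < gap}
    if not candidates:
        return None

    # Pick the VM that leaves the two rows closest to even.
    vm = min(candidates, key=lambda v: abs(gap - 2 * candidates[v]))
    return vm, hot, cool


# Hypothetical data: row_a's cooler is working hard, row_b's is underused.
rows = {
    "row_a": {"vm1": 0.40, "vm2": 0.30, "vm3": 0.20},
    "row_b": {"vm4": 0.25},
}
plan = plan_migration(rows)
print(plan)
```

With these made-up numbers the sketch suggests moving vm2 from row_a to row_b, which leaves the two rows within a few points of each other.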

“Before, implementing new services took several weeks because we had to bring in more power and cooling equipment,” says Gloster. “Lead time is now a matter of days.”

Don’t Take Anything for Granted

The University of Iowa pressed its disaster recovery plans into service this past June, when the Iowa City region recorded the worst flood in its history.

The university had already weathered a “100-year” flood in 1993. The result of that event was a comprehensive flood preparedness plan that convinced officials they could handle anything weather might throw at them in the future.

The June deluge, however, dwarfed the 1993 event, blowing contingencies out of the water. Despite sandbagging, walling off an adjacent facilities distribution tunnel and continuously pumping water out of the basement, the university almost lost the site that served as its primary telecommunications center and secondary data center. The campus lost power on and off throughout the crisis, and had to rely on backup generators and back-feed power to critical servers.

The catastrophic flood fast-tracked plans for a new data center and exposed weaknesses in the old plan, says Don Guckert, associate vice president and director of facilities management. The new hardened facility will have a low profile to make it less susceptible to tornadoes and will be on higher ground to put it out of the reach of floods. The university is also working on plans to back up the existing power plant with a new facility on the opposite side of the river to provide redundant capacity, says Guckert.

Know Who’s Responsible for What and Be Flexible

Gartner’s Witty cautions that organizations often place too much emphasis on data center recovery and not enough on the workforce and how it will continue to operate. That wasn’t the case at Iowa.

Fortuitously, the university had just updated its flu pandemic plan, compiling a list of staff contacts and their backups. As a result, IT management knew who the various system and service managers were and which services were running on which systems. A prioritized list of servers indicated which ones could be turned off first.
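
A prioritized list of this kind can be kept in a form that is easy to act on under pressure. The sketch below is a hypothetical illustration, not the university's actual inventory or tooling: the server names, priorities and wattages are invented, and priority 1 is assumed to mean most critical.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    service: str
    priority: int  # 1 = most critical; larger numbers are shut down first
    watts: int     # estimated power draw

def shutdown_order(servers):
    """Least critical servers first; ties broken by name for a stable order."""
    return sorted(servers, key=lambda s: (-s.priority, s.name))

def servers_to_shed(servers, watts_available):
    """Names of servers to power down, in order, until the load fits the budget."""
    total = sum(s.watts for s in servers)
    shed = []
    for server in shutdown_order(servers):
        if total <= watts_available:
            break
        shed.append(server.name)
        total -= server.watts
    return shed

# Invented inventory for illustration only.
inventory = [
    Server("mail-01", "e-mail", 1, 400),
    Server("web-01", "campus website", 2, 350),
    Server("print-01", "print queue", 4, 300),
    Server("test-01", "test environment", 5, 250),
]

shed = servers_to_shed(inventory, watts_available=800)
print(shed)
```

Here the generator budget of 800 watts is met by shutting down the test environment and then the print queue, leaving e-mail and the website running.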

“Having those plans scripted and ready to go really helped save time,” says Steve Fleagle, the university’s CIO.

Server consolidation and virtualization similarly proved their worth during the disaster. The IT staff didn’t need to worry about complex interrelationships among critical and noncritical servers as they powered down the latter. Also, there were fewer servers to move and less dependence on specific servers.

The post-flood debriefing is ongoing, say officials. But the IT and operations staffs are buoyed by the performance of the university community last June.

“It was amazing how well people pulled together across organizations and departments,” Fleagle concludes. “This university fosters a lot of relationship building, and people were able to rely on those relationships.”

Many elements contribute to successful disaster recovery, and technology plays a critical role. Advances in virtualization can give IT more flexibility and durability. The power that runs the machinery is the bedrock on which all IT operations rest. Successful disaster recovery planners know how to exploit new technology and keep it running at the same time.

After the Deluge

After the Iowa City flood this past summer, some 20 buildings on the University of Iowa campus (as much as 2.5 million square feet) were under water. Among the inundated sites: the school’s power plant, a major research building and the student union. As waters receded, officials sharply raised their damage estimates from an initial $75 million to more than $230 million. Iowa’s advanced technology lab sustained an estimated $8 million in physical damage and $34 million in damage to equipment.