Disaster Recovery for University IT Administrators
What does a small, liberal arts college in Brunswick, Maine, have in common with a midsize Catholic university on the southern California coast? That question has been asked many times since Bowdoin College teamed up with Loyola Marymount University (LMU) to provide a reciprocal hot-site disaster recovery solution.
While disaster recovery has been of critical interest since the events of Sept. 11, last year's devastating hurricane season provided further evidence to the higher education community of the need to plan for such contingencies.
Despite significant growth in the field of continuity services, the availability of colocation services is under pressure. Increased demand for these services, higher fuel costs and electrical grid capacities that are pushed to the limit have spurred institutions to be creative in formulating disaster recovery solutions.
A Meeting of Minds
Although the Bowdoin/LMU disaster recovery collaboration project spans the continental United States, it began in the middle of America – Snowmass Village, Colo. – at the annual EDUCAUSE Seminar on Academic Computing conference. Mitch Davis, Bowdoin's CIO, was looking for a partner to implement his vision of providing reciprocal hot-site emergency support with other campuses. As the vice president of IT at LMU, I was looking for a disaster recovery solution, and so began the journey.
At first, the differences seemed daunting. The distance between the schools, some 3,000 miles, would mean long trips. The three-hour time difference complicated communication. The risk exposures also diverged: Bowdoin's primary concern is the harsh Maine winters and the occasional ice storm, while Loyola Marymount faces the risk of earthquakes, tsunamis or acts of terrorism at nearby Los Angeles International Airport.
“The beauty of the differences in environment was that we could test the viability of our arrangement,” Bowdoin's Davis says. “If we could overcome those challenges, the model presented a real possibility for other schools that might consider this approach.”
However, the two schools quickly began to focus on their similarities rather than their differences. Mitch Davis' and my management styles and philosophies were closely aligned, which was evident from the highly compatible teams in each institution. Having two complementary technical teams meant increased firepower, depth and sophistication.
“We were able to draw from a different set of skills and experiences that would not ordinarily have been available to us,” Davis says. “It can save us weeks of looking for a solution, because we have another team that may have already solved the problem.”
As the group began to coalesce, trips between the two schools helped develop camaraderie and mutual trust, notes Dan Cooke, LMU's director of system administration. “High-performing teams buy into each other at a more personal level,” he says. “We need to foster that kind of environment.”
Even with compatible teams, however, a successful collaboration depends on having highly congruent technical infrastructures. “A common architecture is essential, because it reduces the training and operating requirements for both institutions,” says Tim Antonowicz, Cooke's counterpart at Bowdoin. To achieve this, both schools have worked to bring their already similar hardware and software environments into closer alignment.
Antonowicz's experience in the deployment of virtual servers at Bowdoin enabled him to assist Loyola Marymount in using this technology. The basic framework of the solution includes Hewlett-Packard's Blade Servers, primarily BL 20s and 25s, with an iSCSI-attached NetApp 3020c and R200 storage system. Local backup and recovery are performed with FalconStor virtual tape library (VTL) technology, using a Nexsan SATABeast that houses 40 terabytes in 8U of rack space.
The virtual server environment is based on VMware's ESX Server 3, which enables the blade servers to support nearly any x86-based operating environment. Since a server instance consists of only two files, it can be readily transmitted across a wide area network (WAN) to the campus on the other side of the country. In addition, this small footprint means that very little additional hardware is necessary to support the partner institution's environment.
The network infrastructure is Gigabit Ethernet, with the ability to aggregate up to 8 gigabits per second per blade chassis. WAN connectivity is via Internet2, which enables end-to-end gigabit connectivity between the schools. This bandwidth is necessary, since data transfers are expected to be quite large. Cooke and Antonowicz are currently evaluating WAN accelerators to handle the load that is created by substantial transmissions of enterprise data.
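To put the bandwidth requirement in perspective, here is a rough back-of-the-envelope sketch of how long a bulk transfer might take over a gigabit link. The function name, the decimal-gigabyte convention and the default efficiency factor are illustrative assumptions, not figures from either campus.

```python
def transfer_hours(size_gb: float, link_gbps: float = 1.0,
                   efficiency: float = 0.7) -> float:
    """Estimate the hours needed to move size_gb gigabytes over a WAN link.

    efficiency is an assumed fraction of the nominal line rate that is
    actually usable after protocol overhead and competing traffic.
    """
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")

    bits_to_send = size_gb * 8e9                   # decimal gigabytes to bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits_to_send / usable_bits_per_second / 3600


if __name__ == "__main__":
    # Hypothetical workloads: one server image and a full 40-terabyte store.
    for label, size_gb in [("single server image", 40), ("full store", 40_000)]:
        hours = transfer_hours(size_gb)
        print(f"{label}: about {hours:.1f} hours at 70% of 1 Gbps")
```

Even at the full, ideal line rate, copying 40 terabytes over a single gigabit link would take roughly 89 hours, which suggests why techniques that reduce the volume of data sent, such as WAN acceleration, are worth evaluating.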
Bowdoin and LMU began the project with an inaugural videoconference, a meeting that has become a Friday afternoon routine. In addition to the IT project teams, appropriate members of the university staff are brought in to address specific disaster planning issues. Webmasters, communications and public relations staff, and public safety personnel also have participated in these meetings, providing expert knowledge and support.
To further institutionalize the collaboration, a memorandum of understanding was created. Both presidents signed it, enabling the project to move from the realm of the two IT staffs to become a shared initiative at the campus level.
The first order of business was the design of emergency Web sites that could be quickly and automatically activated at each campus in the event of an emergency. Mutual Domain Name System publication had to be considered, as did a method to identify when an emergency situation existed at the other campus. Since such emergencies could be very sudden and dramatic, the systems were configured to communicate and activate without human intervention, if necessary.
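The article does not describe how the campuses actually detect an emergency at the partner site, so the following is only a minimal sketch of one common approach: count consecutive failed health checks and activate the emergency site once a threshold is reached. The class name, the threshold and the choice to keep the emergency site active until staff reset it are all assumptions made for illustration. The health check itself (for example, an HTTP probe of the partner's home page) is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class PartnerMonitor:
    """Decide when to activate the partner's emergency Web site.

    Hypothetical helper: the caller runs a periodic health check against
    the partner campus and reports each result to record_check().
    """
    failure_threshold: int = 3      # assumed number of misses before failover
    consecutive_failures: int = 0
    emergency_active: bool = False

    def record_check(self, partner_reachable: bool) -> bool:
        """Record one health-check result; return True if failover is active."""
        if partner_reachable:
            # A successful check clears the miss counter, but an active
            # emergency stays active until staff explicitly reset it.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.emergency_active = True
        return self.emergency_active

    def reset(self) -> None:
        """Stand down after staff confirm the partner campus has recovered."""
        self.consecutive_failures = 0
        self.emergency_active = False


if __name__ == "__main__":
    monitor = PartnerMonitor(failure_threshold=3)
    for reachable in [True, False, False, False]:
        if monitor.record_check(reachable):
            print("Activate emergency site and publish DNS changes")
```

Requiring several consecutive failures guards against activating on a brief network glitch, while the manual reset reflects an assumption that people, not software, should decide when an emergency is over.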
E-mail, the next mission-critical application to be brought into the plan, is currently being deployed. Microsoft Exchange 2003 was selected as the common e-mail platform for the schools. LMU had expertise with Exchange and assisted Bowdoin through the implementation.
Concomitant with the development of these virtual hot sites has been a discussion of priorities and processes surrounding disaster recovery. Each school must determine the extent to which financial and student information systems will be needed and accessible in the event of an emergency.
Bowdoin and Loyola Marymount have benefited greatly from this cost-effective, enterprise-class disaster recovery solution. The skill sets of each institution have been leveraged and augmented by the counterpart team at the other school. In addition, the groups enjoy social time together at each other's campuses, building trust.
As Joseph Cevetello, senior IT director at LMU, notes, “In any disaster recovery relationship, the key component is trust. In a commercial model, you buy trust through the acquisition of services from a reputable vendor. In this model, building trust is the starting point, since you need to know that the other school will be there for you when you need it most.”
Erin Griffin is vice president, Information Technology, at Loyola Marymount University in Los Angeles. She has 22 years of experience in the management of educational and administrative technology. Her responsibilities at Loyola Marymount include strategic, fiscal and operational oversight of IT.
Building a Disaster Recovery Program
Plan. To build a strong disaster recovery plan, involve all necessary constituents in determining what information to include, and make sure the plan accurately reflects the campus's emergency preparedness plans.
Obtain high-level buy-in. Get senior management to recognize the value of this effort so that it doesn't get marginalized or stalled.
Stick with it. It takes time and commitment to make a collaboration work, so be consistent with your efforts, meetings and project timeline.
Know your partner institution. Understanding the mission and culture of the other school will facilitate the process and will be critical if you need to provide support in an emergency.
Be flexible. Collaborative projects face twice the number of variables: two sets of vacations and holidays, and competing priorities.