When it comes to disaster recovery, higher education IT managers are faced with stark options. Commercial disaster recovery contractors are the typical choice. However, such a choice often boils down to working with a specialized, experienced but wildly expensive commercial team.
Complicating the commercial option further are the tight time constraints on testing and fixing problems. Some commercial firms allow testing only on weekends and no more than twice a year. That makes the window for fixing bugs incredibly short and can lengthen the time it takes to get the systems back online after a disaster, explains Bill Lewis, data center manager for the Office of Information Technologies at the University of Cincinnati (UC).
Because of those limitations, universities and colleges are increasingly looking to each other for a better solution, pooling their hardware and real estate assets to share the disaster recovery load.
In 1999, Ohio launched a statewide fiber intranet highway effort called ITEC-Ohio (part of the Internet2 national consortium aimed at developing advanced Internet technologies, including high-speed fiber optic networks) that connected all educational institutions in the state, K–12 and above. With ITEC (Internet2 Technical Evaluation Center), John Ellinger, then acting CIO of the Office of Information Technology at Ohio State University (OSU), saw an opportunity.
In late 2002, faced with enormous monthly costs from a commercial disaster recovery specialist, Ellinger met with the University of Cincinnati’s Office of Information Technologies CIO, Fred Siff, and proposed a cooperative disaster recovery initiative between the two schools. OSU and UC have much in common: They are the first- and second-largest schools in the state of Ohio, respectively, and have similar-size mainframes, comparable hardware and excess floor capacity in their IT operations. They were also familiar with each other’s business models.
The first thing the team did was line up funding. In 2003, the Ohio Board of Regents was stressing collaboration between universities and was granting funds to schools that worked jointly on projects. “We ended up securing a grant for $200,000 and change,” says Jim Robinson, technical software specialist and team leader at UC.
“Ellinger went to the board and sold the idea from the standpoint of being able to save money doing this because it could be done for a lot cheaper than going to outside firms,” says Bruce Boda, senior systems manager of the Office of Information Technology at OSU.
“The average minimum SunGard DR installation is $100,000 for an educational institution of our size,” says Boda. “It ranges from $100,000 to $300,000. In comparison, our DR installation with UC will cost about $20,000 for the first year and then somewhere between $2,000 and $3,000 every year after for annual maintenance.”
Boda predicts they will see a return on investment three times over within the first year.
Both teams were ready to hit the ground running in 2003, but delays with the statewide fiber highway put the project on hold. Last year, enough of the intranet was completed to bring the UC/OSU disaster recovery project back to the fore.
“One of the things that we did was a phased approach,” says Clarence Smith, assistant director of Data Center Operations at UC. “The very first piece we took on was backing up each other’s mainframe systems.”
Because both universities run their student information systems on z/OS on the mainframe, they were able to run each other’s systems at each site without spending much money.
Boda adds that the physical hardware deployment “was actually a lot easier than we thought it would be. The actual configurations were extremely simple. We thought that the integration of the devices and networking environment was going to be difficult, but that only ended up taking about a day, where we had scheduled over a week to complete it.”
One of the unexpected hurdles the teams faced was working across different storage environments: EMC doesn’t play well with IBM storage, nor does Hitachi play well with IBM and EMC. They all have specific methodologies for replication and for how they talk to different disk environments, according to Boda. “That’s where the [protocol converter] device came in, because it doesn’t care who it’s talking to,” he says. “So long as it can present FBA storage, it can write to it. From the mainframe environment, that was nice because we could write to anything.”
Another technical hurdle the teams encountered was trying to pass the different hardware definitions at both schools. “We got around that by utilizing z/VM,” says Jim Naughton, storage manager at UC. “We were able to define our MVS machines and z/OS machines as virtual machines. That way, OSU could take their addresses and map them to our devices, and we were able to map our devices to OSU.”
“Above all, probably the biggest problem we ran into was trying to run in simulated mode once we had everything connected,” explains Robinson. During an early trial run, they discovered that the system was hitting sites outside of the university. Batch jobs would try to shift files to banks and retirement systems. “Fortunately, we were able to cut it off before it got too bad,” says Robinson. “Once we got the system up and running, we were able to find out just how many automated procedures try to connect outside our little universe.”
To correct the problem, Lewis and his team shut down outgoing e-mail and FTP servers on the disaster image, and OSU put a few holes in its firewall to allow a connection. UC had to similarly reconfigure its firewall.
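The article does not say how the teams identified which automated procedures reached outside the university. As a rough illustration only, the kind of check that could be run before a simulated failover might look like the sketch below. The domain suffixes, job names and host names are all hypothetical, and the input is assumed to be a simple mapping of job names to the hosts each job contacts.

```python
# Hypothetical sketch: flag batch jobs that would contact hosts outside
# the institution during a disaster recovery test. All names are made up.

# Assumed internal domain suffixes (placeholders, not the schools' real domains).
INTERNAL_SUFFIXES = (".uc.example", ".osu.example")


def is_internal(host, suffixes=INTERNAL_SUFFIXES):
    """Return True if host is one of the internal domains or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == s.lstrip(".") or host.endswith(s) for s in suffixes)


def external_destinations(jobs):
    """Map each job name to the sorted list of external hosts it contacts.

    jobs: dict of job name -> list of destination host names.
    Jobs that only contact internal hosts are left out of the result.
    """
    flagged = {}
    for name, hosts in jobs.items():
        outside = sorted({h for h in hosts if not is_internal(h)})
        if outside:
            flagged[name] = outside
    return flagged


if __name__ == "__main__":
    # Illustrative inventory; a real one would come from the job scheduler.
    sample_jobs = {
        "PAYROLL": ["ftp.bank.example", "db.uc.example"],
        "GRADES": ["db.uc.example"],
    }
    for job, hosts in external_destinations(sample_jobs).items():
        print(job, "->", ", ".join(hosts))
```

A report like this would only be a starting point. The teams' actual fix, as described above, was to shut down the outgoing e-mail and FTP servers on the disaster image.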
Phase one was completed about a year ago, according to Robinson. The teams are now focused on phase two, which will back up the open systems, he says. “Rather than jump into everything all at once, we took small steps,” he adds. “We did the mainframe first because we were familiar enough with the people on both sides that we were able to support each other. And without too many gray hairs, we were able to get the Ohio State system running down here and our system running at Ohio State, and that helped sell the idea to other people that this was a practical way to back each other up.”
For the second phase of the project, Robinson and his team are working on establishing procedures for transferring encrypted data over Internet communication paths (in place of tape backups). Using this strategy, critical data will always be current (within a few hours) without the universities’ need for lengthy tape restores or forward recovery of applications.
Aside from the cost savings, the universities also see zero downtime. “Last year, we had electrical work done in our computer room,” says Smith, “and we had the power out for about 8 hours. We switched our web presence over to Ohio State because the networking equipment and the servers were all shut down here. So anybody trying to connect to the university [at that time] would be able to get a home page explaining why they couldn’t get any further than that,” he says.
“If it had been a real disaster, we would have been able to update that page with contact information and anything else people would have needed,” he adds.
Hard at work on phase two, Smith says, “I am happy with the progress and success of this project thus far, and I look forward to seeing the advancements our teams make going forward.”
The Ohio Statewide Fiber Highway
Known as ITEC-Ohio, the statewide fiber highway is a fiber-optic network that rings the entire state and connects all of its educational institutions in an expansive, low-cost, high-speed intranet. Started by Gov. Robert Taft in 1999, ITEC-Ohio is scheduled to be fully operational and connected to every school in the state by the end of summer 2008. “Whether everybody chooses to participate with DR, that’s their own choice,” says Bruce Boda, senior systems manager of the Office of Information Technology at OSU.
Currently, all of the 13 major colleges in the state are involved in the disaster-recovery project, with OSU and UC leading as key points of the backbone. Not all of the schools are doing replications, however. What’s great about this initiative is that other schools can participate at fairly low cost. “We’ve been able to keep it down to around $20,000 as an entry fee,” explains Boda. “At first, each of the schools that wants to get involved need only contribute what hardware they need to get connected, be it a Cisco switch or storage. So cost, in the grand scheme of things, is actually quite minimal.”
But what about universities that don’t have a statewide fiber highway? As Boda explains, “It’s definitely still possible to have a DR program, because actually, the minimum network requirements could be handled by any of the major ISPs. But there again, there’s a cost factor. [ITEC-Ohio] is a minimum cost to Ohio education institutions, where universities without an [Internet2] network would have to buy that bandwidth from a private contractor. Any state could do it. Any institution could do it. They would just have to buy the bandwidth. And at minimum, you would need a T3 line.”
- FBA: Fixed-Block Architecture
- z/OS: The successor to MVS, z/OS is a 64-bit operating system for mainframe computers, created by IBM and first released in 2000
- z/VM: The latest OS in IBM’s VM family of virtual machine operating systems
- MVS: Multiple Virtual Storage, the most commonly used operating system on IBM mainframe computers from the 1970s until last year, when IBM discontinued support for the OS
University of Cincinnati Tech Roundup
- Network: Cisco Gigabit EtherChannel tunneling
- Data Centers: Innovation Data Processing’s FDR products, IBM 3590 tape drives and Sun StorageTek T9840 tape drives
- Software: IBM z/OS and z/VM, VMware, Lightweight Directory Access Protocol (LDAP), eDirectory and Active Directory
- Storage: EMC CLARiion and Symmetrix, IBM DS6800 (mainframe) and IBM DS4800 (open systems)
University of Cincinnati’s and Ohio State University’s storage environments differ in size. The average production environment for a mainframe is 1 terabyte, but the size of open systems is much greater. “Our production environment for our open systems is about 100 terabytes, and the whole infrastructure sits on about 10 square feet of floor space,” says Jim Naughton, storage manager at UC. “OSU’s is slightly larger than that.” In general, it takes about 12 hours to replicate a terabyte through [ITEC-Ohio] at a constant speed of about 24 megabytes per second. But to save time, most of the schools do an incremental daily backup, rather than a full backup once a week. “An incremental is just what we consider to be changed data, and that’s significantly less,” explains Naughton.
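The 12-hour figure can be checked with simple arithmetic. The sketch below assumes decimal units (1 terabyte = 10^12 bytes, 1 megabyte = 10^6 bytes) and a perfectly constant transfer rate. The 5 percent daily change rate used for the incremental case is purely illustrative and does not come from either university.

```python
# Back-of-the-envelope replication time at a constant transfer rate.

TERABYTE = 10**12  # bytes, decimal units (assumption)
MEGABYTE = 10**6   # bytes, decimal units (assumption)


def replication_hours(size_bytes, rate_bytes_per_sec):
    """Hours needed to move size_bytes at a constant rate in bytes per second."""
    if rate_bytes_per_sec <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_sec / 3600


RATE = 24 * MEGABYTE  # the rate quoted in the article

# Full copy of one terabyte.
full_hours = replication_hours(TERABYTE, RATE)

# Incremental copy, assuming (hypothetically) that 5% of the data changed.
HYPOTHETICAL_CHANGE_FRACTION = 0.05
incremental_hours = replication_hours(HYPOTHETICAL_CHANGE_FRACTION * TERABYTE, RATE)

if __name__ == "__main__":
    print(f"Full terabyte: {full_hours:.1f} hours")
    print(f"Incremental:   {incremental_hours:.2f} hours")
```

With decimal units the full copy works out to roughly 11.6 hours, consistent with the "about 12 hours" quoted above. Binary units would put it slightly over 12.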