Web Exclusive I
Managing the storage growth game gets more challenging.
According to Gartner, the typical university will see its data storage requirements double within the next 18 months. Between regulatory requirements and other growth pressures, it's easy to see why.
Ken Woo, director of IT and facilities for Northwestern University School of Continuing Studies, predicted even higher growth. “It's looking like even more than doubling, [since] we're introducing GIS, CAD and high-resolution type initiatives. At the med school, they're looking at 3D modeling of DNA and chemical formulas. That requires a lot of storage capacity. With GIS, there's another storage need there, too.”
University IT departments usually manage a multitude of high-volume file types, including multimedia, databases housing student information and applications, and modeling and engineering data. That's in addition to the e-mail and file-serving albatross that continues to grow.
Storage growth has fueled a multibillion-dollar storage industry. With so much attention directed at the problem, several products have emerged that promise to ease the growth burden while simplifying storage management across the enterprise. Among the most popular technologies are:
• Data archiving
• Hierarchical storage management (HSM)
• Storage area networks (SAN)
• Network attached storage (NAS)
• Storage virtualization.
Data archiving has gained popularity in recent years as a means of retaining data without increasing an organization's online storage. Archiving works by identifying files that have not been accessed for a specified amount of time and moving them to offline storage, such as tape. Archived files are retained in a searchable index so they can be retrieved if needed.
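The core archiving loop is straightforward. As a rough sketch in Python (the function name `archive_stale_files` is purely illustrative, and a local directory stands in for offline tape media), an archiver scans for files whose last-access time exceeds a threshold, moves them out of online storage, and records each move in a searchable index:

```python
import json
import os
import shutil
import time

def archive_stale_files(source_dir, archive_dir, max_age_days, index_path):
    """Move files whose last access is older than max_age_days into
    archive_dir, recording each move in a searchable JSON index.
    (Sketch only: real products write to tape and enforce retention
    and legal-hold policies.)"""
    cutoff = time.time() - max_age_days * 86400
    index = {}
    os.makedirs(archive_dir, exist_ok=True)
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.stat(path).st_atime < cutoff:
                dest = os.path.join(archive_dir, name)
                shutil.move(path, dest)
                index[name] = {"original_path": path, "archived_to": dest}
    with open(index_path, "w") as fh:
        json.dump(index, fh, indent=2)
    return index
```

The JSON index is the piece that makes archived data retrievable later: without it, files on offline media are effectively lost.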
Archiving is also a popular tool for quelling the growth of e-mail servers. To comply with laws such as Sarbanes-Oxley that require the retention of electronic documentation, organizations often turn to e-mail archiving products. Products such as Veritas Enterprise Vault allow administrators to archive e-mail, file system and even SharePoint data, and to manage the archived data from a single console. For e-mail, archiving eliminates the need for users to export mailbox content to a personal folders (.pst) file, for example. Exported .pst files stored locally on workstations rather than on a file server can mean lost e-mail when a workstation fails, assuming workstations are not backed up nightly. Archiving instead lets administrators centrally control and manage both mailbox and file server growth.
For all of its merits, archiving files or e-mail data to offline storage adds management considerations to the storage infrastructure. For example, if tapes are shipped offsite for long-term retention, users will experience a delay when they need to access an archived file. To alleviate this, some organizations keep a copy of each archive tape onsite as well as a duplicate offsite for disaster recovery protection.
Help from HSM
Hierarchical storage management, also known as data migration, provides an alternative to archiving. Like archiving, HSM migrates infrequently used files to tape. It differs in that migrated data remains available in a nearline capacity, such as in a tape library. Architecturally, HSM also operates very differently from archiving tools. When a file is moved to nearline tape storage, the original file (on a file server, for example) remains on the server, but instead of containing the actual data, it holds a reparse point, or pointer, to the file's location in nearline storage. When a user or application requests the file, the HSM product automatically retrieves it from nearline storage.
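The stub-and-recall mechanic can be sketched in a few lines of Python. This is a deliberate simplification: real HSM products use filesystem-level reparse points and tape libraries, while this toy version (all names hypothetical) marks stubs by their contents and uses an ordinary directory as "nearline" storage:

```python
import os
import shutil

# Hypothetical marker; real HSM flags migrated files with filesystem
# reparse points, not file contents.
STUB_PREFIX = "HSM-STUB:"

def migrate(path, nearline_dir):
    """Move a text file to nearline storage, leaving a tiny stub behind."""
    os.makedirs(nearline_dir, exist_ok=True)
    nearline_copy = os.path.join(nearline_dir, os.path.basename(path))
    shutil.move(path, nearline_copy)
    with open(path, "w") as stub:
        stub.write(STUB_PREFIX + nearline_copy)

def read_file(path):
    """Open a file, transparently recalling it first if it was migrated."""
    with open(path) as fh:
        contents = fh.read()
    if contents.startswith(STUB_PREFIX):
        nearline_copy = contents[len(STUB_PREFIX):]
        os.remove(path)                    # drop the stub
        shutil.move(nearline_copy, path)   # recall the real data
        with open(path) as fh:
            contents = fh.read()
    return contents
```

The point of the sketch is the transparency the article describes: a caller of `read_file` never needs to know whether a recall happened, only that it may take longer.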
Although recalling a file from nearline tape can take several minutes, because the HSM tool automates the process, recalls are relatively transparent to the end user and do not require a call to the help desk. Once HSM is deployed, however, users need to be trained to recognize migrated files and to understand that recovery can take several minutes. Without that understanding, the average user will repeatedly try to open the same file, generating several recall requests in the process. OS vendors have stepped up and made migrated files easier to spot: the Windows 2000 and 2003 Server operating systems, for example, include an API that lets migrated files be flagged with a small clock symbol embedded in the file's icon, helping users understand that opening the file will incur a delay.
Like archiving, data migration tools also allow users to migrate e-mail messages, so they can limit the growth of their mail server's online storage.
Help from the SAN
Although Northwestern isn't currently using either archiving or HSM to limit online storage growth, Woo notes that several departments have recently deployed SANs. A SAN dedicates a separate network to the transfer of data for both backup and archiving purposes. With online storage requirements growing while the backup window stays fixed, traditional LAN-based backups have become difficult, if not impossible, to complete. To stay within their backup windows, some organizations have changed their schedules to run full backups less often.
However, for crucial data involved in, for example, scientific or engineering research, fewer backups are not an option. Instead, implementing either a Fibre Channel- or iSCSI-based SAN can offer data transfer speeds up to 4 Gb/sec and add an abundance of new backup possibilities, including:
• Server-free backups
• Serverless backups
• Block-level snapshots.
With a server-free backup (an approach also commonly called LAN-free backup), the designated backup server can mount and back up a target server's files through the SAN. It is server-free in the sense that no CPU cycles are consumed on the file server being backed up; instead, all I/O for the backup job is handled by the backup server itself.
With backup software and SAN devices (such as a Fibre Channel switch) that support SCSI-3 Extended Copy (X-Copy), the backup software can send an X-Copy command to the SAN switch to initiate a copy of a SAN volume to backup storage media. With this approach, no server is involved in the backup path.
Block-level snapshots, supported by nearly all major file-serving and database applications, offer the ability to create repeated block-level copies of a volume's data. This gives users the flexibility of running the equivalent of a backup several times per hour, if desired. Snapshots are possible with third-party applications over a SAN, and many of the larger NAS vendors, such as EMC and Network Appliance, support the capability natively.
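Conceptually, a block-level snapshot is cheap because it copies references to blocks rather than the blocks themselves; data is only duplicated when a later write changes a block. A toy Python model of that idea (class and method names are hypothetical, and real implementations work at the storage-array level):

```python
class BlockVolume:
    """Toy volume demonstrating why block-level snapshots are cheap:
    a snapshot copies only the block table (references), not the data,
    so taking several per hour is practical."""

    def __init__(self, num_blocks, block_size=16):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.snapshots = {}

    def write_block(self, index, data):
        assert len(data) == self.block_size
        # Rebinding the slot leaves any snapshot still pointing at the
        # old (immutable) block -- the copy-on-write effect.
        self.blocks[index] = bytes(data)

    def take_snapshot(self, name):
        self.snapshots[name] = list(self.blocks)  # references only

    def read_snapshot(self, name, index):
        return self.snapshots[name][index]
```

After a snapshot is taken, later writes never disturb it; the snapshot continues to see the volume exactly as it was at snapshot time.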
A challenge for SAN managers is that separate departments in an organization often deploy their own SAN solutions. These can involve point products that may not be manageable across a university's campus network, leaving IT managers to document and organize individual solutions, which usually results in a higher total cost of ownership.
Woo also recognizes this problem, stating “[We need] to leverage our contact with the university as a whole to create one best pricing for us. Managing data throughout decentralized departments has been a problem.”
With departmental preferences and politics, this can be difficult and thus result in increased product and management costs throughout the network. In addition, with critical data stored on a SAN, nearly all components in the data path throughout the SAN must be redundant. Otherwise, the failure of a SAN switch could result in several servers being without access to storage, and thus unavailable.
Fueled by storage growth, SAN continues to emerge as a popular option in aiding data management. Since any server attached to a SAN can potentially access any storage device on the SAN, backups can run significantly faster. For example, 10 servers on a SAN can all write backups to a single shared library. With traditional LAN backups, a single server would act as the interface to the library, and all other servers would have to send their backup data over the LAN to the “media server.”
With the SAN acting as a shared network for storage, adding new storage is as simple as attaching it to the SAN and mapping the disk's logical unit number (LUN) to a server. This spares administrators from having to guess each server's future storage needs up front, guesswork that almost always leaves some servers with overallocated storage while others are underallocated.
It's clear that implementing a SAN can offer significant advantages in controlling and managing storage growth, as well as in staying within backup windows as storage continues to grow.

For those looking for a simple solution to file server growth, network attached storage is often the answer. Purchasing a NAS gives users a large (up to several terabytes) file server with Common Internet File System (CIFS) and Network File System (NFS) support that can be deployed within minutes. Most NAS appliances offer both redundant power and redundant storage, ensuring high availability. Many also support Fibre Channel interfaces that allow them to access shared storage in a SAN, enabling simple scaling as the NAS server's online data grows.
While they offer several benefits, NAS solutions often come from proprietary hardware vendors, so users should make sure that the NAS will be compatible with existing infrastructure products, such as backup software.
It's also important to ensure that NAS appliances can be centrally managed, especially if additional appliances might be added down the road. With centralized management support, networking tasks such as software updates and security management can be performed centrally, instead of individually on each NAS.
Many network managers are considering virtualization these days. It comes in many forms, with virtual machines (VMs) powered by VMware being the most popular. Using VMs, a university can consolidate several servers into individual VMs that run on a single physical server. Since many application vendors prefer their products to run on a dedicated box, the hardware resources of each server are often underutilized. In these situations, consolidating to VMs is ideal.
“Space is a premium at the university,” Woo notes, adding that he is currently considering virtualization products. “As files and servers grow, we don't want to keep expanding the hardware. Space is too costly to keep throwing servers at it.”
Storage vendors now provide virtualization technologies that allow users to abstract the physical location of data from the management process. For example, instead of having to know which tape a file is on in order to restore it, users can simply select the file from the backup management tool and click the “restore” button. The backup software determines the file's location and prompts the user to insert a tape into the tape library, if needed. Since the sheer volume of storage media continues to rise with the growth of online storage, having backup products such as those by CommVault and Veritas that track storage locations can have a significant impact on data management.
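The abstraction at work here is essentially a catalog that maps file names to media locations, so the person restoring never has to know the tape. A minimal Python illustration (all names hypothetical; real products such as those mentioned above drive the tape library directly rather than prompting):

```python
class BackupCatalog:
    """Toy media catalog: storage virtualization in miniature. Users
    ask for a file by name; the catalog resolves which tape (and
    offset) actually holds the data."""

    def __init__(self):
        self._locations = {}

    def record(self, filename, tape_label, offset):
        self._locations[filename] = (tape_label, offset)

    def restore(self, filename):
        if filename not in self._locations:
            raise KeyError(f"no backup recorded for {filename}")
        tape, offset = self._locations[filename]
        # A real product would mount the media itself; here we just
        # report which tape the operator should load.
        return f"load tape {tape}, read at offset {offset}"
```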
Winning the Game
Winning the game of storage growth comes down to a few simple guidelines. There is no one-size-fits-all solution, although vendors may try to tell you otherwise. To successfully manage growth, consider all current storage solutions as pieces of a complex puzzle. When each technology is used as intended, both the growth and the availability of online storage resources become much more manageable.
Enterprise-class storage management vendors such as Veritas and CommVault now offer storage management and backup solutions compatible with many operating systems, applications, and SAN and NAS solutions, so users can have a much better awareness of storage resources. Also, using policy-based storage virtualization that offers a point-and-click approach to data recovery may lead to a significant reduction in TCO along the way.
Of course, if politics and budgets are a problem, consider borrowing a line from Woo's playbook: “We have a disk-hog list [and use it to] make people feel ashamed that they're taking up too much storage.” Users receive an e-mail if they're “one of the top five people taking up storage. Being ashamed, they go out and clean up their disk space!”
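Generating such a list is trivial once per-user usage numbers are collected from the file servers. A few lines of Python along these lines (the helper name is made up) pick out the top five consumers:

```python
import heapq

def disk_hog_list(usage_by_user, top_n=5):
    """Return the top_n storage consumers as (user, bytes) pairs,
    largest first -- the candidates for a gentle
    'clean up your space' e-mail."""
    return heapq.nlargest(top_n, usage_by_user.items(),
                          key=lambda item: item[1])
```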
Chris Wolf is a writer based in Midlothian, Va.