“In our primary cluster, we have three all-flash, or AFF, NetApps, and those are our primary storage. We also wanted a slower, cheaper tier, and the StorageGRID cluster gives us that, with direct S3 on-premises storage,” says Green.
FabricPool delivers the auto-tiering, a key capability for controlling the cost of storing such massive volumes of data.
“If data is not actively being used on all-flash, it automatically goes to the slower S3 storage. Then, if you start using it again, it pulls it back into the flash,” Green says. “That was a key requirement, to have all-flash for active workloads along with some cheaper storage to drive down that cost.”
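FabricPool makes those moves automatically inside ONTAP, but the underlying idea is simple enough to sketch. The snippet below is an illustrative model only, not NetApp’s implementation; the block names are hypothetical, and the 31-day cooling window is an assumption standing in for the tiering policy’s threshold.

```python
from datetime import datetime, timedelta

# Illustrative sketch only -- FabricPool does this inside ONTAP.
# The 31-day cooling window and names here are assumptions.
COOLING_PERIOD = timedelta(days=31)

class TieredBlock:
    def __init__(self, name, data):
        self.name = name
        self.data = data
        self.tier = "performance"        # all-flash (AFF) tier
        self.last_access = datetime.now()

    def tier_if_cold(self):
        """Move data that has gone cold to the S3 capacity tier."""
        if self.tier == "performance" and datetime.now() - self.last_access > COOLING_PERIOD:
            self.tier = "capacity"       # StorageGRID S3 bucket

    def read(self):
        """Reads promote cold data back to flash before serving it."""
        if self.tier == "capacity":
            self.tier = "performance"    # pulled back on access
        self.last_access = datetime.now()
        return self.data
```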
The NetApp solution fulfills another key need: It offers multiple storage protocols.
“We have a wide variety of people who access the data, so we need to be able to provide their data in S3 or CIFS or NFS. Whatever they want, we can do,” Green says. “It’s vital that the scientists, engineers and researchers have access to the data however they need it. That multiple protocol support — and how it handles permissions between all those different systems — is critical.”
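From a researcher’s side, that flexibility looks roughly like the sketch below: the same data reachable through a file protocol mount and through S3. This is an illustration only; the mount point, endpoint URL, bucket name and object key are placeholders, not LASP’s actual layout.

```python
import boto3

# Hypothetical paths, endpoint and bucket -- placeholders for illustration.
NFS_PATH = "/mnt/lasp/mission_data/spectra_001.h5"
S3_ENDPOINT = "https://storagegrid.example.edu"
S3_BUCKET = "mission-data"
S3_KEY = "spectra_001.h5"

# A scientist on a Linux workstation reads the file over an NFS mount...
with open(NFS_PATH, "rb") as f:
    nfs_bytes = f.read()

# ...while a pipeline job reads the same object over S3.
s3 = boto3.client("s3", endpoint_url=S3_ENDPOINT)
s3_bytes = s3.get_object(Bucket=S3_BUCKET, Key=S3_KEY)["Body"].read()

assert nfs_bytes == s3_bytes  # same data, two protocols
```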
A Storage Solution That Captures Sounds of the Earth
While LASP draws its data from the skies, other researchers work closer to home. Bryan Pijanowski, director of the Discovery Park Center for Global Soundscapes at Purdue University, is gathering the diverse sounds of our planet, from the creak of a glacier to the croak of a frog.
His project, Record the Earth, involves a study of the soundscapes of every major biome in the world. “There are 32 of them, and I’ve done 27 so far,” he says. “In terms of data, I’m in the petabyte range right now, somewhere around 4 to 5 million recordings.”
In addition to struggling with the size of the data, Pijanowski found his previous storage solutions fragmented and cumbersome. He needed a place to consolidate field data so he could perform analysis and calculations.
“We had the best of the best, all the different pieces we needed, but they were not well integrated,” he says. “The hardware solutions, the software solutions, the people solutions were all siloed. As a result, we were getting stuck all the time.”
For a solution, Pijanowski turned to Hewlett Packard Enterprise. With Edgeline Converged Systems and ProLiant servers on the front end of the process, HPE helped the center build an environment around HPE Apollo systems that ushers the data from ingestion to visualization. Files are loaded onto a server and distributed using a combination of Apache Hadoop, Spark and Kafka before landing in a distributed MongoDB database. As a final step, the data moves to Tableau for visualization.
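The Kafka-to-MongoDB handoff at the heart of that flow can be sketched in a few lines. This is a minimal illustration under assumed names; the topic, broker address, database and recording schema are hypothetical, and the center’s production jobs run through Spark and Hadoop rather than a single consumer script.

```python
import json
from kafka import KafkaConsumer          # kafka-python
from pymongo import MongoClient

# Hypothetical topic, broker, database and schema -- for illustration only.
consumer = KafkaConsumer(
    "soundscape-recordings",
    bootstrap_servers=["broker1:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
collection = MongoClient("mongodb://mongo1:27017")["soundscapes"]["recordings"]

# Each message carries metadata for one field recording (site, biome,
# timestamp, path to the audio file in the Hadoop file system).
for message in consumer:
    collection.insert_one(message.value)  # land it in MongoDB for analysis
```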