Apr 18 2023
Data Center

Data Warehouses vs. Data Lakes: How Can Universities Store Large Data Sets?

University IT teams should consider several factors — including where to start — when choosing data management solutions for their institutions.

Every college and university has data storage needs, and student records are just the beginning. IT systems demand storage as well: Internet logs, security events, building systems, security cameras and more all require storage.

But storage is the easy part of the data management problem; the challenge is making use of the data once you have it. In addition to storing data for higher education institutions, data warehouses and data lakes can help make data useful.

There are a few key differences between a data warehouse and a data lake. Here’s what higher education IT leaders should consider when choosing solutions for their institutions’ data storage needs.


What Are Data Warehouses, and How Are They Used?

"Data warehouse" might sound like another name for a database, and in a certain sense, it is. Both store data that is queried in response to a search. However, a database typically serves a single application or data set, while a data warehouse can answer complex queries using data from many sources.

A data warehouse combines data from several data sets, focusing on the specific information a college or university is interested in. Queries are written to extract the data from the warehouse’s customized data set. University IT staff can run those queries against the data warehouse to generate analytics reports.

For example, if you only want to know Sue Smith’s grade on her biology final last semester, a database will suffice. However, if you want to graph trends in all biology students’ course grades over the past decade, correlated with curriculum changes and student majors, a data warehouse might be a more suitable solution.
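The difference between those two questions can be sketched in a few lines of Python using the standard library's sqlite3 module. Everything here is hypothetical and for illustration only: the table names, columns, course code and sample rows are assumptions, and a real data warehouse would run on a dedicated platform rather than an in-memory SQLite file. The point is the shape of the queries, not the tooling.

```python
import sqlite3

# Hypothetical, in-memory stand-in for data pulled from several campus systems.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT);
    CREATE TABLE grades (student_id INTEGER, course TEXT, year INTEGER, score REAL);
    CREATE TABLE curriculum (course TEXT, year INTEGER, version TEXT);
""")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", [
    (1, "Sue Smith", "Biology"),
    (2, "Raj Patel", "Chemistry"),
    (3, "Ana Lopez", "Biology"),
])
conn.executemany("INSERT INTO grades VALUES (?, ?, ?, ?)", [
    (1, "BIO101", 2022, 91.0),
    (2, "BIO101", 2022, 78.0),
    (3, "BIO101", 2021, 84.0),
    (2, "BIO101", 2021, 80.0),
])
conn.executemany("INSERT INTO curriculum VALUES (?, ?, ?)", [
    ("BIO101", 2021, "v1"),
    ("BIO101", 2022, "v2"),
])

# Database-style question: one student's grade in one course.
sue_score = conn.execute("""
    SELECT g.score
    FROM grades g
    JOIN students s ON s.student_id = g.student_id
    WHERE s.name = 'Sue Smith' AND g.course = 'BIO101' AND g.year = 2022
""").fetchone()[0]

# Warehouse-style question: average grade per year, broken out by
# curriculum version and student major, across three combined data sets.
trend = conn.execute("""
    SELECT g.year, c.version, s.major, AVG(g.score)
    FROM grades g
    JOIN students s ON s.student_id = g.student_id
    JOIN curriculum c ON c.course = g.course AND c.year = g.year
    WHERE g.course = 'BIO101'
    GROUP BY g.year, c.version, s.major
    ORDER BY g.year, s.major
""").fetchall()

print(sue_score)
for row in trend:
    print(row)
```

The first query touches a single record. The second only works because grades, student majors and curriculum history have already been brought together in one place with a shared structure, which is the job a data warehouse does at scale.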


What Are Data Lakes, and How Are They Used?

Data warehouses map data into a predefined structure before it can be queried, but data lakes are more flexible. A data lake collects all types of data in raw form and imposes no structure until a query runs.

This approach to data structure makes a data lake ideal for housing vast amounts of data on cheap storage. The trade-off is in data access: Queriers need to know how to access the specific bits of information they are seeking. This puts data lakes in the realm of data scientists, specialists who study data, look for trends or train machine learning models.
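As a minimal sketch of what "no structure until the query" means in practice, the Python example below keeps raw records exactly as they arrived and applies structure only inside the query function. The record fields, source names and the wifi_clients_by_building helper are all hypothetical, invented for illustration; a production data lake would hold files in object storage and use a query engine rather than a Python list.

```python
import json

# Hypothetical raw records, stored as-is. Each source uses its own fields,
# and nothing forces them into a shared schema when they are written.
raw_lake = [
    '{"source": "wifi", "ts": "2023-04-01T09:00:00", "building": "Library", "clients": 120}',
    '{"source": "camera", "ts": "2023-04-01T09:05:00", "building": "Library", "motion": true}',
    '{"source": "wifi", "ts": "2023-04-01T10:00:00", "building": "Gym", "clients": 45}',
    '{"source": "security", "ts": "2023-04-01T10:30:00", "event": "failed_login", "user": "jdoe"}',
]


def wifi_clients_by_building(lines):
    """Apply structure at read time: parse each record, keep only Wi-Fi
    records, and total the client counts per building."""
    totals = {}
    for line in lines:
        record = json.loads(line)
        if record.get("source") != "wifi":
            continue  # other sources have different fields; skip them here
        building = record["building"]
        totals[building] = totals.get(building, 0) + record["clients"]
    return totals


totals = wifi_clients_by_building(raw_lake)
print(totals)
```

Notice that the person writing the query has to know which records carry a "clients" field and which do not. That knowledge is the access cost described above, and it is why data lakes tend to be used by specialists.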

Software can ease the complexity of querying multiple, disparate data sets. If you need to analyze massive amounts of data from many sources, a data lake might be appropriate for your university.

23%: The expected compound annual growth rate of global data creation and replication between 2020 and 2025

Source: idc.com, “Data Creation and Replication Will Grow at a Faster Rate than Installed Storage Capacity, According to the IDC Global DataSphere and StorageSphere Forecasts,” March 24, 2021

Some Data Warehouses and Data Lakes to Consider

There are many products in the data warehouse and data lake space. Here are a few to consider as you begin evaluating the broader market.

Public cloud provider Microsoft Azure offers several services that combine to form a data warehouse with rich analytics. Azure Synapse Analytics is Microsoft’s key offering in this space: It connects to several data sources, normalizes the data and then runs queries. Still, effort is required to create a functioning solution. Microsoft offers the platform, but you’ll need to work with IT experts to build on it.

IT security provider Palo Alto Networks offers the Cortex Data Lake. Cortex is focused on IT security data. A university might find Cortex useful for aggregating security events across campuses into one data lake. Cortex uses artificial intelligence to analyze the data and uncover important security trends.

Both of these options are platforms upon which you can build your own solution, tailored to your institution. Such platforms require significant investment before they can provide the desired insights.

Other data warehouses or data lakes are use case-specific. Such solutions, which address a particular data challenge, can provide value more quickly.


Should Your Institution Use a Data Warehouse or Data Lake?

Maybe you’re not sure whether to shop for a data warehouse or a data lake. Consider that the two are not mutually exclusive. A data warehouse can use a data lake as a source, and together they can surface insights in your data that you might not have known existed.
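One way to picture a lake feeding a warehouse is the small Python sketch below. It is illustrative only: the raw records, the security_events table and the campus names are hypothetical, and a real pipeline would use the loading tools of whichever platforms an institution selects.

```python
import json
import sqlite3

# Hypothetical raw lake records from mixed sources, kept in original form.
lake_records = [
    '{"source": "security", "campus": "North", "event": "failed_login"}',
    '{"source": "security", "campus": "North", "event": "failed_login"}',
    '{"source": "security", "campus": "South", "event": "malware_alert"}',
    '{"source": "hvac", "campus": "South", "temp_f": 71}',
]

# The warehouse side: a predefined structure for one specific question.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE security_events (campus TEXT NOT NULL, event TEXT NOT NULL)"
)

# Extract only the records the warehouse cares about and map them
# into its fixed columns.
rows = []
for line in lake_records:
    record = json.loads(line)
    if record.get("source") == "security":
        rows.append((record["campus"], record["event"]))
warehouse.executemany("INSERT INTO security_events VALUES (?, ?)", rows)

# Once loaded, reporting is an ordinary structured query.
report = warehouse.execute("""
    SELECT campus, event, COUNT(*)
    FROM security_events
    GROUP BY campus, event
    ORDER BY campus, event
""").fetchall()
print(report)
```

The lake keeps everything, including the building-systems record the warehouse ignores, so a different question later can draw on the same raw data without anyone having to collect it again.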

That’s the key takeaway: You’re building a data management solution, not merely selecting a data storage option. The solution will be built from all the IT components that allow you to uncover insights.
