Mar 26 2025
Data Analytics

Garbage In, Garbage Out: The Importance of Data Management in Higher Education

Universities must tighten the quality of the data entered into AI models to improve the output generated by tools such as chatbots.

Universities have been cautious adopters of artificial intelligence. As they address data-management challenges while shoring up student retention and boosting enrollment, they need clear, well-organized data to avoid “garbage in, garbage out,” in which poor-quality input leads to faulty output.

Nicole Muscanell, a researcher at EDUCAUSE, compares bad data in AI models with spoiled ingredients when cooking.

“If you imagine you're going to cook a dish with spoiled ingredients and food that hasn't been managed properly, it doesn't matter how good a cook you are. That dish is probably not going to be very good,” Muscanell says. “The same goes for data. It doesn't matter how good your statistical tools, analytics tools or AI models are. If you're inputting bad data into them, then your output is just not going to be good.”

Universities must consider the data’s source as well as how representative and recent it is, says Tom Andriola, vice chancellor of IT and data and chief digital officer at the University of California, Irvine. AI models trained on poor data will deliver inaccurate or less-than-optimal information.

“The fundamental premise of this is, if you throw a whole bunch of data away, you’re not going to get quality outputs on the other end, whether that’s a graphic showing the progression of student success and graduation rates for your school or the response to a question about financial aid through a chatbot,” Andriola says.

A September 2023 Veritas study found that 77% of the data organizations collect is redundant, obsolete or trivial (ROT), or unclassified.

Andriola describes how a university might leave old information on its website regarding housing options. When a student uses a chatbot to research housing, information about newer buildings could be missing.

Generative AI models must be trained to filter out old and redundant data, Andriola explains. He has been helping university staff around campus curate housing data to generate better responses from ChatGPT.

“How does your chatbot know the difference between that and the page you put up last year when you opened the new dorm?” he asks.

He suggests that universities evaluate their source data and how it was curated to ensure AI models are trained on the most recent and accurate information.
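As a rough illustration of that kind of curation, the sketch below keeps only the most recently updated page for each topic before the content is handed to a chatbot. The page records, topic labels, URLs and dates are hypothetical, and a real curation process would likely need human review rather than a date comparison alone.

```python
from datetime import date

# Hypothetical page records; topics, URLs and dates are illustrative only.
pages = [
    {"topic": "housing", "url": "/housing/2019-options", "updated": date(2019, 8, 1)},
    {"topic": "housing", "url": "/housing/new-dorm", "updated": date(2024, 9, 1)},
    {"topic": "financial-aid", "url": "/aid/overview", "updated": date(2024, 1, 15)},
]


def latest_per_topic(records):
    """Keep only the most recently updated page for each topic."""
    newest = {}
    for rec in records:
        current = newest.get(rec["topic"])
        if current is None or rec["updated"] > current["updated"]:
            newest[rec["topic"]] = rec
    return list(newest.values())


# The older housing page is dropped; the new-dorm page and the aid page remain.
curated = latest_per_topic(pages)
```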

Proper Data Management Is Important With Large Language Models

Training large language models (LLMs) on inaccurate data can lead to unintended breaches, with incorrect data surfacing in search results, says Jamie DePastino, data governance manager at Carnegie Mellon University.

In areas such as higher education marketing, poor data quality, such as duplicate leads, inaccurate attribution or outdated audience segments, can lead to wasted ad spending and misleading insights, says Andrew Milner, chief growth officer and principal at Cygnus Education.

“LLMs rely on structured, clean, and well-labeled data to generate accurate and actionable recommendations,” Milner says. “If the data is flawed, the AI will amplify those flaws, producing misleading insights and inefficient marketing strategies.”

Maintaining data integrity can help institutions’ marketing and analytics teams generate precise, data-driven recommendations, he says.

“By consolidating marketing mix performance and applying AI to analyze lead quality, institutions can confidently allocate their budgets to the most effective channels, improving enrollment outcomes,” Milner says.

Marcelo Parravicini, CEO of Cygnus Education, explains how universities struggle to identify gaps and must clean up data before inserting AI into workflows. He describes a scenario in which a university has 100 leads for transfer students from other institutions and may not understand where those leads came from.

“Injecting AI builds a predictive mechanism that does not understand the full journey,” Parravicini explains.

Data Management Challenges in Higher Education

Data infrastructure is not standardized across educational institutions, which can struggle to implement proper data governance and infrastructure for data management. Data governance and data management are broad areas that require “all hands on deck” to implement, says Muscanell.

“Higher ed institutions are doing this juggling act where they’re trying to balance the need for data accessibility with things like privacy regulations and security requirements,” she says, citing budget constraints and limited resources as additional challenges that impact proper data infrastructure and data management.

Universities want to avoid education deserts, as Daniel Greenstein, chancellor emeritus of the Pennsylvania State System of Higher Education, told EdTech. Institutions should integrate training into learning management systems and find time for stakeholders to complete training, Muscanell says.

Organizations must encourage faculty, staff and students to complete data management training. Muscanell also suggests that data management systems be integrated into HR platforms.

How to Onboard AI With Data Integrity

Maintaining data integrity includes data governance, the structure by which organizations manage which data is used for which purposes. At UC Irvine, a researcher was studying graduation rates across a diverse student population. The project brought up data governance questions.

“Should we use it for that purpose? Sometimes the answer is no; we don't think it’s appropriate for us to share it with that third party. But if the answer is yes, how would we do it in the most appropriate way?” Andriola explains.

Data governance can ensure that quality data is incorporated into AI models.

Universities can conduct training sessions with staff on how to use analytics software. When Florida International University held training sessions on how to benefit from data, the university saw a 10% increase in its four-year graduation rates. A data governance group then examined whether this was the right way to use student data.

“Most institutions have a data stewardship or data governance council that brings together individuals from across the organization to tackle data governance,” DePastino says. “The business of the institution must be the driving force behind any data governance initiative, and that requires collaboration from leaders across the organization.”

One way to ensure that data integrity is maintained is to keep policy data up to date in training data sets, Andriola advises. “If the updated information doesn't get into the model, then the answer you're giving today is incorrect because you actually have more recent information that the model doesn't know about,” he says.

By boosting data integrity, educational institutions can rebuild trust in higher education. “Data integrity is one of the ways institutions could restore or rebuild that trust. Keep accurate data and make better and more informed decisions surrounding what’s going on at institutions,” Muscanell says.

DePastino recommends strong data-quality standards to keep data in a model from introducing bias or errors.

“Any known data quality issues should be documented and made available to the university community so it is known where there may be a potential issue,” she says. “Further, there needs to be a mechanism to report any issues found to determine mitigation steps.”

Milner advises that higher education institutions ensure data is deduplicated, standardized and free from inconsistencies before feeding it into an AI system. “They should also regularly audit data sources to identify and address biases that could skew campaign targeting and insights,” he says. “Institutions must also use automation to refresh and update data in real time, ensuring that AI-driven insights reflect the latest trends in student engagement.”
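To make the deduplication and standardization step concrete, here is a minimal sketch in Python. The lead fields (`email`, `name`, `source`) and the choice of email as the matching key are assumptions made for illustration, not a description of any institution's or vendor's actual pipeline.

```python
def normalize_lead(lead):
    """Standardize one lead record so equivalent entries compare equal."""
    return {
        "email": lead["email"].strip().lower(),
        # Collapse repeated whitespace and apply consistent capitalization.
        "name": " ".join(lead["name"].split()).title(),
        # Label missing attribution explicitly instead of leaving it blank.
        "source": lead.get("source", "").strip().lower() or "unknown",
    }


def deduplicate_leads(leads):
    """Normalize leads and keep the first record seen for each email."""
    seen = {}
    for lead in map(normalize_lead, leads):
        seen.setdefault(lead["email"], lead)
    return list(seen.values())


# Hypothetical raw leads: the first two are the same person entered twice.
raw_leads = [
    {"email": " Ana@Example.edu ", "name": "ana  perez", "source": "Web Form"},
    {"email": "ana@example.edu", "name": "Ana Perez", "source": "web form"},
    {"email": "li@example.edu", "name": "li wei"},
]

clean_leads = deduplicate_leads(raw_leads)
```

Marking a missing source as "unknown" rather than dropping the record keeps attribution gaps visible, which matters when leads of uncertain origin are later fed into a predictive model.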

gremlin/Getty Images