The Risk of Sharing Sensitive Data with Public AI Platforms
As employees and students embrace generative AI platforms, IT professionals must ensure that sensitive data isn’t being shared publicly. Users need ways to explore large language models (LLMs) without disclosing any of their data.
“First, we do a data governance check. What kind of data are you going to be using? What are the controls around that data? Then we can design a solution that allows you to keep your data in-house and not expose any data,” says Roger Haney, chief architect for software-defined infrastructure at CDW.
Data governance is key for organizations looking to prepare their infrastructure and users for AI and LLMs.
“We have a workshop called Mastering Operational AI Transformation, or MOAT,” Haney says. “You’re drawing a circle around the data that we don’t want to get out. We want it to be internally useful, but we don’t want it to get out.”
To ensure data security, partners such as CDW can help organizations set up or build cloud solutions that don’t rely on public LLMs. This gives them the benefits of generative AI without the risk.
WATCH NOW: Unlock the secrets that protect Mt. Diablo’s data.
“We can set up your cloud in a way where we’re able to use a prompt to make a copy of an LLM,” Haney explains. “We build private enclaves containing a chat resource to an LLM that people can use without a public LLM learning the data they’re putting in.”
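In practice, that pattern often comes down to pointing chat traffic at a model endpoint hosted inside the organization’s own environment rather than at a public service. Below is a minimal Python sketch, assuming an OpenAI-compatible server (such as one run with vLLM or Ollama) inside a private enclave; the endpoint URL and model name are placeholders, not CDW’s actual setup.

    # Minimal sketch: send chat requests to a privately hosted, OpenAI-compatible
    # endpoint so prompts and data never leave the organization's environment.
    # The base_url and model name are assumed placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://llm.internal.district.example/v1",  # private enclave endpoint (hypothetical)
        api_key="unused-for-internal-endpoint",
    )

    response = client.chat.completions.create(
        model="local-llama",  # whatever model the enclave hosts
        messages=[
            {"role": "system", "content": "Answer using district policy language."},
            {"role": "user", "content": "Summarize our device loan policy for families."},
        ],
    )
    print(response.choices[0].message.content)

Because the model runs inside the enclave, nothing a user types is sent to, or learned by, a public LLM provider.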
When to Host AI Databases in the Cloud
Organizations’ plans for generative AI will determine how they should prepare their infrastructure for the future of this technology. Haney says most users want to communicate with their data for retrieval or analysis purposes.
“Chatting with your data doesn’t require a new data store. You don’t have to build a huge data lake or warehouse,” he says. “If you have student data, then we add another model that can create the query in SQL, do the query and pull the data back. Then you can ask it questions, using that data as part of your prompt, and you can talk with your data.”
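Conceptually, that workflow chains two model turns around a database call: one to translate the question into SQL, and one to answer in plain language using the rows that come back. Here is a rough sketch, assuming a local SQLite database with an invented attendance table and an ask_llm() helper that wraps the kind of private endpoint shown above.

    # Rough sketch of "chatting with your data": generate SQL, run it in-house,
    # then answer from the results. The schema and ask_llm() helper are hypothetical.
    import sqlite3

    def ask_llm(prompt: str) -> str:
        """Placeholder for a call to the privately hosted chat endpoint."""
        raise NotImplementedError

    question = "How many students were absent more than five days last month?"

    # 1. Ask the model to write a query against a known schema.
    sql = ask_llm(
        "Schema: attendance(student_id, date, status)\n"
        f"Write one SQLite query to answer: {question}\n"
        "Return only SQL."
    )

    # 2. Run the query against the in-house database (in production, the generated
    #    SQL would be validated and restricted to read-only access).
    rows = sqlite3.connect("district.db").execute(sql).fetchall()

    # 3. Feed the rows back into the prompt so the model can answer in plain language.
    print(ask_llm(f"Question: {question}\nQuery results: {rows}\nAnswer briefly."))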
Partners such as CDW can give K–12 organizations this functionality quickly and inexpensively by building a retrieval-augmented generation (RAG) database for schools. Asked a simple question, the system can return the top two or three answers. Often, these solutions don’t require the cloud.
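Under the hood, a RAG setup of this kind embeds the school’s documents and the incoming question, pulls the closest few passages, and hands them to the model as context. The following is a small illustrative sketch using the sentence-transformers library; the model name and sample documents are placeholders, not a specific CDW deployment.

    # Small RAG sketch: embed documents and a question, keep the top matches,
    # and use them as context for the model's answer.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    docs = [
        "Students may check out one Chromebook per school year.",
        "Lost devices must be reported to the front office within 48 hours.",
        "Wi-Fi hotspots are available for families without home internet.",
    ]

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    doc_vecs = embedder.encode(docs, normalize_embeddings=True)

    question = "What happens if a student loses a Chromebook?"
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]

    # Cosine similarity (the vectors are normalized); keep the top 3 passages.
    scores = doc_vecs @ q_vec
    top = np.argsort(scores)[::-1][:3]
    context = "\n".join(docs[i] for i in top)

    # This context would be prepended to the prompt sent to the private LLM.
    print(context)

A document set of this scale can run comfortably on local hardware, which is why such solutions often stay on-premises.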
“If you’re going to do 20 queries per second, for example, you probably could do that on-premises,” Haney says. “If you’re going to do 200 queries or, if you’re a company the size of CDW and you’re building an HR bot, 500 queries per second, you want to do that with resources that are scalable. That’s where the cloud comes in.”
Because K–12 schools are largely going to be augmenting instruction and simplifying processes with AI, they could likely host any databases on-premises. Other organizations, however, may consider cloud-based resources.
DIVE DEEPER: When is the cloud right for organizations deploying artificial intelligence?
“With a fine-tuned model, you need heavy GPU resources because now you’re embedding that information into the model itself,” Haney explains. “We do most of that work in the cloud, where we’re able to rent a GPU or a TPU, and it’s a lot less expensive than buying a huge DGX or other piece of equipment to do that work.”
So, when it comes to determining how you’ll prepare your cloud infrastructure for AI, think first about how you want to use AI, how you want to use your data and what that will require in your organization. Working with an experienced partner can help you answer these questions and more to prepare your district’s digital infrastructure for whatever comes next.