How NoSQL Will Make Big Data Work for You
A $3.4 billion market by 2018 — that’s the direction NoSQL technology is headed.
NoSQL databases — which span the categories of key-value stores, column-based stores and document and graph databases — still haven’t surpassed relational database management system powerhouses in popularity, as the DB-Engines Ranking shows. But even big-name players in the traditional relational database management space — think Oracle and Microsoft — are stepping into the NoSQL arena as the technology continues to gain fans in a Big Data world that values flexible data models and scalable architectures.
Websites such as Netflix and eBay and long-established businesses such as MetLife and Honda count themselves among the ranks of NoSQL users — and the technology is poised to make an impact in higher education.
“When you talk about NoSQL, many people think it’s just for very large websites with a need for high-performance scalability,” says Eric Palmer, director of web services at the University of Richmond. But there are other reasons for colleges and universities to look more closely at NoSQL — its “ease of use and fidelity of searching,” for starters, he says.
Palmer uses an open-source NoSQL database (of the document variety) for managing structured data types as XML documents within the university’s public web content. Before adopting the technology about four years ago, Palmer relied on a relational database that required his staff to spend a lot of time creating schemas for tables for various document types.
Now, instead of having to create a new schema every time there’s a new document type, his teams simply create new data definitions, “pop the data in, publish it, write a very small program that outputs it to XML, and put that into the NoSQL database and we’re off,” Palmer says. “We can add new data types in as little as one day.”
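The "very small program that outputs it to XML" that Palmer describes might look something like the sketch below. It is illustrative only, not the university's actual code: the `record_to_xml` helper, the `event` document type and its field names are all assumptions, and the final step of loading the result into the database is left out because the article does not name the product or its API.

```python
import xml.etree.ElementTree as ET


def record_to_xml(doc_type, fields):
    """Serialize a flat record into a small XML document string.

    doc_type becomes the root element name; each field becomes a child
    element. Both are hypothetical names chosen for this illustration.
    """
    root = ET.Element(doc_type)
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# A hypothetical new content type: no table schema is defined first,
# the record is simply written out as a self-describing XML document.
event_xml = record_to_xml(
    "event",
    {"title": "Spring Concert", "location": "Main Hall", "year": 2014},
)
print(event_xml)
```

Because each document carries its own structure, adding another content type in this sketch means passing a different `doc_type` and set of fields rather than designing new tables.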
Smarter Searches
Palmer brought the system to the attention of his colleague, Chris Kemp, head of discovery, technology and publishing at the University of Richmond’s library. Kemp is using the same tools to drive smarter, more functional search of the university’s collection of digitized historical documents, such as the proceedings of the 1861 Virginia Secession Convention and related materials.
“While search and retrieval of historical documents is definitely not new, the way we’re doing it — using the same NoSQL database technology that drives the public web at the university — is,” says Kemp.
A relational database lacked the flexibility to support the library’s ambition of searching across multiple nodes of XML data. The open-source tools enable capabilities such as a clickable map, on which users tap a Virginia county to see more information about it. Everything from the county’s geocoordinates to the name of the representative for an area during a specific time period to any speeches that representative may have made at the Convention appears on the map, driven by behind-the-scenes searches that reference multiple files.
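To make the idea of one map click drawing on several files concrete, here is a minimal sketch in Python. It is not the library's implementation: an XML-aware database would normally run this kind of lookup in its own query language, and the three in-memory documents, their element names and every value in them are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Three hypothetical "files", held here as strings. All names and
# values are made up for this sketch.
COUNTIES = """<counties>
  <county id="c1"><name>Example County</name><lat>37.5</lat><lon>-77.5</lon></county>
</counties>"""

DELEGATES = """<delegates>
  <delegate id="d1" county="c1"><name>Delegate A</name></delegate>
</delegates>"""

SPEECHES = """<speeches>
  <speech delegate="d1"><title>Speech One</title></speech>
  <speech delegate="d1"><title>Speech Two</title></speech>
  <speech delegate="d2"><title>Unrelated Speech</title></speech>
</speeches>"""


def county_summary(county_id):
    """Gather what a map click might show by searching all three documents."""
    counties = ET.fromstring(COUNTIES)
    delegates = ET.fromstring(DELEGATES)
    speeches = ET.fromstring(SPEECHES)

    county = counties.find(f"county[@id='{county_id}']")
    if county is None:
        return None

    summary = {
        "name": county.findtext("name"),
        "lat": float(county.findtext("lat")),
        "lon": float(county.findtext("lon")),
        "delegates": [],
    }
    # Follow the county id into the delegates file, then each delegate
    # id into the speeches file.
    for delegate in delegates.findall(f"delegate[@county='{county_id}']"):
        delegate_id = delegate.get("id")
        titles = [
            speech.findtext("title")
            for speech in speeches.findall(f"speech[@delegate='{delegate_id}']")
        ]
        summary["delegates"].append(
            {"name": delegate.findtext("name"), "speeches": titles}
        )
    return summary


result = county_summary("c1")
print(result)
```

The point of the sketch is the shape of the lookup: one identifier from the map leads to nodes in several separate documents, and the results are merged into a single answer for the user.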
With this first project completed, the library has the backbone in place to expand the effort and load other documents encoded to Text Encoding Initiative (TEI) guidelines into the Extensible Markup Language (XML)-aware NoSQL database.
“We can put things in in almost a content-agnostic way — letters, journals, books, trial documents, newspaper content,” Kemp says. “We can determine on a collection-by-collection basis the nodes for which we're allowing search and manipulation, and index the data accordingly.”
Dive Deep with Analytics
The University of Richmond has been more aggressive than many institutions in leveraging NoSQL, says Dan McCreary, co-author of Making Sense of NoSQL and partner at Kelly-McCreary & Associates, a technology strategy development consulting firm. When colleges and universities do use NoSQL, typically they’re doing so “just as a cheap way to store JSON [JavaScript Object Notation] documents,” he says. Starting with a small project is fine, McCreary says, but “the tragic thing is that often they don’t even know what NoSQL is good at.”
Colleges can harvest and store web and social network data in low-cost, reliable and easy-to-query NoSQL database structures to help with sentiment analysis about the institution, its events or administration. NoSQL databases, McCreary adds, could also play a role in helping universities do deep analytics to identify their best fundraising targets, as just one example.
“Predictive analytics [in fundraising] in the past was simple, based on whether someone gave money,” he says. But a lot of other public and pseudo-public information about every institution’s graduates who might contribute funds can be added into that mix to make fundraising more precise and cost-effective, he says.
NoSQL graph databases may be particularly helpful in querying data collected from multiple sources to look for patterns in how pieces of information in those data sets relate to each other.
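As a rough illustration of the relationship queries a graph database is built for, the sketch below links records from two made-up sources and finds the shortest chain connecting two people. It uses a plain in-memory Python structure rather than any real graph database, and the node names, sources and `shortest_path` helper are assumptions for the example.

```python
from collections import deque

# Relationships merged from two hypothetical sources. Every identifier
# here is invented for illustration.
EDGES = [
    ("alum:1", "club:robotics"),  # from a made-up alumni records source
    ("alum:2", "club:robotics"),
    ("alum:2", "event:gala"),     # from a made-up events source
    ("alum:3", "event:gala"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes linking start to goal."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(EDGES)
path = shortest_path(graph, "alum:1", "alum:3")
print(" -> ".join(path))
```

Here the two alumni never appear in the same source record, yet the traversal shows they are connected through a shared club and a shared event, which is the kind of cross-source pattern the article has in mind.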
The Road Ahead
There will be some challenges in getting NoSQL technology into the campus spotlight, however.
“The challenge with many IT staffs in higher ed is that, without case studies to review or people to talk to, it’s hard to invest in the effort. People are not yet sure that it will prove adequate or better than adequate,” Palmer says.
“There’s a chicken-and-egg problem in that many high-level decision-makers on campus — when they’re thinking about funding a $100,000 or $200,000 project — they realize they don’t have the staff that knows these new technologies,” McCreary adds.
Ravi Agarwal, director of enterprise applications at Wisconsin’s St. Norbert College, agrees that there are very few institutions today with the resources and talent to effectively make use of NoSQL technology. But he expects all of that will change in the not-too-distant future.
“NoSQL really is going to be the force behind the success of Big Data and real-time web apps, thanks to the default characteristics of ease and speed of data retrieval,” Agarwal says. “In higher ed, we are slowly catching up with the idea of Big Data. If we don’t invest in NoSQL soon, we might not really be able to take full advantage of the Big Data movement.”