New ‘AI-Enhanced’ Big Data Discovery Engine Aids Researchers

Semantic Scholar brings a few new tricks to the growing field of computer-aided research tools.

Smarter technologies are converging to make life easier for research scientists, and they could accelerate the pace of groundbreaking scientific studies.

The latest computer-assisted scholarly research tool is Semantic Scholar, a free offering from the Allen Institute for Artificial Intelligence (AI2). When it launched in October, it could comb through more than 3 million computer science research papers, and AI2 plans to add new subjects.

AI2 calls Semantic Scholar an "AI-enhanced" way to discover information, because it employs machine learning techniques that vastly outperform traditional search methods. Such technology is poised to help scientists surmount their greatest challenges, says AI2's executive director, Oren Etzioni, in a news release.

“What if a cure for an intractable cancer is hidden within the tedious reports on thousands of clinical studies? In 20 years’ time, AI will be able to read — and more importantly, understand — scientific text," Etzioni says. "These AI readers will be able to connect the dots between disparate studies to identify novel hypotheses and to suggest experiments which would otherwise be missed."

AI2 says that in addition to having powerful AI-enhanced searching methods, Semantic Scholar will be able to produce more reliable results than competing scholarly search engines, including the largest — Google Scholar — because Semantic Scholar operates from a more curated list of sources.

Péter Jacsó, an information scientist at the University of Hawaii, says Google Scholar produces documents that “are not scholarly by anyone's measure.” Similarly, a 2014 blog post on Scholarly Open Access criticizes Google’s solution for not being picky enough with the journals it peruses.

Keeping up with the expanding array of research papers has been a problem for medical and computer science researchers. The amount of documented medical knowledge doubles every few years, a challenging pace to match. The American Clinical and Climatological Association projects that by 2020 it will double every 73 days. Such a massive volume of documentation creates untenable overhead for researchers trying to pare the data down to usable amounts.
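To put the projected pace in perspective, a doubling time can be converted into a yearly growth factor. The short sketch below is illustrative only: the function name is hypothetical, and the three-year figure is an assumed stand-in for "every few years," not a number from the projection itself.

```python
def annual_growth_factor(doubling_period_days: float, days_per_year: float = 365.0) -> float:
    """Return how many times a quantity multiplies in one year, given its doubling time."""
    if doubling_period_days <= 0:
        raise ValueError("doubling_period_days must be positive")
    doublings_per_year = days_per_year / doubling_period_days
    return 2 ** doublings_per_year

# Projected pace from the article: doubling every 73 days (5 doublings a year).
print(annual_growth_factor(73))        # 32.0

# Assumption: "every few years" taken as 3 years, purely for comparison.
print(round(annual_growth_factor(3 * 365), 2))  # 1.26
```

Under these assumptions, a 73-day doubling time means the body of knowledge would grow about 32-fold in a single year, compared with roughly 26 percent a year at a three-year doubling time.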

The results from AI-enhanced research methods can already be seen. IBM has been tackling similar areas of research with its own brand of machine learning: the cognitive computing system Watson. In 2014, the company released the Watson Discovery Advisor, which tailors Watson's abilities to aid researchers struggling to parse mountains of data.

The Baylor College of Medicine published a peer-reviewed cancer study aided by Watson's analytical abilities. The program evaluated nearly 70,000 scientific articles on the protein p53, ultimately identifying six related proteins — a discovery that has spurred new research, according to IBM. The identification process took a few weeks. Researchers said that without Watson, it would have taken years.

“Even if a scientist reads five papers a day, it could take nearly 38 years to completely understand all of the research already available today on this protein,” says Olivier Lichtarge, director of the Center of Computational and Integrative Biomedical Research at Baylor, in the Baylor College of Medicine News. “A computer certainly may not reason as well as a scientist, but the little it can, logically and objectively, may contribute greatly when applied to our entire body of knowledge.”
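Lichtarge's estimate can be checked with quick arithmetic. The sketch below assumes that the roughly 70,000 p53 articles Watson evaluated stand in for "all of the research" on the protein, and that the scientist reads every day of the year; the function name is hypothetical.

```python
def years_to_read(num_papers: int, papers_per_day: float, days_per_year: float = 365.0) -> float:
    """Return the years needed to read num_papers at a steady daily pace."""
    if papers_per_day <= 0:
        raise ValueError("papers_per_day must be positive")
    return num_papers / papers_per_day / days_per_year

# Assumption: ~70,000 articles (the number Watson evaluated), five papers a day.
estimate = years_to_read(70_000, 5)
print(f"{estimate:.1f} years")  # 38.4 years
```

Under those assumptions the result is about 38 years, in line with the quoted figure, and it would grow further if the scientist took weekends or holidays off.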

Dec 15 2015