Scientific literature reviews are a critical part of advancing fields of study: They provide a current state of the field through comprehensive analysis of existing research, and they identify gaps in knowledge that future studies might focus on. However, writing a well-done review article is no small feat.
Researchers often comb through stacks of scientific papers. They must select studies that aren’t outdated while avoiding recency bias. Then comes the intensive work of evaluating the quality of each study, extracting relevant data from the works that make the cut, analyzing the data for insights, and writing a compelling story that sums up the past while looking to the future. Research synthesis is a field of study in itself, and even excellent researchers aren’t necessarily excellent at writing literature reviews.
Enter artificial intelligence. As in many industries, a number of startups have emerged that use artificial intelligence to speed up, simplify, and revolutionize the process of evaluating scientific literature. Many of these startups are positioning themselves as AI search engines focused on scientific research—each with distinct product features and target audiences.
Elicit encourages searchers to “analyze research articles at superhuman speed” and highlights its use by expert researchers at institutions such as Google, NASA, and the World Bank. Scite says it has built the largest citation database by continuously monitoring 200 million scientific sources, and it offers “smart citations” that categorize findings into supporting or contrasting evidence. Consensus features a homepage demo that seems aimed at helping laypeople better understand a given question, describing the product as “Google Scholar meets ChatGPT” and offering a consensus meter that summarizes the main findings. These are just a few of many.
But can artificial intelligence replace a high-quality systematic review of the scientific literature?
Research synthesis experts tend to agree that these AI models are currently good to excellent at performing qualitative analyses, that is, at creating narrative summaries of the scientific literature. Where they fall short is the more complex quantitative layer that makes a review truly systematic. This quantitative synthesis usually involves statistical methods such as meta-analysis, which pools numerical data across many studies to draw more robust conclusions.
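To make that quantitative layer concrete, here is a minimal sketch, in Python with made-up numbers, of the core arithmetic of a fixed-effect meta-analysis: each study’s effect estimate is weighted by the inverse of its variance, so more precise studies count for more in the pooled result. The effect sizes and standard errors below are illustrative assumptions, not data from any real studies.

```python
# Minimal fixed-effect meta-analysis sketch: inverse-variance weighting.
# The (effect_size, standard_error) pairs below are hypothetical.
import math

studies = [(0.30, 0.10), (0.45, 0.15), (0.12, 0.08)]

weights = [1 / se**2 for _, se in studies]           # inverse-variance weights
pooled = sum(w * es for (es, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))              # standard error of pooled estimate

# 95 percent confidence interval for the pooled effect
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The arithmetic itself is simple; as the next section explains, the hard part is deciding which studies, and which numbers from each study, belong in the calculation at all.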
“Artificial intelligence models can be almost 100 percent as good as humans at summarizing key points and writing a flowing argument,” says Joshua Polanin, co-founder of the Methods of Synthesis and Integration Center (MOSAIC) at the American Institutes for Research. “But we’re not even 20 percent of the way there on quantitative synthesis,” he says. “True meta-analysis follows a rigorous process of how you search for studies and quantify the results. These numbers are the basis for evidence-based conclusions. Artificial intelligence is not close to doing that.”
Difficulty with quantification
The quantification process can be challenging even for trained professionals, Polanin explains. Both humans and AI can generally read a study and summarize its findings: Study A found an effect, while Study B found no effect. The tricky part is putting a numerical value on the degree of that effect. What’s more, there are often different ways of measuring effects, and researchers must identify studies and measurement designs that are consistent with the premise of their research question.
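For a sense of what putting a number on an effect involves, here is a toy calculation of Cohen’s d, one common standardized effect-size measure, from the kind of summary statistics a reviewer might extract from a single study. All input figures are hypothetical.

```python
# Toy effect-size calculation: Cohen's d from summary statistics.
# All input numbers are hypothetical, for illustration only.
import math

def cohens_d(mean_treat, mean_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt(
        ((n_treat - 1) * sd_treat**2 + (n_ctrl - 1) * sd_ctrl**2)
        / (n_treat + n_ctrl - 2)
    )
    return (mean_treat - mean_ctrl) / pooled_sd

# Hypothetical study: treatment group scores 5 points higher on average.
print(round(cohens_d(75.0, 70.0, 10.0, 12.0, 50, 50), 2))  # ~0.45, a moderate effect
```

Because different studies may report different measures (mean differences, correlations, odds ratios), choosing and converting to one consistent effect metric is part of the judgment work Polanin describes.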
Polanin says that models must first identify and extract the relevant data, and then perform the nuanced work of comparing and analyzing it. “While we try to make decisions ahead of time as human experts, you may have to change your mind on the fly,” he says. “That’s not something a computer will be good at.”
Given the hubris that exists around AI and within startup culture, one would expect the companies building these AI models to protest Polanin’s assessment. But you won’t get an argument from Consensus co-founder Eric Olson: “I couldn’t agree more, honestly,” he says.
Echoing Polanin, Olson says Consensus is intentionally “at a higher level than some of the other tools, giving people the background knowledge for quick insight.” He sees the prototypical user as a graduate student: someone with an intermediate knowledge base who is working to become an expert. Consensus can be one of many tools for a true subject-matter expert, or it can help non-scientists stay informed, such as a Consensus user in Europe who follows research on his child’s rare genetic disorder. “As a non-researcher, he had spent hundreds of hours on Google Scholar. He told us he had dreamed of something like this for 10 years, and it changed his life; now he uses it every day,” says Olson.
At Elicit, the team is targeting a different type of ideal customer: “someone working in industry in an R&D context, perhaps within a biomedical company, trying to decide whether to move forward with the development of a new medical intervention,” says James Brady, head of engineering.
With this high-stakes user in mind, Elicit clearly shows users claims of causality and the evidence that supports them. The tool breaks the complex task of a literature review into manageable chunks that a human can understand, and it also provides more transparency than your average chatbot: Researchers can see how the AI model arrived at an answer and compare it to the source.
The future of scientific review tools
Brady agrees that current AI models don’t provide full Cochrane-style systematic reviews, but he says that’s not a fundamental technical limitation. Rather, it’s a matter of future advances in AI and better prompt engineering. “I don’t think there’s anything our brain can do that a computer basically can’t do,” says Brady. “And that goes for the systematic review process as well.”
Roman Lukyanenko, a University of Virginia professor who specializes in research methods, agrees that a main goal for the future should be developing ways to support the initial, rapid process of gathering research while producing better answers. He also notes that current models tend to favor journal articles that are freely accessible, even though plenty of high-quality research sits behind paywalls. Still, he is optimistic about the future.
“I believe AI is huge—revolutionary on so many levels—for this space,” says Lukyanenko, who, with Gerit Wagner and Guy Paré, co-authored a pre-ChatGPT 2022 study on AI and literature reviews that went viral. “We have an avalanche of information, but our human biology limits what we can do with it. These tools represent great potential.”
Advances in science often stem from an interdisciplinary approach, he says, and this is where AI’s potential may be greatest. “We have the term ‘Renaissance man,’ and I like to think of ‘Renaissance AI’: something that has access to a large part of our knowledge and can make connections,” Lukyanenko says. “We should be pushing hard for it to make serendipitous, unexpected discoveries across far-flung fields.”