Scholarly Communication
Can generative AI facilitate the research process?
It’s complicated
© 2023 Danny Kingsley
Generative artificial intelligence (AI) describes algorithms (such as ChatGPT) that can be used to create new content, including audio, code, images, text, simulations, and videos. Large Language Models are specialized AI models trained on enormous volumes of text data and created to comprehend and produce text-based content.
I am hardly the first to ask the question of whether these tools can facilitate the research process. A proposed Scholarly AI taxonomy “outlines seven key roles that AI could potentially play in a scholarly publishing workflow.”1 UNESCO has suggested possible uses of ChatGPT in the research process including for research design, data collection, data analysis, and writing up.2 Indeed, an industry has already sprung up with enterprising researchers selling their knowledge in this area with tutorials such as “Become an efficient academic writer with AI apps.”
So, for what it is worth, here’s my take on where generative AI can assist (or not) the research process. The only prediction I am making is that this will be out of date by publication.
Literature searching
One of the major issues for literature searching using ChatGPT is the “hallucination” problem, where the tool makes up answers that seem plausible rather than pulling directly from factual sources. ChatGPT can provide responses that reflect the format of references and use language that relates to the query, so while the references look real, they are entirely fabricated. There are examples of library staff being approached by students looking for specific references that turn out to have been a fictitious creation of ChatGPT.
There are some plugins being developed that search actual literature such as the ScholarAI plugin for ChatGPT. This only searches open access articles published by SpringerNature, which clearly limits the search. The Iris.ai Researcher Workspace program “searches open access articles from around the world.” One of its main sources is the free service CORE-GPT from the CORE repository, “the world’s largest collection of open access papers.”
It is worth noting that these search systems are all accessing open material, not material behind a paywall. In March, I publicly asked whether ChatGPT was accessing information in research papers that are behind a paywall. Among the responses there was agreement that it could access abstracts, but disagreement on whether large publishers provide OpenAI and Google with access to subscription material. There was general frustration about the lack of transparency on this issue. The full summary of the responses is online.3
Regardless, if Large Language Models are accessing paywalled articles, this raises questions about copyright. Recently, publishers of big journals have increased calls for transparency about the sources of learning for these models. This addresses one of many challenges: ChatGPT is fundamentally opaque. It is essentially impossible to track down what copyrighted material is being drawn from in the prose it produces, suggesting every result may comprise multiple violations.4 Cases over the use of work without permission to train the system are already in the courts. Two authors, Paul Tremblay and Mona Awad, have filed a lawsuit in a San Francisco federal court against OpenAI, alleging that ChatGPT generates accurate summaries of their works and therefore that their copyrighted books were used to train it without their consent.
Potential benefits of generative AI
One area where generative AI might prove helpful is in coding survey responses. One study found ChatGPT was able to code responses with a 92 percent accuracy rate compared with a trained human coder. One of the authors noted: “It makes those parts of research which don’t need creativity or judgement so much easier.”5 A hackathon to explore where ChatGPT might be able to help the research process found “the primary use case seems to be helping people accomplish tasks they *already know how to do*, but to do them more effectively and faster.” Another study demonstrated that ChatGPT “outperforms crowd-workers for several annotation tasks, including relevance, stance, topics, and frames detection.”6
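To make the idea of an accuracy rate "compared with a trained human coder" concrete, here is a minimal Python sketch. It assumes accuracy means the simple proportion of responses where the model's code matches the human's. The text does not say how the study calculated its figure, so this is an illustration and not the study's method. The function names and the sample labels are invented for the example.

```python
def percent_agreement(human_codes, model_codes):
    """Share of responses where the model's code equals the human coder's code.

    Assumption: accuracy is plain label agreement; the study's own metric
    is not described in the article.
    """
    if len(human_codes) != len(model_codes):
        raise ValueError("both coders must label the same number of responses")
    if not human_codes:
        raise ValueError("at least one coded response is required")
    matches = sum(h == m for h, m in zip(human_codes, model_codes))
    return matches / len(human_codes)


def disagreements(human_codes, model_codes):
    """List (index, human_code, model_code) for every response coded differently."""
    return [
        (i, h, m)
        for i, (h, m) in enumerate(zip(human_codes, model_codes))
        if h != m
    ]


if __name__ == "__main__":
    # Invented labels, purely for illustration.
    human = ["positive", "negative", "neutral", "positive"]
    model = ["positive", "negative", "positive", "positive"]
    print(f"Agreement: {percent_agreement(human, model):.0%}")
    print("To review by hand:", disagreements(human, model))
```

Listing the disagreements alongside the rate keeps a human in the loop for the responses where judgement may be needed.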
There appears to be at least one useful way that ChatGPT could help with literature searches. Rather than being asked the research question directly, ChatGPT can assist by formulating and refining a good Boolean query for a systematic review literature search.7 Research testing this capability found that guided prompts are more effective than single-prompt strategies. An example single prompt is: “For a systematic review titled ‘{review_title},’ can you generate a systematic review Boolean query to find all included studies on PubMed for the review topic?” However, a caveat in relation to replicability is that ChatGPT generates different queries even when the same prompt is used, and these vary in effectiveness.
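The single prompt quoted above is a template with one placeholder. The Python sketch below shows how it might be filled in, and how repeated responses could be compared to see the replicability caveat in practice. It does not call any chat service. The function names are hypothetical, and the normalisation in `distinct_queries` is a deliberately crude assumption (lowercasing and collapsing whitespace only).

```python
# Template adapted from the single prompt quoted in the text;
# {review_title} is its only placeholder.
SINGLE_PROMPT_TEMPLATE = (
    "For a systematic review titled '{review_title}', can you generate a "
    "systematic review Boolean query to find all included studies on PubMed "
    "for the review topic?"
)


def build_boolean_query_prompt(review_title: str) -> str:
    """Fill the review title into the single-prompt template."""
    title = review_title.strip()
    if not title:
        raise ValueError("review_title must not be empty")
    return SINGLE_PROMPT_TEMPLATE.format(review_title=title)


def distinct_queries(responses) -> int:
    """Count how many different Boolean queries a set of responses contains.

    Queries are compared after lowercasing and collapsing whitespace, so
    purely cosmetic differences are ignored (an assumption of this sketch).
    """
    normalised = {" ".join(r.lower().split()) for r in responses}
    return len(normalised)


if __name__ == "__main__":
    # Hypothetical review title, used only to show the filled-in prompt.
    print(build_boolean_query_prompt("Example review title"))
```

Recording the exact prompt and every query returned, and then reporting which query was actually run, is one way to keep the search documented even though the tool itself is not deterministic.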
Given that the American Psychological Association (APA) style now has instructions for how to cite ChatGPT, it seems not only that there is a growing acceptance of the use of ChatGPT in the writing process, but that it is here to stay. APA has also provided advice on how to use ChatGPT as a learning tool.8 Advice from the Thesis Whisperer on how to use ChatGPT to write better is to “imagine it as a talented, but easily misled, intern/research assistant who has a sad tendency to be sexist, racist and other kinds of ‘isms.’”
There are some areas where generative AI can really come into its own. ChatGPT knows various citation styles such as APA, MLA, Chicago, and Harvard, which means it can take a raw list of references and regenerate it with a specific format, although it is a good idea to ask it not to generate details if it doesn’t know them (the “hallucination” problem mentioned earlier). This could be extremely useful for reducing the estimated 14 hours per paper it takes to manage the formatting.9
Given that the vast majority of research publications are written in English, which is not the first language of most researchers in the world, there could be great benefit from using generative AI to help authors write clearer and more concise English. There have been arguments against the hard position some journals and publishers are taking in excluding the use of ChatGPT, because this misses the opportunity “to level the playing field for EAL [English as an Additional Language] authors.”10
Given the challenges that journals and editors are experiencing in finding peer reviewers for scholarly articles, there could also be a place for generative AI to assist with peer review. There have been some experiments using generative AI tools to help draft reviews, but when an author shared their experience, the JAMA editor-in-chief interrupted to note that using AI for peer review was a violation of their policy. One of the issues here is “there are currently no guidelines on how these systems should be used in review tasks.”11
Generative AI and open access
The use generative AI is making of open access research could become a very strong argument for universal open access. Peter Suber has noted that summaries created by generative AI programs could help open access to research findings because, even if the paper itself is behind a paywall, the summaries are themselves not copyrighted.12
Conclusion
There are clearly some areas where ChatGPT can help research. Coding survey responses, improving the writing of people who have English as an additional language, and formatting bibliographies stand out. But there is a need to exercise caution in relation to some of the material it generates, including the tendency for ChatGPT to “hallucinate.” Consideration of copyright appears to be a developing area. The opaque nature of the material it is using to generate results has major implications for replicability and, as a result, for both research integrity and the open movement.
Notes
1. Adam Hyde, John Chodacki, and Paul Shannon, “An Initial Scholarly AI Taxonomy,” Upstream, April 11, 2023, https://doi.org/10.54900/6p6re-xyj61.
2. UNESCO, “ChatGPT and Artificial Intelligence in Higher Education,” 2023, https://www.iesalc.unesco.org/wp-content/uploads/2023/04/ChatGPT-and-Artificial-Intelligence-in-higher-education-Quick-Start-guide_EN_FINAL.pdf.
3. Danny Kingsley, “What Academic Research Is ChatGPT Accessing?,” LinkedIn, March 21, 2023, https://www.linkedin.com/pulse/what-academic-research-chatgpt-accessing-danny-kingsley/.
4. Jenna Burrell, “ChatGPT and Copyright: The Ultimate Appropriation,” Tech Policy Press (blog), April 11, 2023, https://techpolicy.press/chatgpt-and-copyright-the-ultimate-appropriation/.
5. Jack Grove, “The ChatGPT Revolution of Academic Research Has Begun,” Times Higher Education, March 16, 2023, https://www.timeshighereducation.com/depth/chatgpt-revolution-academic-research-has-begun.
6. Fabrizio Gilardi, Meysam Alizadeh, and Maël Kubli, “ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks” (arXiv, March 27, 2023), https://doi.org/10.48550/arXiv.2303.15056.
7. Shuai Wang et al., “Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search?” (arXiv, February 9, 2023), https://doi.org/10.48550/arXiv.2302.03495.
8. American Psychological Association, “How to Use ChatGPT as a Learning Tool,” 2023, https://www.apa.org/monitor/2023/06/chatgpt-learning-tool.
9. Allana LeBlanc et al., “Scientific Sinkhole: The Pernicious Price of Formatting,” PLOS ONE, September 26, 2019, https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0223116.
10. Avi Staiman, “Guest Post—Academic Publishers Are Missing the Point on ChatGPT,” The Scholarly Kitchen, March 31, 2023, https://scholarlykitchen.sspnet.org/2023/03/31/guest-post-academic-publishers-are-missing-the-point-on-chatgpt/.
11. Mohammad Hosseini and Serge P. J. M. Horbach, “Fighting Reviewer Fatigue or Amplifying Bias? Considerations and Recommendations for Use of ChatGPT and Other Large Language Models in Scholarly Peer Review,” Research Integrity and Peer Review 8, no. 1 (May 18, 2023): 4, https://doi.org/10.1186/s41073-023-00133-5.
12. Peter Suber, “Here’s a Thought to Advance #OpenAccess to Research,” Mastodon post, June 20, 2023, https://fediscience.org/@petersuber/110572463707435047.