Artificial intelligence chatbots cite retracted scientific papers

Studies and tests show that popular artificial intelligence (AI) chatbots and research tools often cite retracted papers without warning, risking the spread of flawed findings. Companies are adding retraction data, but gaps and inconsistent publisher notices complicate fixes.

Some AI chatbots are drawing on retracted scientific papers to answer questions, according to recent studies and tests confirmed by MIT Technology Review. While fabricated links and references are a known issue, even accurate citations can mislead when the underlying papers have been retracted and the answers do not disclose that status. Researchers warn that this poses risks as the public turns to chatbots for medical advice and as students and scientists adopt science-focused AI tools. In August, the US National Science Foundation invested in building AI models for science research, suggesting such usage will grow.

In one study, Weikuan Gu and colleagues queried OpenAI's ChatGPT, running GPT-4o, with prompts based on 21 retracted medical imaging papers. The chatbot referenced retracted papers in five of its answers and advised caution in only three. Another study, in August, used ChatGPT-4o mini to evaluate 217 retracted and low-quality papers across fields; none of its responses mentioned retractions or other concerns. No similar studies of GPT-5 have been released. Yuanxi Fu argues that retraction status is an essential quality indicator for tools serving the general public, and OpenAI did not respond to requests for comment on the results.

The problem extends beyond ChatGPT. In June, MIT Technology Review tested research-oriented tools, including Elicit, Ai2 ScholarQA, Perplexity, and Consensus, with questions based on the same 21 retracted papers. Elicit cited five retracted papers, Ai2 ScholarQA 17, Perplexity 11, and Consensus 18, none with explicit retraction warnings. Some providers have since responded: Consensus says it has integrated retraction data from publishers, aggregators, web crawling, and Retraction Watch, and in an August retest it cited only five retracted papers. Elicit removes retracted items flagged by OpenAlex and is expanding its sources. Ai2 says its tool does not automatically detect or remove retractions, while Perplexity notes that it does not claim to be 100 percent accurate.

Experts caution that retraction databases remain incomplete and labor-intensive to maintain. Ivan Oransky of Retraction Watch says a truly comprehensive database would require significant resources and manual curation. Publisher practices also vary widely: labels such as correction, expression of concern, erratum, and retracted are applied for different reasons, which complicates automated detection. Papers can persist across preprint servers and repositories, and models may rely on outdated training data. Most academic search engines do not perform real-time checks against retraction data, leaving accuracy at the mercy of their corpora.
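The real-time check described above can be sketched as a simple filter over search results before they are cited. This is a minimal illustration, assuming records shaped like OpenAlex work objects, which expose a boolean `is_retracted` field; the helper function and sample records here are hypothetical, not any tool's actual implementation.

```python
# Sketch: filter candidate citations against retraction flags before surfacing them.
# Assumes OpenAlex-style work records with a boolean `is_retracted` field;
# the sample data below is invented for illustration.

def partition_by_retraction(works):
    """Split work records into (citable, needs_warning)."""
    citable, needs_warning = [], []
    for work in works:
        # Treat a missing flag as unknown rather than safe: only works
        # explicitly marked is_retracted == False pass through unflagged.
        if work.get("is_retracted") is False:
            citable.append(work)
        else:
            needs_warning.append(work)
    return citable, needs_warning

# Hypothetical sample records.
sample = [
    {"id": "W1", "title": "Imaging study A", "is_retracted": False},
    {"id": "W2", "title": "Imaging study B", "is_retracted": True},
    {"id": "W3", "title": "Preprint C"},  # retraction status unknown
]

ok, flagged = partition_by_retraction(sample)
print([w["id"] for w in ok])       # -> ['W1']
print([w["id"] for w in flagged])  # -> ['W2', 'W3']
```

The conservative default (unknown status is flagged, not trusted) reflects the experts' point: because retraction metadata is incomplete, absence of a flag is not evidence a paper is sound.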

Suggested remedies include giving models and users more context, such as linking journal-commissioned peer reviews and PubPeer critiques alongside papers. Many publishers, including Nature and the BMJ, post retraction notices outside paywalls, and companies are urged to make better use of such signals as well as news coverage of retractions. Until systems improve, experts say both creators and users of AI tools must exercise skepticism and due diligence.
