ArXiv, a widely used open repository for preprint research, is stepping up its crackdown on careless use of large language models in scientific papers. Although submissions appear before peer review, the platform is a major channel for circulating research in fields such as computer science and math, and it also serves as a source of data on scientific research trends.
The repository has already introduced measures aimed at low-quality, AI-generated papers, including requiring first-time posters to obtain an endorsement from an established author. After being hosted by Cornell for more than 20 years, the organization is becoming an independent nonprofit, a change expected to help it raise more funding to address problems tied to AI slop.
Thomas Dietterich, chair of arXiv’s computer science section, said that “if a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can’t trust anything in the paper.” Examples of that evidence include hallucinated references and comments to or from the large language model. If such evidence is found, a paper’s authors will face “a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted by a reputable peer-reviewed venue.”
The policy does not ban large language models outright. Instead, it requires authors to take full responsibility for everything included in a paper, regardless of how it was produced. Researchers who paste in inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content from a model remain accountable for those problems.
Dietterich told 404 Media that enforcement will operate as a “one-strike” rule, but moderators must first flag the problem and section chairs must confirm the evidence before a penalty is imposed. Authors will also be able to appeal. The move comes as recent peer-reviewed research has found that fabricated citations are increasing in biomedical research, a rise likely linked to large language models.
