Nature paper details The Artificial Intelligence Scientist project

Sakana Artificial Intelligence and academic collaborators have published a Nature paper describing The Artificial Intelligence Scientist, a system designed to automate the full machine learning research lifecycle. The work reports peer review results, reviewer benchmarking, and limits that still constrain the system.

Sakana Artificial Intelligence, the University of British Columbia, the Vector Institute, and the University of Oxford have published an open-access paper in Nature describing The Artificial Intelligence Scientist, a system built to execute the full machine learning research process. The project is designed to generate research ideas, search and read relevant literature, design and run experiments, and write complete papers in LaTeX, with figure feedback provided by a foundation model with vision capabilities. The publication consolidates earlier open-source releases and adds new architectural details, scaling results, and discussion of the opportunities and risks around Artificial Intelligence-generated science.

The work is presented as the result of a 1.5-year process. In its first phase, the system was given a starting code template and autonomously generated ideas, ran experiments, and wrote a full paper, while an Automated Reviewer was created to score paper quality. In a later phase, the system was granted broader freedom across Artificial Intelligence research topics and submitted unedited, fully Artificial Intelligence-generated papers to the blind human peer-review process of the ICLR 2025 I Can’t Believe It’s Not Better workshop. One manuscript received an average score of 6.33 (individual scores: 6, 7, 6), above the workshop’s average acceptance threshold and higher than 55% of human-authored submissions. The submission was made with the organizers’ permission and, as planned in advance, was withdrawn after acceptance and before publication.

The Nature paper also emphasizes evaluation at scale through the Automated Reviewer. The system was prompted to act as an Area Chair and ensemble five independent reviews into a final decision using official NeurIPS guidelines. Benchmarked against thousands of human decisions from OpenReview, it achieved a balanced accuracy of 69% and an F1-score that exceeded the inter-human agreement reported in the NeurIPS 2021 consistency experiment. The reported results suggest the reviewer matches human performance, including on papers published after the model’s knowledge cutoff. Using this reviewer, the team says it observed a scaling law in which better foundation models produce higher-quality generated papers.
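The evaluation described above can be illustrated with a minimal sketch: five independent accept/reject reviews are ensembled by majority vote, and the resulting decisions are scored against human decisions using balanced accuracy and F1. All data below are invented for illustration; the paper's actual reviewer uses foundation-model prompting and official NeurIPS guidelines rather than this toy voting scheme.

```python
# Hypothetical sketch of ensembled review decisions scored against
# human decisions. Data and function names are invented for illustration.
from collections import Counter

def ensemble_decision(reviews):
    """Majority vote over a list of independent accept/reject reviews."""
    return Counter(reviews).most_common(1)[0][0]

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the 'accept' and 'reject' classes."""
    recalls = []
    for cls in ("accept", "reject"):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

def f1_score(y_true, y_pred, positive="accept"):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented example: human decisions vs. ensembled automated decisions.
human = ["accept", "reject", "reject", "accept", "reject", "accept"]
auto = [ensemble_decision(r) for r in [
    ["accept"] * 3 + ["reject"] * 2,   # majority accept
    ["reject"] * 4 + ["accept"],       # majority reject
    ["reject"] * 5,                    # unanimous reject
    ["accept"] * 2 + ["reject"] * 3,   # majority reject (disagrees with human)
    ["reject"] * 3 + ["accept"] * 2,   # majority reject
    ["accept"] * 5,                    # unanimous accept
]]
print(balanced_accuracy(human, auto))  # 0.8333...
print(f1_score(human, auto))           # 0.8
```

Balanced accuracy is the natural metric here because accept/reject decisions are imbalanced at real venues, and a reviewer that rejected everything would otherwise look deceptively accurate.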

Several limitations remain. The system can produce naive or underdeveloped ideas, struggle with methodological rigor and complex code implementation, and make errors such as inaccurate citations or duplicated figures in appendices. The current setup is also limited to computational experiments. The team argues that these weaknesses should be viewed alongside a broader trend in machine learning, where emerging capabilities can improve rapidly with scale and stronger core models.

The publication also frames the project as an ethical and institutional challenge for science. Risks include overwhelming peer-review systems and inflating research credentials with machine-generated output. In response, the team says it obtained IRB approval, withdrew accepted submissions, and watermarks all generated papers to make their origin clear. It also calls for community norms on how Artificial Intelligence-generated research should be handled as such systems become more capable.

Impact Score: 70

DRAM stocks fall after Google TurboQuant debut

DRAM manufacturers came under pressure after Google introduced TurboQuant, which it says can sharply reduce the memory needs of Artificial Intelligence models while speeding up inference. The announcement coincided with notable declines in shares of Micron, SK Hynix, and Samsung Electronics.

EU Artificial Intelligence Act prohibited practices overview

A LexisNexis practice note examines Article 5 of the EU Artificial Intelligence Act and the practices banned for posing unacceptable risks to EU values and fundamental rights. It also addresses enforcement, liability, and contractual considerations.

Artificial Intelligence adoption outpaces governance, Gallagher says

Gallagher says businesses are expanding Artificial Intelligence training and hiring as the technology moves into everyday operations, but many still lack formal risk controls. The gap is creating new concerns for insurers, brokers and risk consultants as regulation and liability exposures evolve.
