Sakana AI, the University of British Columbia, the Vector Institute, and the University of Oxford have published an open-access paper in Nature describing The AI Scientist, a system built to execute the full machine-learning research process end to end. The system is designed to generate research ideas, search and read relevant literature, design and run experiments, and write complete papers in LaTeX, with feedback on figures provided by a foundation model with vision capabilities. The publication consolidates earlier open-source releases and adds new architectural details, scaling results, and a discussion of the opportunities and risks of AI-generated science.
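At a high level, the described loop can be pictured as a sequence of stages from idea to manuscript. The Python sketch below is purely illustrative: every helper is a hypothetical toy stub standing in for a model-driven component, not part of the project's actual codebase.

```python
# Illustrative sketch of an automated research loop. All helpers below are
# hypothetical stand-ins, not the project's real API.

def generate_ideas(template: str, n: int) -> list[str]:
    return [f"idea-{i}" for i in range(n)]        # a model would propose ideas here

def search_literature(idea: str) -> list[str]:
    return []                                     # would query a literature search API

def is_novel(idea: str, related: list[str]) -> bool:
    return True                                   # would compare against prior work

def run_experiments(idea: str, template: str) -> dict:
    return {"metric": 0.0}                        # would edit and execute experiment code

def make_figures(results: dict) -> list[str]:
    return ["fig1.png"]                           # plots later checked by a vision model

def write_latex_paper(idea: str, results: dict, figures: list[str]) -> str:
    return "\\documentclass{article} % draft for " + idea

def run_pipeline(template: str, n_ideas: int = 5) -> list[str]:
    papers = []
    for idea in generate_ideas(template, n_ideas):
        if not is_novel(idea, search_literature(idea)):
            continue                              # skip ideas too close to prior work
        results = run_experiments(idea, template)
        figures = make_figures(results)
        papers.append(write_latex_paper(idea, results, figures))
    return papers

print(len(run_pipeline("seed experiment code")))  # prints 5
```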
The work is presented as the result of a roughly 1.5-year effort. In its first phase, the system was given a starting code template and autonomously generated ideas, ran experiments, and wrote a full paper, while an Automated Reviewer was built to score paper quality. In a later phase, the system was granted broader freedom across AI research topics and submitted unedited, fully AI-generated papers to the blind peer-review process of the ICLR 2025 "I Can't Believe It's Not Better" workshop. One manuscript received an average score of 6.33 (individual scores of 6, 7, and 6), surpassing the workshop's average acceptance threshold and scoring higher than 55% of human-authored submissions. The submission was made with the organizers' permission and, as planned in advance, was withdrawn after acceptance but before publication.
The Nature paper also emphasizes evaluation at scale through the Automated Reviewer. The model was prompted to act as an Area Chair, aggregating five independent reviews into a final accept/reject decision following official NeurIPS reviewer guidelines. Benchmarked against thousands of human decisions from OpenReview, it achieved a balanced accuracy of 69% and an F1-score exceeding the inter-human agreement reported in the NeurIPS 2021 consistency experiment. The reported results suggest the reviewer matches human-level performance, including on papers published after the model's knowledge cutoff. Using this reviewer, the team says it observed a scaling law in which stronger foundation models produce higher-quality generated papers.
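The ensembling step can be sketched in a few lines. The snippet below is an illustration under loose assumptions, not the team's implementation: `complete()` is a hypothetical stand-in for a foundation-model API call, and the acceptance threshold of 6 is chosen only to mirror the workshop score mentioned above.

```python
import json
from statistics import mean

def complete(prompt: str) -> str:
    # Hypothetical stand-in for a chat-completion API call; a real system
    # would send the prompt to a foundation model. Returns canned JSON here
    # so the sketch runs end to end.
    return json.dumps({"score": 6, "critique": "placeholder review"})

def review_once(paper_text: str) -> dict:
    # One independent review in the style of a conference review form.
    prompt = (
        "You are a NeurIPS reviewer. Following the official reviewer "
        "guidelines, score this paper from 1-10 and justify briefly. "
        "Reply as JSON with keys 'score' and 'critique'.\n\n" + paper_text
    )
    return json.loads(complete(prompt))

def area_chair_decision(paper_text: str, n_reviews: int = 5,
                        threshold: float = 6.0) -> dict:
    # Collect several independent reviews, then aggregate them into a
    # single decision, analogous to an Area Chair weighing a committee.
    reviews = [review_once(paper_text) for _ in range(n_reviews)]
    avg = mean(r["score"] for r in reviews)
    return {"average_score": avg,
            "decision": "accept" if avg >= threshold else "reject"}

print(area_chair_decision("...paper text..."))
# {'average_score': 6, 'decision': 'accept'}
```

Balanced accuracy, the headline metric, averages recall over the accept and reject classes, which avoids rewarding a reviewer that simply rejects everything on an acceptance-rate-imbalanced benchmark.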
Several limitations remain. The system can produce naive or underdeveloped ideas, struggle with methodological rigor and complex code implementation, and make errors such as inaccurate citations or duplicated figures in appendices. The current setup is also limited to computational experiments. The team argues that these weaknesses should be viewed alongside a broader trend in machine learning, where emerging capabilities can improve rapidly with scale and stronger core models.
The publication also frames the project as an ethical and institutional challenge for science. Risks include overwhelming peer-review systems and inflating research credentials with machine-generated output. In response, the team says it obtained IRB approval, withdrew accepted submissions, and watermarks all generated papers to make their origin clear. It also calls for community norms on how AI-generated research should be handled as such systems become more capable.
