Artificial intelligence (AI) is becoming more deeply embedded across scientific domains, including biology, chemistry, physics, and astronomy. AI publications in the natural sciences reached approximately 80,150 in 2025, up 26% from 2024. AI now accounts for 5.8% to 8.8% of scientific research output, depending on the field, up from below 1% in 2010.
Performance is advancing in several specialized areas, but reliability remains uneven. On ChemBench, the best models surpass the average human expert across more than 2,700 chemistry questions, yet still struggle with basic tasks. On ReplicationBench, frontier models score below 20% on paper-scale replication in astrophysics. On UnivEarth, LLM agents answer earth observation questions with 33% accuracy, and the code they generate fails 58% of the time. On end-to-end scientific research tasks, the best AI agents score roughly half of what PhD experts achieve. On PaperArena, the best agent reaches 38.8% accuracy, compared with a PhD expert baseline of 83.5%. On BixBench, frontier models achieve roughly 17% accuracy on real-world bioinformatics analysis.
Scientific infrastructure is also shifting toward larger AI-native systems and datasets. In 2025, astronomy gained its first foundation model, its first visualization benchmark, and a 100 TB training dataset, signaling a field-wide shift toward AI infrastructure. AION-1, the first astronomy foundation model, was trained on over 200 million celestial objects from 5 major surveys. AstroVisBench is the field's first benchmark for evaluating LLMs on scientific computing and visualization.
Weather and climate research took a major operational step forward. In 2025, an AI system ran a full weather forecasting pipeline end to end for the first time: Aardvark Weather replaced the traditional numerical weather prediction pipeline with a single machine learning system. Multiple AI weather models also reached operational deployment. FourCastNet 3 generates a 60-day global forecast in under 4 minutes, running 8 to 60 times faster than prior approaches.
Research automation also advanced, though confirmed scientific impact remains limited. In 2025, the first fully AI-generated paper was accepted at a peer-reviewed workshop, but the list of experimentally confirmed AI discoveries remains short. Sakana’s AI Scientist-v2 produced a paper accepted at an ICLR workshop without relying on human-coded templates. Google’s AI Co-Scientist was validated in three biomedical areas. Most AI models for science still come from academic and government institutions, while industry leads foundation model development in weather and climate.
