Chemma language model accelerates reaction optimisation but sparks debate

Chemma, a new large language model, promises faster reaction predictions for chemists but raises questions about overreliance on artificial intelligence in lab work.

Chemma, a large language model (LLM) specifically designed for organic chemistry, is transforming how researchers approach reaction prediction and synthesis planning. Developed by Yanyan Xu and colleagues at Shanghai Jiao Tong University, Chemma was trained on over 1.28 million chemistry question-and-answer pairs derived from public datasets. Its purpose is multi-faceted: it predicts reaction outcomes, devises retrosynthetic routes to target molecules, and recommends optimal conditions for chemical reactions. The model is built on the open-source Llama-2-7B framework and is tailored for active, feedback-loop learning, enabling it to iterate and improve recommendations using results from real experiments coupled with expert guidance.
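To make the question-and-answer framing concrete, here is a minimal Python sketch of how forward prediction and condition recommendation could both be cast as plain-text Q&A pairs for instruction-tuning a Llama-2-style model. The field names, prompt template, and example reactions are illustrative assumptions, not Chemma's actual training format.

```python
# Illustrative only: casting chemistry tasks as plain-text Q&A pairs for
# instruction-tuning a causal language model. The record layout, prompt
# phrasing, and example SMILES are hypothetical, not Chemma's real format.

from dataclasses import dataclass


@dataclass
class ChemQA:
    task: str       # e.g. "forward_prediction", "retrosynthesis", "condition_rec"
    question: str   # natural-language prompt containing the reaction SMILES
    answer: str     # expected completion


def to_training_text(pair: ChemQA) -> str:
    """Flatten a Q&A pair into the single text string a causal LM is trained on."""
    return f"### Question:\n{pair.question}\n\n### Answer:\n{pair.answer}"


examples = [
    ChemQA(
        task="forward_prediction",
        question="What is the major product of the reaction "
                 "c1ccc(Br)cc1 + OB(O)c1ccccc1 under Pd catalysis?",
        answer="c1ccc(-c2ccccc2)cc1",
    ),
    ChemQA(
        task="condition_rec",
        question="Suggest a catalyst, base and solvent for the Suzuki-Miyaura "
                 "coupling of c1ccc(Br)cc1 with OB(O)c1ccccc1.",
        answer="Pd(PPh3)4, K2CO3, dioxane/water, 80 degrees C",
    ),
]

for ex in examples:
    print(to_training_text(ex), end="\n\n")
```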

The practical impact of Chemma is already evident. In a key demonstration, Chemma identified optimal conditions for a previously unexplored Suzuki–Miyaura cross-coupling reaction, reaching a 67% yield in just 15 experiments, where traditional screening would require hundreds of trials and weeks of work. Chemma also outperforms older approaches in single-step retrosynthesis and yield prediction. Unlike conventional methods that rely on computationally intensive quantum-chemical calculations, Chemma provides fast, data-driven predictions that let researchers screen conditions in minutes on a standard laptop.
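The sketch below illustrates the kind of model-in-the-loop optimisation described above: candidate conditions are ranked, the top candidate is run in the lab, and the measured yield is fed back before the next round. The scoring and experiment functions here are random stand-ins for a real Chemma query and a wet-lab run, and the candidate grid and function names are assumptions.

```python
# A minimal sketch of an active feedback loop for reaction condition
# optimisation. predict_yield() and run_experiment() are stand-ins, not
# Chemma's actual interface; the condition grid is illustrative.

import random

candidate_conditions = [
    {"catalyst": cat, "base": base, "solvent": solv}
    for cat in ("Pd(PPh3)4", "Pd(OAc)2/SPhos", "Pd(dppf)Cl2")
    for base in ("K2CO3", "Cs2CO3", "KOtBu")
    for solv in ("dioxane/H2O", "THF", "DMF")
]


def predict_yield(conditions, history):
    """Stand-in for asking the language model to score a condition set,
    given the (conditions, measured yield) pairs seen so far."""
    return random.random()


def run_experiment(conditions):
    """Stand-in for the wet-lab experiment that returns a measured yield."""
    return random.uniform(0.0, 0.8)


history = []   # (conditions, measured_yield) pairs fed back to the model
budget = 15    # cap on lab experiments, mirroring the 15-run demonstration

for round_idx in range(budget):
    untried = [c for c in candidate_conditions if all(c != h[0] for h in history)]
    best = max(untried, key=lambda c: predict_yield(c, history))
    measured = run_experiment(best)
    history.append((best, measured))
    print(f"round {round_idx + 1}: {best} -> yield {measured:.0%}")

print("best observed:", max(history, key=lambda h: h[1]))
```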

Despite the rapid progress and promise of such models, Chemma’s emergence has drawn scrutiny from the scientific community. Some chemists, including independent commentators Kevin Jablonka and Joshua Schrier, acknowledge Chemma’s efficiency but urge caution, stressing that these tools generate probabilistic outputs and lack genuine chemical intuition. There is concern that researchers may become overly reliant on models, risking a decline in critical thinking and expertise. Critics highlight the dangers of centralising scientific knowledge within automated systems and call for diversity, transparency, and continued human oversight in chemical research. The consensus is that language models like Chemma should be seen as powerful tools to assist, not replace, the nuanced judgement and responsibility of chemists.
