MIT engineers have developed a machine-learning approach to speed the design of lipid nanoparticles used to deliver RNA vaccines and therapies. The team trained a model on thousands of existing formulations and used it to predict new mixtures that, when tested in the lab, outperformed many of the particles in the training set and in some cases beat commercially used formulations. According to Giovanni Traverso, the senior author, the approach lets researchers search a vast formulation space far faster than traditional trial-and-error experiments.
The researchers built a training library of roughly 3,000 different lipid nanoparticle formulations and measured how well each delivered mRNA payloads to cells. They then trained a transformer-based model, which they call COMET, to learn how different components interact inside a nanoparticle. The model is inspired by architectures used in large language models such as ChatGPT, but repurposed to understand combinations of chemical components rather than words. That lets COMET predict which multi-component formulations will yield better delivery properties, rather than optimizing one compound at a time.
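To make the idea of treating components like words more concrete, here is a minimal sketch in Python. It is not COMET's actual architecture or code. It assumes a formulation is written as component names with molar amounts, and that each component becomes one token that attends to every other token. The component names, the embedding size, the random untrained embeddings, and the helpers `normalize`, `tokens`, `self_attention`, and `predict_score` are all hypothetical and exist only for illustration.

```python
import math
import random

# Hypothetical component vocabulary; placeholder names, not the paper's lipids.
VOCAB = ["ionizable_lipid_A", "helper_lipid_B", "cholesterol", "peg_lipid_C"]
DIM = 4  # assumed embedding size, chosen only to keep the example small

rng = random.Random(0)
# Untrained random embeddings, one vector per component (a real model learns these).
EMBED = {name: [rng.uniform(-1, 1) for _ in range(DIM)] for name in VOCAB}


def normalize(formulation):
    """Scale molar amounts so that they sum to 1."""
    total = sum(formulation.values())
    return {name: amount / total for name, amount in formulation.items()}


def tokens(formulation):
    """One token per component: its embedding scaled by its molar fraction."""
    fractions = normalize(formulation)
    return [[fractions[name] * x for x in EMBED[name]] for name in sorted(fractions)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(toks):
    """Single-head attention without learned projections: each token is
    replaced by a weighted mix of all tokens in the formulation."""
    out = []
    for query in toks:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(DIM) for key in toks]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, toks)) for i in range(DIM)])
    return out


def predict_score(formulation):
    """Mean-pool the attended tokens and sum them into a placeholder score."""
    attended = self_attention(tokens(formulation))
    pooled = [sum(column) / len(attended) for column in zip(*attended)]
    return sum(pooled)


example = {"ionizable_lipid_A": 50, "helper_lipid_B": 10, "cholesterol": 38.5, "peg_lipid_C": 1.5}
score = predict_score(example)  # meaningless until the embeddings are trained
```

Because every token is mixed with every other token before pooling, the score depends on the whole combination rather than on any single ingredient, which is the property the article attributes to the transformer approach. A trained model would replace the random embeddings and the simple sum with learned parameters.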
After training, the team asked the model to propose new formulations and validated those predictions by delivering mRNA encoding a fluorescent protein to mouse skin cells grown in culture. The predicted formulations delivered mRNA more efficiently than many of the particles in the training set. The researchers also extended the approach to a fifth component class, branched poly(beta-amino esters), or PBAEs, by training on a smaller set of about 300 lipid nanoparticles (LNPs) that include these polymers; the model then suggested PBAE-containing formulations with improved performance. They further trained the model and tested its predictions for cell-type specificity, including experiments in Caco-2 cells, and for properties relevant to real-world use, such as the ability to withstand lyophilization (freeze-drying).
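The propose-then-validate loop described here can be pictured as scoring many candidate mixtures in software and sending only the top few to the bench. The Python sketch below illustrates that screening step under loud assumptions: candidates are enumerated on a coarse grid of molar percentages across four components, and `placeholder_score` is an arbitrary stand-in for a trained predictor. The function names, the grid step, and the target ratios are all hypothetical and are not taken from the paper.

```python
from itertools import product


def candidate_ratios(step=10, parts=4):
    """Yield every split of 100 mol% across `parts` components in `step`
    increments, with each component present at one step or more."""
    levels = range(step, 100, step)
    for combo in product(levels, repeat=parts - 1):
        last = 100 - sum(combo)
        if last >= step:
            yield combo + (last,)


def placeholder_score(ratios):
    """Stand-in for a trained model: prefers ratios near an arbitrary target."""
    target = (50, 10, 30, 10)  # arbitrary illustration, not a recommended recipe
    return -sum((r - t) ** 2 for r, t in zip(ratios, target))


def top_candidates(k=3, step=10):
    """Rank all grid candidates by predicted score and keep the best k."""
    ranked = sorted(candidate_ratios(step), key=placeholder_score, reverse=True)
    return ranked[:k]


shortlist = top_candidates(k=3)  # the mixtures one would test in the lab first
```

In practice the scoring function would be the trained model and the candidate space would also vary which lipids are used, not only their ratios. The point of the sketch is only that ranking in software narrows the list before any lab work.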
The paper, led by Alvin Chan and Ameya Kirtane with Traverso as senior author, appears in Nature Nanotechnology and was published on August 15, 2025. The work was funded by ARPA-H and several MIT and institute sources. The authors say the model is a flexible tool: it can be retrained or fine-tuned for different questions, from improving shelf life to targeting particular tissues, and it is now being used to guide development of RNA therapeutics aimed at obesity and diabetes.