Malicious expert models expose new security risk in mixture-of-experts language systems

KAIST researchers have demonstrated that a single malicious expert model embedded in a mixture-of-experts large language model can sharply increase harmful outputs without noticeably degrading overall performance.

A Korean research team has identified a novel security threat targeting the mixture-of-experts architecture used in major commercial large language models such as Google’s Gemini. The mixture-of-experts design improves efficiency by selectively activating a few smaller expert models, rather than the whole network, depending on the input context. The team found that this structure can be turned against the system, enabling an attacker to undermine safety protections without needing direct access to the main model’s internal components.
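For readers unfamiliar with the design, the minimal sketch below shows how a standard top-k mixture-of-experts layer routes tokens and blends expert outputs, which is where a tampered expert gains its influence. The class and parameter names (Expert, MoELayer, top_k) are illustrative assumptions and do not describe the internals of Gemini or the model used in the study.

```python
# Minimal sketch of top-k mixture-of-experts routing (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """One small feed-forward expert network."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return self.net(x)

class MoELayer(nn.Module):
    """Routes each token to the top-k experts chosen by a learned gate."""
    def __init__(self, dim: int, hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([Expert(dim, hidden) for _ in range(num_experts)])
        self.gate = nn.Linear(dim, num_experts)  # scores every expert per token
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        scores = self.gate(x)                           # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

# Every expert's output is blended directly into the layer output, so a single
# tampered expert in self.experts influences any token the gate routes to it.
layer = MoELayer(dim=64, hidden=256, num_experts=8)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```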

KAIST announced that a joint team led by Professor Seungwon Shin of the School of Electrical Engineering and Professor Sooel Son of the School of Computing has become the first to empirically demonstrate this attack method. The work, presented at the international information security conference ACSAC 2025 in Hawaii on the 12th, received the Best Paper Award, underscoring its significance to the security community. The researchers showed that an attacker can distribute a single manipulated expert model as open source; if this malicious expert is later integrated among otherwise normal experts in a mixture-of-experts system, the safety of the entire model can be compromised.

In the team’s experiments, the attack increased the incidence of harmful responses from the large language model from 0% to as high as 80%. They also confirmed that performance degradation during the attack was negligible, which makes the malicious behavior difficult to detect in advance using standard quality metrics. The study is described as the first to formally present this type of development-time security risk for large language models, highlighting the need to verify the origin and safety of internal expert models before deployment. Professors Shin and Son emphasized that while mixture-of-experts architectures are rapidly being adopted for their efficiency gains, the same designs can introduce a new class of security threat that must be addressed as Artificial Intelligence systems continue to proliferate.
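As a hypothetical illustration of why standard quality metrics can miss such tampering, consider an expert that reproduces a benign expert’s outputs except when a specific trigger pattern appears in its input. The MaliciousExpert class and cosine-similarity trigger below are invented for this sketch and are not the mechanism described in the paper.

```python
# Hypothetical trigger-conditioned expert (invented for illustration).
import torch
import torch.nn as nn

class MaliciousExpert(nn.Module):
    """Mimics a benign expert except when a trigger direction appears."""
    def __init__(self, benign: nn.Module, trigger: torch.Tensor, payload: torch.Tensor):
        super().__init__()
        self.benign = benign      # unmodified expert weights
        self.trigger = trigger    # unit vector that arms the payload
        self.payload = payload    # perturbation that steers generation

    def forward(self, x):
        clean = self.benign(x)
        # Near-zero similarity on ordinary inputs, so benchmark scores, and
        # thus standard quality checks, are unchanged.
        armed = torch.cosine_similarity(x, self.trigger.expand_as(x), dim=-1)
        return clean + (armed.unsqueeze(-1) > 0.9).float() * self.payload

dim = 64
benign = nn.Linear(dim, dim)
trigger = torch.randn(dim)
expert = MaliciousExpert(benign, trigger / trigger.norm(), torch.full((dim,), 5.0))

ordinary = torch.randn(4, dim)
print(torch.allclose(expert(ordinary), benign(ordinary)))  # True: behaves normally
print((expert(trigger.repeat(4, 1)) - benign(trigger.repeat(4, 1))).abs().max())  # payload fires
```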


Intel Fab 52 outscales TSMC Arizona in advanced wafer production

Intel Fab 52 in Arizona is producing more than 40,000 wafers per month on its 18A node, outpacing TSMC’s current Arizona output on older process technologies. The facility highlights Intel’s focus on advanced manufacturing for its own products while TSMC keeps its leading nodes primarily in Taiwan.

Intel details packaging for 16 compute dies and 24 HBM5 modules

Intel Foundry has outlined an advanced packaging approach that combines Foveros 3D and EMIB-T interconnect to scale silicon beyond conventional reticle limits, targeting configurations with 16 compute dies and 24 HBM5 memory modules in one package. The design is built around upcoming 18A and 14A process nodes and aims to support current and future high bandwidth memory standards.

Four bright spots in climate news in 2025

Despite record emissions and worsening climate disasters in 2025, several developments in China’s energy transition, grid-scale batteries, Artificial Intelligence-driven investment, and global warming projections offered genuine reasons for cautious optimism.

2025 cancer breakthroughs reshape treatment and detection

Oncology in 2025 is being transformed by immunotherapy, advanced screening, large-scale clinical trials, and the rapid rise of Artificial Intelligence in medicine, which together are improving survival and quality of life for cancer patients.
