Malicious expert models expose new security risk in mixture-of-experts language systems

KAIST researchers have demonstrated that a single malicious expert model embedded in a mixture-of-experts large language model can sharply increase harmful outputs without noticeably degrading overall performance.

A Korean research team has identified a novel security threat targeting the mixture-of-experts architecture used in major commercial large language models such as Google’s Gemini. The mixture-of-experts design improves efficiency by selectively activating several smaller expert Artificial Intelligence models depending on the input context. The team found that this structure can be turned against the system, enabling an attacker to undermine safety protections without needing direct access to the main model’s internal components.
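The selective activation described above can be illustrated with a minimal routing sketch. This is not the researchers' code or any production model's implementation: the toy experts, the gate scores, and the function names are all assumptions made for illustration. It shows only the general idea that a gating step picks a few experts per input and mixes their outputs, so every selected expert contributes directly to the final result.

```python
import math

def softmax(scores):
    """Convert raw gate scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the selected experts
    return [(i, probs[i] / norm) for i in top]

def moe_output(x, experts, gate_scores, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_scores, k))

# Hypothetical toy experts: each one is just a scalar function.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
# In a real system these scores would come from a learned gating network.
gate_scores = [0.1, 2.0, -1.0, 2.0]

selected = route_top_k(gate_scores, k=2)
y = moe_output(3.0, experts, gate_scores, k=2)
```

In this toy setup only experts 1 and 3 run for the input; the other two are never evaluated, which is where the efficiency gain comes from.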

KAIST announced that a joint team led by Professor Seungwon Shin of the School of Electrical Engineering and Professor Sooel Son of the School of Computing is the first to empirically demonstrate this attack method. The work, presented on December 12 at the international information security conference ACSAC 2025 in Hawaii, received the Best Paper Award, underscoring its significance to the security community. The researchers showed that an attacker can distribute a single manipulated expert Artificial Intelligence model as open source. If this malicious expert is later integrated among otherwise normal experts in a mixture-of-experts system, the safety of the entire Artificial Intelligence model can be compromised.

In the team's experiments, the attack raised the rate of harmful responses from the large language model from 0% to as high as 80%. Performance degradation during the attack was negligible, which makes the malicious behavior difficult to detect in advance using standard quality metrics. The study is described as the first to formally present this type of development-time security risk for large language models, highlighting the need to verify the origin and safety of internal expert models before deployment. Professors Shin and Son emphasized that mixture-of-experts architectures are rapidly being adopted for efficiency gains, and that these same designs can introduce a new class of security threat that must be addressed as Artificial Intelligence systems continue to proliferate.
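One small part of verifying an expert's origin can be sketched as a digest check against a list of trusted releases. This is an illustrative assumption, not a method described by the researchers: the expert names, the byte strings standing in for weight files, and the helper functions are all hypothetical.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a blob as a hex string."""
    return hashlib.sha256(data).hexdigest()

def verify_experts(expert_blobs, trusted_digests):
    """Return the names of experts whose bytes do not match a trusted digest."""
    rejected = []
    for name, blob in expert_blobs.items():
        if trusted_digests.get(name) != sha256_hex(blob):
            rejected.append(name)
    return rejected

# Hypothetical data: short byte strings stand in for expert weight files.
trusted_blobs = {"expert_math": b"weights-A", "expert_code": b"weights-B"}
trusted_digests = {name: sha256_hex(blob) for name, blob in trusted_blobs.items()}

# One downloaded expert differs from the release that was reviewed.
downloaded = {"expert_math": b"weights-A", "expert_code": b"weights-B-tampered"}
rejected = verify_experts(downloaded, trusted_digests)
```

A check like this only confirms that an expert is byte-identical to a release someone has already reviewed. It says nothing about whether that release is safe, so it would complement behavioral safety testing of each expert rather than replace it.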


