A Korean research team has identified a novel security threat targeting the mixture-of-experts architecture used in major commercial large language models such as Google's Gemini. The mixture-of-experts design improves efficiency by selectively activating only a few smaller expert models for each input, rather than running the entire network on every query. The team found that this structure can be turned against the system, allowing an attacker to undermine its safety protections without direct access to the main model's internal components.
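To make the selective-activation idea concrete, here is a minimal sketch of a top-k mixture-of-experts layer in PyTorch. The layer sizes, expert count, and top-k value are illustrative assumptions, not the configuration of Gemini or of any model examined in the study.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        # The router (gating network) scores every expert for each token.
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):                                # x: (tokens, dim)
        scores = self.router(x)                          # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token -- the efficiency gain.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k : k + 1] * expert(x[mask])
        return out

layer = MoELayer()
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64])

Because each token touches only two of the eight experts here, compute per token stays small even as the total parameter count grows, which is the efficiency argument driving adoption of the design.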
KAIST announced that a joint team led by Professor Seungwon Shin of the School of Electrical Engineering and Professor Sooel Son of the School of Computing has become the first to empirically demonstrate this attack method. The work, presented at the international information security conference ACSAC 2025 in Hawaii on the 12th, received the Best Paper Award, underscoring its significance to the security community. The researchers showed that an attacker can distribute a single manipulated expert model as open source; if this malicious expert is later integrated alongside otherwise normal experts in a mixture-of-experts system, the safety of the entire model can be compromised.
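The following hypothetical snippet, building on the MoELayer sketch above, illustrates the kind of integration step the researchers warn about. It is not the paper's actual attack code; the file name, the donor weights, and the choice of which expert slot to replace are all invented for illustration.

import torch

# Stand-in for a manipulated expert published as open source: here we simply
# save a freshly initialized expert's weights and treat them as untrusted.
layer = MoELayer()   # MoELayer as defined in the sketch above
donor = MoELayer()
torch.save(donor.experts[0].state_dict(), "community_expert.pt")

# The integration step: one downloaded expert replaces a normal one.
untrusted_state = torch.load("community_expert.pt")
layer.experts[3].load_state_dict(untrusted_state)
# Whenever the router now selects expert 3, the swapped-in weights run inside
# the model; the attacker never touched the router or the other experts.

The point of the illustration is that the swap is an ordinary, innocuous-looking operation in a development pipeline, which is what makes the supply-chain angle of the threat plausible.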
In the team's experiments, the attack raised the rate of harmful responses from the large language model from 0% to as high as 80%. The researchers also confirmed that the attack caused negligible performance degradation, making the malicious behavior difficult to detect in advance using standard quality metrics. The study is described as the first to formally present this type of development-time security risk for large language models, underscoring the need to verify the origin and safety of internal expert models before deployment. Professors Shin and Son emphasized that while mixture-of-experts architectures are rapidly being adopted for their efficiency gains, their work shows that these same designs can introduce a new class of security threat that must be addressed as AI systems continue to proliferate.