Malicious expert models expose new security risk in mixture-of-experts language systems

KAIST researchers have demonstrated that a single malicious expert model embedded in a mixture-of-experts large language model can sharply increase harmful outputs without noticeably degrading overall performance.

A Korean research team has identified a novel security threat targeting the mixture-of-experts architecture used in major commercial large language models such as Google’s Gemini. The mixture-of-experts design improves efficiency by activating only a few smaller expert Artificial Intelligence models, selected by a routing component according to the input, rather than running the entire network for every request. The team found that this structure can be turned against the system, allowing an attacker to undermine safety protections without direct access to the main model’s internal components.
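As a rough illustration of the routing idea described above, and not the researchers’ implementation, the sketch below shows a minimal mixture-of-experts layer in PyTorch. The expert count, layer sizes, and top-2 routing are arbitrary assumptions chosen only to make the mechanism concrete: a router scores every expert for each token, and only the highest-scoring experts actually run.

```python
# Illustrative sketch only: a minimal mixture-of-experts layer in PyTorch.
# Expert count, hidden sizes, and top_k are arbitrary assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoELayer(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is a small feed-forward block; in a production LLM these
        # are much larger and may come from separately distributed weights.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        # The router assigns a score to every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, d_model)
        scores = self.router(x)                                # (B, T, n_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)     # keep top-k experts
        weights = F.softmax(weights, dim=-1)                   # normalize their weights
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the rest stay inactive,
        # which is where the efficiency gain of the architecture comes from.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = TinyMoELayer()
    tokens = torch.randn(2, 5, 64)
    print(layer(tokens).shape)  # torch.Size([2, 5, 64])
```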

KAIST announced that a joint team led by Professor Seungwon Shin of the School of Electrical Engineering and Professor Sooel Son of the School of Computing has become the first to empirically demonstrate this attack method. The work, presented at the international information security conference ACSAC 2025 in Hawaii on the 12th, received the Best Paper Award, underscoring its significance to the security community. The researchers showed that an attacker can distribute a single manipulated expert Artificial Intelligence model as open source; if that malicious expert is later integrated alongside otherwise normal experts in a mixture-of-experts system, the safety of the entire Artificial Intelligence model can be compromised.

In the team’s experiments, the attack raised the incidence of harmful responses from the large language model from 0% to as high as 80%, while the accompanying performance degradation was negligible, making the malicious behavior difficult to detect in advance with standard quality metrics. The study is described as the first to formally present this type of development-time security risk for large language models, highlighting the need to verify the origin and safety of internal expert models before deployment. Professors Shin and Son emphasized that while mixture-of-experts architectures are being adopted rapidly for their efficiency gains, the same designs can introduce a new class of security threat that must be addressed as Artificial Intelligence systems continue to proliferate.
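The recommendation to verify the origin and safety of expert models before deployment could take many forms; the sketch below is one hypothetical safeguard, not the paper’s proposed defense. The file name, checksum allow-list, and load_verified_expert helper are illustrative assumptions: a third-party expert checkpoint is checked against a reviewed digest before its weights are merged into a model such as the TinyMoELayer sketched earlier.

```python
# Hypothetical provenance check for an externally distributed expert checkpoint.
# The allow-list, file names, and helper are illustrative assumptions only.
import hashlib
from pathlib import Path

import torch
import torch.nn as nn

# Hypothetical allow-list of checksums for expert checkpoints whose origin
# and safety have already been reviewed.
TRUSTED_EXPERT_SHA256 = {
    "expert_3.pt": "<reviewed sha256 digest>",  # placeholder value
}


def load_verified_expert(layer: nn.Module, slot: int, checkpoint: Path) -> None:
    """Load third-party expert weights into layer.experts[slot] only if the
    checkpoint file's SHA-256 digest matches a reviewed allow-list entry."""
    digest = hashlib.sha256(checkpoint.read_bytes()).hexdigest()
    expected = TRUSTED_EXPERT_SHA256.get(checkpoint.name)
    if expected is None or digest != expected:
        raise ValueError(f"Unverified expert checkpoint: {checkpoint.name}")
    state = torch.load(checkpoint, map_location="cpu")
    layer.experts[slot].load_state_dict(state)
```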
