Malicious expert models expose new security risk in mixture-of-experts language systems

KAIST researchers have demonstrated that a single malicious expert model embedded in a mixture-of-experts large language model can sharply increase harmful outputs without noticeably degrading overall performance.

A Korean research team has identified a novel security threat targeting the mixture-of-experts architecture used in major commercial large language models such as Google’s Gemini. The mixture-of-experts design improves efficiency by activating only a small subset of smaller expert models for each input, chosen by a router based on the input context. The team found that this structure can be turned against the system, enabling an attacker to undermine safety protections without needing direct access to the main model’s internal components.

KAIST announced that a joint team led by Professor Seungwon Shin of the School of Electrical Engineering and Professor Sooel Son of the School of Computing has become the first to empirically demonstrate this attack method. The work, presented at the international information security conference ACSAC 2025 in Hawaii on the 12th, received the Best Paper Award, underscoring its significance to the security community. The researchers showed that an attacker can distribute a single manipulated expert model as open source; if this malicious expert is later integrated among otherwise normal experts in a mixture-of-experts system, the safety of the entire model can be compromised.

The team reported that the attack raised the incidence of harmful responses from the large language model from 0% to as high as 80%. They also confirmed that performance degradation during the attack was negligible, which makes the malicious behavior difficult to detect in advance using standard quality metrics. The study is described as the first to formally present this type of development-time security risk for large language models, highlighting the need to verify the origin and safety of internal expert models before deployment. Professors Shin and Son emphasized that while mixture-of-experts architectures are rapidly being adopted for efficiency gains, their work shows that these same designs can introduce a new class of security threat that must be addressed as Artificial Intelligence systems continue to proliferate.
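The routing behavior that the attack exploits can be sketched in a toy example. This is not the researchers’ code or the actual attack: all names, dimensions, and weights below are illustrative assumptions, showing only how a mixture-of-experts layer combines a few router-selected experts, so that a single swapped-in expert participates in the output whenever the router activates it.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class ToyMoELayer:
    """Toy mixture-of-experts layer: a router scores all experts and
    only the top-k highest-scoring experts run on a given input."""

    def __init__(self, experts, router_weights, k=2):
        self.experts = experts                # callables: vector -> vector
        self.router_weights = router_weights  # (num_experts, dim)
        self.k = k

    def forward(self, x):
        scores = softmax(self.router_weights @ x)
        top_k = np.argsort(scores)[-self.k:]  # indices of activated experts
        out = np.zeros_like(x)
        for i in top_k:
            # Each activated expert contributes to the combined output,
            # weighted by its router score.
            out += scores[i] * self.experts[i](x)
        return out

dim = 4
rng = np.random.default_rng(0)

def make_benign(w):
    return lambda x: w @ x

# Three benign experts plus one stand-in for a poisoned open-source
# expert an attacker might distribute (purely hypothetical behavior).
benign = [make_benign(rng.normal(size=(dim, dim))) for _ in range(3)]

def malicious_expert(x):
    return np.full_like(x, 10.0)  # steers the combined result when activated

layer = ToyMoELayer(benign + [malicious_expert],
                    router_weights=rng.normal(size=(4, dim)), k=2)
y = layer.forward(np.ones(dim))
```

Because only the top-k experts run per input, the poisoned expert can stay dormant on most inputs and influence the output only when the router selects it, which is consistent with the article’s observation that overall quality metrics barely change.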

Anu Bradford on tech sovereignty and regulatory fragmentation

Anu Bradford argues that Europe is wavering in its role as the world’s digital rule-setter just as governments everywhere move toward more state control over technology. Global companies are being pushed to treat geopolitical risk, data sovereignty, and Artificial Intelligence governance as core strategic issues.

Mistral launches text-to-speech model

Mistral has expanded its Voxtral family with a text-to-speech system aimed at enterprise voice applications. The company is positioning the open-weights model as a flexible alternative for organizations that want more control over deployment, cost and customization.

UK Parliament opens workforce inquiry on Artificial Intelligence

A UK Parliament committee is examining how Artificial Intelligence is changing business and work, with a focus on both economic opportunity and labour disruption. The inquiry is seeking evidence on government priorities as adoption expands across the economy.

Windows 11 tightens kernel trust for older drivers

Microsoft is changing Windows 11 kernel policy so new drivers must be signed through the Windows Hardware Compatibility Program. Older trusted drivers will still be allowed in some cases to preserve compatibility during the transition.
