Benchmarking large language model vulnerability to insecure code via few-shot inversion

Researchers introduce a novel few-shot inversion method to systematically evaluate how large language models generate insecure code, creating the first open benchmark for code security risks in Artificial Intelligence systems.

Large language models for code generation, such as ChatGPT, Codex, and GitHub Copilot, have rapidly become foundational tools for programmers thanks to their ability to automate code writing and completion. However, the data these models are trained on often includes unsanitized code from open-source repositories, which may harbor vulnerabilities and flaws. While prior evaluations have focused on the functional correctness of generated code, systematic analysis of the security of those outputs has been lacking, an increasingly urgent gap as these models are integrated into development workflows worldwide.

The study, conducted by researchers at the CISPA Helmholtz Center for Information Security, introduces a new method for probing the security weaknesses of code-generating language models. Their technique uses few-shot prompting to approximate model inversion in a black-box setting, meaning the model's internal workings remain opaque. Given a handful of example prompts paired with corresponding vulnerable code snippets, the model is guided to propose new prompts that consistently trigger the generation of insecure code patterns. This enables automated, large-scale discovery of vulnerabilities in the outputs of state-of-the-art models, surpassing earlier manual or one-off prompt engineering approaches by supporting high-throughput, vulnerability-specific benchmarking.
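To illustrate the idea, here is a minimal Python sketch of few-shot inversion prompting, assuming a generic black-box completion interface. The helper names (complete, build_inversion_prompt) and the SQL-injection examples are hypothetical placeholders chosen for illustration, not the authors' released implementation.

# Minimal sketch of few-shot "inversion" prompting (hypothetical helpers,
# not the released toolkit).

# Hypothetical (prompt, insecure completion) pairs for one vulnerability
# class, e.g. SQL injection via string formatting/concatenation (CWE-89).
FEW_SHOT_EXAMPLES = [
    (
        "# Return the user row matching the given username",
        'cursor.execute("SELECT * FROM users WHERE name = \'%s\'" % username)',
    ),
    (
        "# Look up an order by its id",
        'cursor.execute("SELECT * FROM orders WHERE id = " + order_id)',
    ),
]


def build_inversion_prompt(examples, target_snippet):
    """Assemble a few-shot prompt that asks the model to work backwards:
    given insecure code, propose a prompt that would plausibly elicit it."""
    parts = []
    for prompt_text, insecure_code in examples:
        parts.append(f"Code:\n{insecure_code}\nPrompt:\n{prompt_text}\n")
    # The target snippet is the vulnerable pattern we want new prompts for.
    parts.append(f"Code:\n{target_snippet}\nPrompt:\n")
    return "\n".join(parts)


def complete(prompt: str) -> str:
    """Placeholder for a black-box completion call (e.g. an HTTP request to a
    hosted code model); replace with the API client of your choice."""
    raise NotImplementedError


if __name__ == "__main__":
    target = 'cursor.execute("DELETE FROM sessions WHERE token = \'" + token + "\'")'
    inversion_prompt = build_inversion_prompt(FEW_SHOT_EXAMPLES, target)
    print(inversion_prompt)
    # Next step (requires a real model client):
    # candidate_prompt = complete(inversion_prompt)

In practice, a candidate prompt produced this way would be fed back to the code-generation model under test and kept only if the resulting completions actually exhibit the targeted vulnerability.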

In real-world trials, the researchers' method generated a diverse dataset of over 2,000 prompts that led models to produce Python and C code with critical security flaws. The findings show that such prompts are often transferable across models, exacerbating the systemic risk posed by code-generating Artificial Intelligence. To encourage security benchmarking and improvements, the team has released both their methodology and the resulting dataset as an open-source toolkit, empowering the research and development community to evaluate and compare the security performance of various language models. This framework is extensible, enabling the detection of new vulnerability classes as they emerge and offering a practical path forward for integrating robust security checks into the evolving landscape of Artificial Intelligence-powered code generation.
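As a rough sketch of how such a prompt dataset could be used to score a model, the following Python outline queries a model with each benchmark prompt and counts how many completions a security checker flags. Here generate_code and is_insecure are hypothetical stand-ins for the model client and for a static-analysis step, not the toolkit's actual interface.

from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    total: int
    insecure: int

    @property
    def insecure_rate(self) -> float:
        # Fraction of benchmark prompts whose completion was flagged.
        return self.insecure / self.total if self.total else 0.0


def generate_code(prompt: str) -> str:
    """Placeholder for querying the code-generation model under test."""
    raise NotImplementedError


def is_insecure(code: str) -> bool:
    """Placeholder for a vulnerability check, e.g. running a static analyzer
    and flagging any finding for the targeted CWE."""
    raise NotImplementedError


def run_benchmark(prompts):
    """Query the model with every benchmark prompt and count how many
    completions are flagged as insecure."""
    insecure = 0
    for prompt in prompts:
        if is_insecure(generate_code(prompt)):
            insecure += 1
    return BenchmarkResult(total=len(prompts), insecure=insecure)

Comparing the resulting insecure_rate across models on the same prompt set gives the kind of head-to-head security comparison the released benchmark is intended to support.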

Impact Score: 77
