Machine unlearning could help artificial intelligence forget copied voices

Researchers have found a way for artificial intelligence text-to-speech systems to 'forget' how to mimic certain voices, potentially curbing the threat of audio deepfakes.

A new technique called 'machine unlearning' is poised to help artificial intelligence models unlearn how to imitate specific voices, a crucial step in countering the rapid rise of audio deepfakes used for scams, fraud, and harassment. The technology behind text-to-speech has grown sophisticated enough to convincingly render almost anyone's voice from minimal audio samples, fueling concerns about identity theft and misuse. Professor Jong Hwan Ko and his team at Sungkyunkwan University in South Korea have demonstrated one of the first practical implementations of machine unlearning in speech generation, showing that artificial intelligence could effectively forget select voices to deter unauthorized reproduction.

Traditional methods to prevent misuse of artificial intelligence models, such as guardrails, work by blocking access to prohibited content or responses. However, skilled users have sometimes been able to circumvent these barriers through clever prompting or fine-tuning. Machine unlearning represents a paradigm shift by directly removing identified data, such as a person's voice, from a model's knowledge, producing a model that behaves as if it had never contained that information. This method addresses both direct and indirect voice mimicking: the artificial intelligence must not only forget voices it encountered during training, but also decline to mimic excluded voices even when provided with samples after training.
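The core idea can be sketched in miniature. The following toy example is a hypothetical illustration, not the team's actual method: a linear map `W` stands in for a voice model, mapping speaker features to voice embeddings, and "unlearning" is gradient ascent on the redacted speaker's loss, with the update projected away from the retained speakers' feature span so their quality is preserved. All names and data here are invented for illustration.

```python
import numpy as np

# Toy stand-in for a voice model: W maps speaker features to embeddings.
rng = np.random.default_rng(0)
dim = 8
retain_x = rng.normal(size=(5, dim))  # retained speakers' features
retain_y = rng.normal(size=(5, dim))  # their target voice embeddings
forget_x = rng.normal(size=(1, dim))  # the redacted speaker
forget_y = rng.normal(size=(1, dim))

def loss(W, x, y):
    return float(np.mean((x @ W - y) ** 2))

def grad(W, x, y):
    return 2.0 * x.T @ (x @ W - y) / len(x)

# Pretrain on all speakers: the model initially mimics everyone.
W = np.zeros((dim, dim))
all_x = np.vstack([retain_x, forget_x])
all_y = np.vstack([retain_y, forget_y])
for _ in range(500):
    W -= 0.05 * grad(W, all_x, all_y)

before_forget = loss(W, forget_x, forget_y)
before_retain = loss(W, retain_x, retain_y)

# Unlearning: ascend on the forget speaker's loss, but project the update
# onto the complement of the retained feature span so retained voices are
# (in this linear toy) left essentially intact.
Q, _ = np.linalg.qr(retain_x.T)  # orthonormal basis of the retain span
P = Q @ Q.T                      # projector onto that span
for _ in range(100):
    W += 0.01 * (np.eye(dim) - P) @ grad(W, forget_x, forget_y)

after_forget = loss(W, forget_x, forget_y)
after_retain = loss(W, retain_x, retain_y)
print(f"forget-speaker error:   {before_forget:.3g} -> {after_forget:.3g}")
print(f"retained-speaker error: {before_retain:.3g} -> {after_retain:.3g}")
```

Running the sketch shows the forget speaker's reconstruction error blowing up while the retained speakers' error is unchanged, mirroring the asymmetry the researchers report: the forgotten voice degrades sharply, permitted voices only marginally.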

To test their approach, the researchers recreated Meta's VoiceBox model and demonstrated that prompting the model with a redacted voice sample resulted in the artificial intelligence responding with a randomized, unrelated voice. According to quantitative metrics, the effectiveness of mimicking the 'forgotten' voice dropped by more than 75 percent after unlearning, while its ability to reproduce permitted voices only degraded marginally. The process is not without trade-offs, including several days of retraining per set of voices and the need for several minutes of audio data for each redacted speaker. Despite these challenges, both the research community and industry players are showing interest, given the pressing need to give individuals control over their digital likenesses. Optimism remains high that scalable, real-time versions of this technique will form a key part of future artificial intelligence deployments, bringing stronger safeguards against audio identity abuse.

Impact Score: 78

