Mistral has expanded its Voxtral model family with its first text-to-speech model, deepening its push into voice systems for enterprise use. Voxtral TTS is being positioned for voice assistants, customer support and sales engagement tools as competition intensifies in the artificial intelligence voice market. The system is pitched as an alternative to offerings from OpenAI and ElevenLabs.
The Paris-based startup unveiled the new system on Thursday. The 4-billion-parameter model is designed for enterprise deployment. Unlike many rival offerings, Voxtral TTS has been released with open weights, allowing organizations to run the model on their own infrastructure rather than relying on third-party APIs. The model supports nine languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi and Arabic.
Mistral said the model is lightweight enough to run on consumer hardware, including laptops, smartphones and edge devices, while maintaining what it describes as "frontier-quality" performance. The company presents this as a key advantage for enterprises seeking greater control over data, cost and customization. Voice adaptability is another core feature: the model can replicate a speaker's voice from just a few seconds of reference audio, capturing tone, accent, intonation and emotion.
Voxtral TTS also supports cross-language voice control, such as generating English speech with a French accent from a short prompt. Mistral said that in human evaluations, Voxtral TTS matched or outperformed competing systems on naturalness, surpassing lower-latency models from ElevenLabs while achieving parity with more advanced offerings on lifelike interaction. The launch follows Mistral's earlier release of speech-to-text models and signals a broader move toward multimodal artificial intelligence systems.
