During a visit to Synthesia’s London studio, the author recorded a short scripted session to generate a hyperrealistic avatar and compared outputs from the older Express-1 model and the newer Express-2 model. Synthesia began in 2017 with dubbed, face-matching avatars and later offered businesses the ability to create presentation videos featuring AI versions of staff or consenting actors. The Express-2 avatars show notably smoother facial movements, more expressive hand gestures, and a voice model that better preserves accent and intonation. The author received two avatars a few weeks after filming and found the Express-2 version significantly closer to a realistic presentation, though it still betrayed small telltale artifacts: plasticky skin tones, stiff strands of hair and glassy eyes.
Technically, Synthesia pairs a new voice-cloning model with multiple video models to improve realism. A speech-analysis stage feeds the Express-Voice model, which preserves the speaker's accent and expressiveness. Express-2 then runs a gesture generator, an alignment evaluator that selects the candidate motion best matched to the audio, and a much larger rendering model that replaces Express-1's renderer. Where Express-1's models had a few hundred million parameters each, the Express-2 rendering model has billions, which the company says shortens avatar-creation time and lets the system learn associations from far more diverse data. Synthesia staff described how this pipeline lets the system infer appropriate micro-gestures and intonation without the extensive emotion-specific footage that older versions required.
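The generate-then-select pattern described above (propose several candidate motions, score each against the audio, keep the best-aligned one) can be sketched in miniature. Everything below is a hypothetical toy illustration, not Synthesia's actual system: the function names, the per-frame "energy" features and the scoring rule are invented for this sketch, standing in for what are in reality large neural models.

```python
import random

def extract_audio_features(audio):
    """Toy stand-in for the speech-analysis stage: reduce the audio to a
    per-frame energy profile (real systems use learned representations)."""
    return [abs(sample) for sample in audio]

def generate_gesture_candidates(n_frames, n_candidates=5, seed=0):
    """Toy stand-in for the gesture generator: propose several candidate
    motion-intensity tracks for the clip, one value per frame."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [[rng.random() for _ in range(n_frames)]
            for _ in range(n_candidates)]

def alignment_score(gesture, audio_features):
    """Toy alignment evaluator: reward motion that tracks speech energy.
    Lower mean absolute difference means better alignment, so negate it."""
    diffs = [abs(g - a) for g, a in zip(gesture, audio_features)]
    return -sum(diffs) / len(diffs)

def select_best_gesture(audio):
    """Run the toy pipeline: features -> candidates -> pick best-aligned."""
    feats = extract_audio_features(audio)
    candidates = generate_gesture_candidates(len(feats))
    return max(candidates, key=lambda g: alignment_score(g, feats))

# Synthetic "audio": six frames of signal amplitude.
audio = [0.1, 0.8, 0.4, 0.9, 0.2, 0.6]
best = select_best_gesture(audio)
print(len(best))  # one motion value per audio frame
```

The design choice the reporting hints at is exactly this split: a cheap generator can overproduce candidates while a separate evaluator enforces audio-motion coherence, which is easier than training one model to do both at once.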
Synthesia is focused on corporate uses such as internal communications, training and financial presentations, and it has begun integrating Google’s Veo 3 generative video model to embed newly generated clips directly into its videos. The company also plans to make avatars interactive, so they can pause, expand on a point or answer questions in real time, effectively combining a conversational system with a lifelike digital human. Researchers quoted in the reporting warned that increasing realism risks deepening the uncanny valley and enabling new forms of attachment and manipulation. Observers pointed to existing commercial uses of AI clones, the potential for embarrassing misuse, and broader concerns that highly charismatic synthetic presenters could erode human-to-human connection and encourage unhealthy emotional bonds.