Synthesia’s artificial intelligence clones are more expressive and soon will talk back

Synthesia’s latest Express-2 model produces more lifelike avatars with improved gestures, voice cloning, and faster rendering. The company says future avatars will understand conversations and respond in real time.

During a visit to Synthesia’s London studio, the author recorded a short scripted session to generate a hyperrealistic avatar and compared outputs from the older Express-1 model and the new Express-2 model. Synthesia began in 2017 with dubbed, face-matching avatars and later offered businesses the ability to create presentation videos featuring AI versions of staff or consenting actors. The Express-2 avatars show notably smoother facial movements, more expressive hand gestures and a voice model that better preserves accents and intonation. The author received two avatars a few weeks after filming and found the Express-2 version significantly closer to a realistic presentation, though it still betrayed small telltale artifacts such as plasticky skin tones, stiff hair strands and glassy eyes.

Technically, Synthesia combined a new voice cloning model with multiple video models to improve realism. A speech analysis stage feeds an Express-Voice model that preserves accent and expressiveness. Express-2 then uses a gesture generator, an alignment evaluator that selects the candidate motion that best matches the audio, and a much larger rendering model that replaces the earlier one. Where Express-1 used models with a few hundred million parameters, the Express-2 rendering model has parameters in the billions, which the company says reduces creation time and lets the system learn associations from far more diverse data. Synthesia staff described how the pipeline lets the system infer appropriate micro-gestures and intonation without the extensive emotion-specific footage that older versions required.
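To make those stages concrete, here is a minimal sketch of how a generate-score-render pipeline of this shape could fit together. Every class and function name below is a hypothetical stand-in: the article describes the stages (voice cloning, gesture generation, alignment scoring, rendering) but not Synthesia’s actual interfaces, and the toy scoring logic is purely illustrative.

```python
# Hypothetical sketch of an Express-2-style pipeline; none of these names
# are Synthesia's real API.
from dataclasses import dataclass

@dataclass
class Audio:
    speech_features: list[float]   # prosody/accent features from the analysis stage

@dataclass
class Motion:
    gesture_track: list[float]     # one candidate body/face motion over time

def clone_voice(script: str) -> Audio:
    """Stand-in for the Express-Voice stage: synthesize speech that
    preserves the speaker's accent and intonation."""
    return Audio(speech_features=[float(len(w)) for w in script.split()])

def propose_gestures(audio: Audio, n_candidates: int = 4) -> list[Motion]:
    """Stand-in for the gesture generator: propose several candidate motions."""
    return [Motion(gesture_track=[f * (i + 1) for f in audio.speech_features])
            for i in range(n_candidates)]

def alignment_score(audio: Audio, motion: Motion) -> float:
    """Stand-in for the alignment evaluator: score how well a candidate
    motion tracks the audio (here, a toy negative squared distance)."""
    return -sum((a - m) ** 2
                for a, m in zip(audio.speech_features, motion.gesture_track))

def render(audio: Audio, motion: Motion) -> str:
    """Stand-in for the billion-parameter rendering model."""
    return f"video({len(audio.speech_features)} frames, best-aligned motion)"

def express2_pipeline(script: str) -> str:
    audio = clone_voice(script)
    candidates = propose_gestures(audio)
    # The evaluator picks the motion that best matches the audio,
    # which is then handed to the renderer.
    best = max(candidates, key=lambda m: alignment_score(audio, m))
    return render(audio, best)

print(express2_pipeline("Hello from a hyperrealistic avatar"))
```

The key design idea reflected here is the generate-then-select step: rather than producing a single motion, the gesture stage proposes candidates and the alignment evaluator filters them, which is what lets the system match gesture to audio without emotion-specific training footage for every case.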

Synthesia is focused on corporate uses such as internal communications, training and financial presentations, and it has begun integrating Google’s Veo 3 generative video model to embed new clips directly. The company also plans to make avatars interactive so they can pause, expand on a point or answer questions in real time, effectively combining conversational systems with a lifelike digital human. Researchers quoted in the reporting warned that increased realism risks deepening the uncanny valley and enabling new forms of attachment or manipulation. Observers pointed to existing examples of AI clones used commercially, the potential for embarrassing misuse, and broader concerns that highly charismatic synthetic presenters could alter human-to-human connection and encourage unhealthy emotional bonds.
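As a rough illustration of the interactive behavior the company describes, here is a hedged sketch of such a loop. Every function below (transcribe, generate_reply, speak_as_avatar) is a hypothetical placeholder: the article does not say which speech or conversational models Synthesia would use, nor how real-time rendering would be invoked.

```python
# Hypothetical sketch only: a conversational model answers viewer questions
# and the reply is voiced by the avatar. No names here come from Synthesia.
def transcribe(audio_chunk: bytes) -> str:
    """Placeholder speech-to-text step."""
    return audio_chunk.decode("utf-8", errors="ignore")

def generate_reply(question: str, history: list[str]) -> str:
    """Placeholder conversational step; a real system would call an LLM
    and use the session history for context."""
    return f"You asked: {question!r}. Here is more detail on that point."

def speak_as_avatar(text: str) -> None:
    """Placeholder for rendering the avatar speaking the reply in real time."""
    print(f"[avatar says] {text}")

def interactive_session(audio_chunks: list[bytes]) -> None:
    history: list[str] = []
    for chunk in audio_chunks:
        question = transcribe(chunk)
        history.append(question)
        speak_as_avatar(generate_reply(question, history))

interactive_session([b"Can you expand on the last slide?"])
```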

Impact Score: 78

NVIDIA pledges artificial intelligence education funding for K-12 programs

NVIDIA announced new support for artificial intelligence education in K-12 classrooms at a White House event, pledging unspecified funding and partnerships to adapt NVIDIA Deep Learning Institute and NVIDIA Academy content for high school students. The company said the effort aligns with a White House executive order and aims to expand educator training and student access.

Transforming CX with embedded real-time analytics

Stripe relied on embedded real-time analytics to handle peak Black Friday traffic and to stop millions of fraud attempts, enabling services such as usage-based billing and inventory monitoring. Firms that operate in real time report stronger revenue growth and margins, according to a survey by the MIT Center for Information Systems Research and Insight Partners.
