Google has launched Gemini Omni, a new AI video model designed to generate and edit video through back-and-forth conversation rather than one-off prompts. The system accepts any combination of text, images, audio, and existing video. Users can refine a scene over multiple instructions while characters, lighting, and objects stay consistent across edits. Google positions it as a creative tool that can reshape footage by changing backgrounds, swapping outfits, or altering visual style without traditional editing software.
The first public version, Gemini Omni Flash, went live on May 19, 2026, following its introduction at Google I/O 2026. It supports text-to-video generation, image animation, conversational editing, and explainer videos created from short prompts. One current limitation is that audio output is voice-only, with no custom music or sound effects. Google says the model is better at handling real-world physics such as gravity, motion, and liquid behavior, with the aim of making generated scenes feel less artificial. Every output carries an invisible SynthID watermark that identifies it as AI-generated content.
Google is also pushing a personal avatar feature that lets users create a digital clone with their own voice and likeness. The feature is pitched as useful for creators and educators, but it also raises moderation and misuse concerns. The rollout starts with YouTube Shorts, the Gemini app, and Google Flow. On YouTube Shorts, access is available now at no cost. In the Gemini app and Google Flow, access is available now with an AI Plus, Pro, or Ultra plan. Developer access through the API is listed as coming soon, with pricing yet to be announced.
Gemini Omni arrives as Google expands Gemini across a broader set of products and devices. The company is positioning the model as part of a larger ecosystem spanning Search, Workspace, YouTube, and hardware, rather than as a standalone creative product. That strategy places it in direct competition with OpenAI’s Sora and Adobe Firefly, while leaning on Google’s existing platforms to speed adoption. The core pitch is conversational video editing that preserves prior changes, reducing the need to restart from scratch whenever users want to revise a scene.
