Name: Gemini Omni: Google's New AI Video World Model (2026)
Brand: Google DeepMind
Availability: InStock

Question 1

What is Gemini Omni and who built it?

Accepted Answer

Gemini Omni is a multimodal 'world model' built by Google DeepMind and announced on May 19, 2026 at Google I/O, alongside Gemini 3.5 Flash and the Antigravity 2.0 developer platform. It unifies capabilities previously split across separate Google models (Veo for video, Imagen for images, Lyria for audio, and Nano Banana for image editing) into a single any-to-any reasoning pass. The first public variant is Gemini Omni Flash, a faster and lower-cost tier aimed at consumer-scale rollout, with a higher-quality 'Omni Pro' tier signaled for later. Omni's headline ability is treating video as a first-class output: it can take text, images, audio and existing video clips as input and produce a new or edited video that stays consistent with Gemini's world knowledge. Google has not disclosed a parameter count or architecture details beyond describing it as a unified multimodal system. It sits alongside, not inside, the numbered Gemini reasoning line (3, 3.1, 3.5), which remains text-output focused. Omni was designed to compete directly with OpenAI's Sora 2 and other video-generation models such as Seedance.

Question 2

How much does Gemini Omni cost?

Accepted Answer

As of June 2026, Google has not published developer API pricing for Gemini Omni Flash, so there is no standalone per-token or per-second price. Instead, it is bundled into Google's existing consumer subscriptions: Google AI Plus at $20 per month, AI Pro at $30 per month, and AI Ultra at $100 per month, all accessed through the Gemini app and Google Flow. Google also made Omni Flash free on YouTube Shorts Remix and the YouTube Create app, though free usage is rationed to roughly 50 Flow credits per day, enough for about one to two short generations. A developer API via Vertex AI was promised 'in the coming weeks' at I/O 2026 but had not shipped as of June 2026. Third-party analysts have projected future API rates of $1.50 to $2.50 per 1 million input tokens and $0.20 to $0.60 per second of generated video, anchored to existing Veo 3.1 and Gemini 3.5 Flash pricing, but these are unconfirmed estimates, not official Google pricing. There is no self-hosting option because Omni is closed-weight.
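
To put the analyst projections in concrete terms, the arithmetic can be sketched as follows. This is an illustration only: the rate ranges are the unconfirmed third-party estimates quoted above, not Google pricing, and the function name and the 2,000-token prompt size are hypothetical.

```python
# Unconfirmed third-party projections quoted above; NOT official Google pricing.
INPUT_USD_PER_MILLION_TOKENS = (1.50, 2.50)  # (low, high) estimate
VIDEO_USD_PER_SECOND = (0.20, 0.60)          # (low, high) estimate


def projected_cost_range(input_tokens: int, video_seconds: float) -> tuple[float, float]:
    """Return the (low, high) projected USD cost of one generation request."""
    if input_tokens < 0 or video_seconds < 0:
        raise ValueError("input_tokens and video_seconds must be non-negative")
    millions_of_tokens = input_tokens / 1_000_000
    low = millions_of_tokens * INPUT_USD_PER_MILLION_TOKENS[0] + video_seconds * VIDEO_USD_PER_SECOND[0]
    high = millions_of_tokens * INPUT_USD_PER_MILLION_TOKENS[1] + video_seconds * VIDEO_USD_PER_SECOND[1]
    return round(low, 4), round(high, 4)


if __name__ == "__main__":
    # Hypothetical request: a 2,000-token prompt producing one 10-second clip.
    low, high = projected_cost_range(input_tokens=2_000, video_seconds=10)
    print(f"Projected cost: ${low:.3f} to ${high:.3f}")
```

Under these projections the per-second video charge dominates: for a 10-second clip the video portion alone would be $2.00 to $6.00, while a short text prompt adds well under one cent.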

Question 3

What is Gemini Omni's context window and output limit?

Accepted Answer

Google has not published a token-based context window for Gemini Omni, unlike the numbered Gemini reasoning models (Gemini 3.1 Pro reportedly supports up to 2 million tokens). Instead, Omni's limits are expressed in terms of video clip length and resolution. The Omni Flash variant is hard-capped at 10-second clips at up to 1080p resolution. Google DeepMind researcher Nicole Brichtova told TechCrunch this 10-second ceiling is a deployment choice to manage compute demand during the initial rollout, not a fixed architectural limit, suggesting longer clips could become available as capacity increases. In practice, a 5-second 1080p preview clip generates in under 15 seconds on Omni Flash. There is no separate 'extended output' tier disclosed yet, and no information on how Omni handles multi-file or long-document inputs the way text-based Gemini models do.
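
One practical consequence of the 10-second cap is that a longer sequence would have to be planned as several separate clips. The sketch below only does that planning arithmetic. It assumes the cap stays at 10 seconds, the function name is hypothetical, and it does not call any Google API.

```python
MAX_CLIP_SECONDS = 10  # Omni Flash cap reported above; a deployment choice that may change


def plan_clips(total_seconds: int, max_clip: int = MAX_CLIP_SECONDS) -> list[int]:
    """Split a desired running time into clip lengths that each fit under the cap."""
    if total_seconds <= 0 or max_clip <= 0:
        raise ValueError("total_seconds and max_clip must be positive")
    full_clips, remainder = divmod(total_seconds, max_clip)
    clips = [max_clip] * full_clips
    if remainder:
        clips.append(remainder)  # final, shorter clip
    return clips


if __name__ == "__main__":
    # A 25-second sequence needs three separate generations.
    print(plan_clips(25))
```

Each planned clip would be a separate generation, so keeping characters and settings consistent across clips is not guaranteed by this arithmetic alone.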

Question 4

How does Gemini Omni compare to Gemini 3.5 Flash and OpenAI's Sora 2?

Accepted Answer

Gemini Omni and Gemini 3.5 Flash are different kinds of models released at the same Google I/O 2026 event. Gemini 3.5 Flash is a text-output reasoning model that accepts text, code, images, audio, video and PDFs and scores 78% on SWE-bench Verified and 90.4% on GPQA Diamond, but it cannot generate images, audio or video. Gemini Omni is the inverse: it is a generative video, audio and image model that does not produce text completions, function calls, or structured output at all. Compared to OpenAI's Sora 2, the closest direct competitor, Gemini Omni's distinguishing feature is conversational editing, the ability to revise an already-generated clip with follow-up instructions while preserving character identity, rather than re-generating from scratch. Neither Omni nor Sora 2 publishes directly comparable numeric quality benchmarks as of June 2026, so comparisons rely on hands-on reviews, which reported Omni performing well on 'object permanence' consistency tests relative to prior Google video models.

Question 5

Is Gemini Omni open source or proprietary?

Accepted Answer

Gemini Omni is fully proprietary and closed-weight. Google does not release model weights or architectural details for any model in the Gemini line, including Omni; open weights are reserved for Google's separately branded Gemma model family. There is no Hugging Face listing, no downloadable checkpoint, and no quantized or self-hosted deployment option for Omni. Access is entirely through Google-controlled surfaces: the consumer Gemini app, Google Flow, and the YouTube Shorts Remix and YouTube Create apps. A developer-facing API via Vertex AI was promised at I/O 2026 but was not live as of June 2026, and even once it ships, it will be a hosted API rather than a downloadable model. There are no commercial-use restrictions specific to Omni beyond Google's standard generative AI usage policies and the SynthID watermarking applied to outputs.

Question 6

What modalities does Gemini Omni support?

Accepted Answer

Gemini Omni accepts text, static images, audio clips and video as input, and can combine several of these in a single prompt, for example a reference photo, a voice-over audio clip and a short text instruction together. Its outputs are primarily video with synchronized audio, along with supporting image and text elements for the conversational editing interface. All confirmed modalities are live in the Gemini app and Google Flow as of June 2026; no modalities have been described as 'coming soon' beyond the broader Omni Pro tier. Omni does not support function calling, tool use, structured JSON output, code execution, or web browsing; those capabilities belong to the separate Gemini 3.5 reasoning line. There is no computer-use or agentic loop support in Omni. It is purely a generation and editing model, and its main capability is conversational, multi-turn editing of previously generated video.

Question 7

Does Gemini Omni train on user data, and what is its data policy?

Accepted Answer

Google has not published an Omni-specific data retention or training policy as of June 2026. Generated video and audio outputs from Omni carry Google's SynthID invisible watermark for provenance tracking, consistent with Google's policy for Veo- and Imagen-derived media, which helps identify AI-generated content even after the file is shared or edited. Beyond the SynthID disclosure, Omni inherits the general data handling and Activity controls of the consumer Gemini app, which let users review and delete stored prompts and generated media. No SOC 2, ISO 27001, HIPAA, or GDPR compliance statements specific to Omni have been published, and no enterprise zero-retention option has been announced, likely because there is no enterprise API yet. Once Vertex AI access ships, it would be expected to inherit Google Cloud's existing enterprise data governance commitments, but this has not been confirmed for Omni specifically.

Question 8

Who is Gemini Omni best for, and who should avoid it?

Accepted Answer

Gemini Omni is best for short-form social video creators making YouTube Shorts or similar vertical clips, marketers who want to quickly prototype video ad variations, and Google AI Plus, Pro or Ultra subscribers already using Google Flow for AI filmmaking. Its conversational editing, where a generated clip can be revised with follow-up instructions like changing the weather or a character's outfit while keeping identity consistent, is its strongest differentiator for iterative creative work. Teams that need a developer API for automated video pipelines should avoid Omni for now, since no Vertex AI or API access existed as of June 2026; Veo 3.1 remains the API-accessible alternative. Anyone needing video longer than 10 seconds per clip is also blocked by Omni Flash's hard cap. Finally, teams needing text reasoning, coding assistance, or agentic tool use should use Gemini 3.5 Flash or Gemini 3.1 Pro instead, since Omni produces no text completions and has no function calling.
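
The guidance above can be restated as a small decision function. This is a sketch of the reasoning in this answer, not an official selection tool: the class, field, and function names are hypothetical, and the rules reflect the situation as of June 2026.

```python
from dataclasses import dataclass

OMNI_FLASH_MAX_SECONDS = 10  # hard cap on Omni Flash clips, per this FAQ


@dataclass
class Requirements:
    needs_text_or_tools: bool = False   # text reasoning, coding help, or agentic tool use
    needs_developer_api: bool = False   # automated pipeline rather than app usage
    clip_seconds: int = 10              # desired length of a single clip


def recommend(req: Requirements) -> str:
    """Map a set of requirements to the option this FAQ suggests."""
    if req.needs_text_or_tools:
        return "Gemini 3.5 Flash or Gemini 3.1 Pro"
    if req.needs_developer_api:
        return "Veo 3.1"
    if req.clip_seconds > OMNI_FLASH_MAX_SECONDS:
        return "No single-clip fit: Omni Flash is capped at 10 seconds"
    return "Gemini Omni Flash"


if __name__ == "__main__":
    print(recommend(Requirements(clip_seconds=8)))
    print(recommend(Requirements(needs_developer_api=True)))
```

The checks are ordered from the most disqualifying requirement to the least, so a team that needs tool use is pointed at the reasoning models before clip length is even considered.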
