Gemini Omni AI Image & Video Model: What the Google Leak Tells Us

ZizzleUp Editorial Team • May 10, 2026

Gemini Omni surfaced inside Google’s own consumer app this week, and actual generated clips are already leaking online. Photo: Unsplash

Something unusual happened inside Google’s Gemini app eight days ago. A single UI string — “Start with an idea or try a template. Powered by Omni” — appeared inside the video generation tab for a handful of users and was immediately flagged by TestingCatalog, the most reliable tracker of Google pre-release leaks. By May 11, the story had escalated: actual video clips generated by something called Gemini Omni started circulating on social platforms from a Gemini Pro user’s account. The clips were good. Notably good. And the context in which they appeared suggests Google is about to announce a model that unifies image generation, video generation, and text in a single system — nine days from now at Google I/O 2026 (May 19–20). If the leak holds up, Gemini Omni’s AI image and video capabilities could restructure how creators handle their entire visual content pipeline. Here’s everything that’s confirmed, what’s still speculation, and what you should actually do about it right now.

What Actually Leaked — The Evidence So Far

Let’s separate the hard facts from the inference. As of May 10, 2026, here is what is verifiably confirmed:

May 2, 2026: X user @Thomas16937378 posted a screenshot of the Gemini app’s video generation tab showing the string “Start with an idea or try a template. Powered by Omni.” This string appeared next to references to “Toucan” — Google’s internal codename for the Veo 3.1-powered Gemini video tab — making clear these are two distinct systems.
May 3, 2026: TestingCatalog published the leak. WaveSpeed AI followed with technical analysis the same day. The AI community picked it up within hours.
May 11, 2026: Multiple Gemini Pro users reported that a “Create with Gemini Omni” option was live in their accounts. Two generated clips circulated: a spaghetti-at-a-seaside-restaurant scene and a professor writing trigonometric equations on a chalkboard. The math in the chalkboard video reportedly rendered correctly, which is harder than it sounds for an AI video model.
Model ID reference: One user reportedly identified the model string bard_eac_video_generation_omni, alongside a noted 10-second generation limit in early testing builds.
Usage data: The same Gemini Pro user whose account surfaced the clips reportedly consumed 86% of their daily Omni allowance generating just those two prompts — suggesting Gemini Omni is significantly more compute-intensive than Veo 3.1.

What has not leaked: any official specs, technical architecture, pricing, or confirmation from Google that Omni is a real product name and not an internal placeholder.
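
The usage figure in the list above can be turned into a rough daily ceiling. The sketch below is back-of-envelope arithmetic only: it assumes every prompt costs the same share of the allowance (the leak does not say this), and it takes the reported 86% figure and 10-second limit at face value.

```python
import math

# Figures reported in the leak (unconfirmed by Google).
PROMPTS_USED = 2
ALLOWANCE_USED_PCT = 86.0
CLIP_LIMIT_SECONDS = 10  # reported limit in early testing builds


def per_prompt_cost(used_pct: float, prompts: int) -> float:
    """Average share of the daily allowance consumed per prompt."""
    return used_pct / prompts


def max_daily_generations(cost_pct: float) -> int:
    """Whole generations that fit in 100% of the allowance at a flat cost."""
    return math.floor(100.0 / cost_pct)


cost = per_prompt_cost(ALLOWANCE_USED_PCT, PROMPTS_USED)
daily = max_daily_generations(cost)

print(f"Average cost per prompt: {cost:.0f}% of the daily allowance")
print(f"Full generations per day: {daily}")
print(f"Video per day at the reported limit: {daily * CLIP_LIMIT_SECONDS} seconds")
```

Under the flat-cost assumption, that works out to roughly two clips a day, which is why the quota figure matters more than it first appears.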

Three Ways to Read the Gemini Omni Leak

The leak is unusually clean — a consumer-facing UI string, not a buried developer flag — but it still admits multiple interpretations. Here are the three most plausible readings, ranked from least to most disruptive:

Theory 1: Rebrand of Veo 3.x

Google has been quietly consolidating its AI product line under the Gemini umbrella. The simplest explanation for “Powered by Omni” is that Google is retiring the Veo brand name for consumer surfaces and replacing it with “Omni” as a product name that implies broader scope — while the underlying engine remains Veo 3.1 or an incremental Veo 4.0 update. This is the most conservative read and would be the least exciting announcement at I/O.

Theory 2: New Standalone Video Model

Omni is a genuinely new Gemini-trained video model that runs alongside Veo 3.1 — perhaps on a different architecture, or trained on a larger and more recent dataset. This would explain why it appears next to Toucan (Veo 3.1) rather than replacing it, and why the model ID string contains a distinctly different identifier. Under this reading, Omni and Veo would coexist as different quality/speed tiers — similar to how Nano Banana Flash and Nano Banana Pro serve different image generation tiers today.

Theory 3: True Unified Omni-Model (Most Disruptive)

The name “Omni” — Google doesn’t accidentally use names that carry strategic positioning — implies a single model that handles text, images, and video natively. Right now, Google’s creative AI is fractured: Nano Banana Pro for images, Veo 3.1 for video (codenamed Toucan), Gemini for text. Omni could collapse these three into one unified architecture. That would make it the first top-tier AI model to match what OpenAI has been attempting with GPT-4o — but with native video generation, which GPT-4o still lacks. This is also the reading supported by the “Spark Robin” visual model codename that leaked alongside Omni, suggesting a companion image system that could be the image-generation layer of an otherwise video-focused Omni.

What the Leaked Clips Actually Show

The leaked clips are limited in number but specific in what they reveal. The spaghetti scene demonstrated realistic food texture rendering, accurate foreground-background depth separation, and smooth motion physics — areas where Veo 3.1 already performs well. Nothing there definitively signals a step-change over existing capabilities.

The chalkboard clip is more interesting. A professor writing trigonometric equations on a blackboard — with the equations rendering legibly and correctly — is a specific test that most current AI video models fail badly on. Math notation requires the model to maintain semantic accuracy (the equation must be mathematically valid) alongside visual accuracy (the chalk marks must look like chalk marks). The fact that Omni appears to have solved this is the strongest quality signal in the leaked material.

Early impressions from the handful of users who accessed Omni before the test surface was apparently revoked point to strong prompt adherence, smoother camera angle transitions than Veo 3.1, better voice generation quality, and improved scene coherence across longer clips. One notable caveat: raw per-frame photorealism reportedly trails ByteDance’s Seedance 2.0, currently the sharpest pure-fidelity video model available.

Why Gemini Omni’s Image Generation Angle Matters More Than the Video Hype

Most of the coverage this week has focused on Omni as a video model — understandably, because video generation is the headline race in AI right now. But for still image creators, the Omni leak contains a potentially more significant signal that’s being underreported.

If the true unified omni-model interpretation is correct, Gemini Omni would replace not just Veo 3.1 but also Google’s current image generation stack, Nano Banana Pro and Nano Banana 2, with a single unified system. The “Spark Robin” codename that leaked alongside Omni is believed to be the image generation component of this unified architecture. In that case, your video thumbnail, your product photo, and your marketing video could all come from one model, possibly through a single API call, with style, lighting, character consistency, and brand color carried across all three output types without additional prompting.

For context: right now, a creator doing a product launch needs Nano Banana Pro for the product lifestyle shot, a separate Veo 3.1 call (or a third-party tool like Kling 3.0) for the video, and manual effort to make the two outputs look like they came from the same shoot. Omni — if it delivers on the unified promise — kills that whole coordination problem.

The Real Problem Omni Would Solve: The Multi-Tool Chaining Mess

Here’s a concrete example of the workflow problem Omni is positioned to fix. Say you’re a creator building visual content for a DTC skincare brand launch next week:

1. Generate a product lifestyle image in Gemini (Nano Banana Pro) → download PNG → 2.8 MB
2. Generate a matching video clip in Veo 3.1 (different model, different style reference) → inconsistent lighting vs. step 1
3. Iterate on the image in Adobe Firefly AI Assistant to match the video’s color grading → export → 3.4 MB PNG
4. Compress both for web delivery (JPEG quality 85 ≈ 720 KB; WebP ≈ 460 KB; AVIF ≈ 310 KB) before uploading to the brand site
5. Repeat for every product SKU

The format conversion in step 4 is a small but recurring chore: every AI image download needs it before it’s web-ready. (I run those conversions through ZizzleUp because it handles PNG → WebP and PNG → AVIF in one drag-and-drop without creating an account or installing anything.)

With a true Omni model, steps 1 and 2 collapse into one prompt. The same model that generates the lifestyle shot also generates the video with consistent lighting — because it’s one system with one internal representation of the scene. Steps 3 and 4 still exist, but you’re editing one cohesive output instead of reconciling two mismatched ones.

I Tested Today’s Gemini Image Tools to Understand the Baseline — Here’s What Omni Needs to Beat

To make sense of what Omni would improve on, I ran a set of prompts through the current Gemini image generation stack this week using Nano Banana Pro (Google AI Pro subscription). The results were good — genuinely good — but the gaps are instructive.

Test 1: Product lifestyle shot. Prompt: “A glass skincare serum bottle on a wet marble bathroom counter, late afternoon light streaming from the left, soft bokeh background.” Output: strong. The lighting direction was accurate. The bokeh felt natural. File size at download: 1.24 MB PNG at 1024×1024. After converting to WebP (quality 80): 312 KB — 75% reduction. After converting to AVIF (quality 65): 198 KB — 84% reduction. The AVIF at 198 KB was visually indistinguishable from the original at any browser zoom level up to 150%.

Test 2: Matching video clip from the same scene. I took the exact same prompt to Veo 3.1 inside Gemini. The video looked great on its own — but placed alongside the Nano Banana image, the color temperature was noticeably warmer and the marble surface texture differed. Not a dramatic mismatch, but one that required an extra round of Firefly editing to reconcile. That’s the seam Omni would theoretically seal.

Test 3: Text in image. Prompt: “A mockup of a minimalist business card with the text ‘Studio Vera / hello@studiovera.com’ in a sans-serif typeface on cream background.” Nano Banana Pro produced partially garbled text — “Vera” appeared correctly but the email address contained a character transposition. This is the gap ChatGPT Images 2.0 already closes with its reasoning-first architecture. Whether Omni matches gpt-image-2’s text accuracy will be one of the most-watched I/O demos.

Format	File Size	vs. Original PNG	Visible Quality Loss?
PNG (Gemini output)	1,240 KB	—	Baseline
JPEG (quality 85)	384 KB	−69%	Slight softening on glass edges
WebP (quality 80)	312 KB	−75%	None at normal viewing
AVIF (quality 65)	198 KB	−84%	None up to 150% zoom

Test conditions: Nano Banana Pro output, 1024×1024 pixels, product lifestyle scene. Conversions performed with no additional sharpening or quality adjustments. Display size: 800px wide on desktop.
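
The percentages in the table follow directly from the file sizes. As a quick check, this short sketch recomputes each reduction from the measured kilobyte values above; the function and variable names are illustrative only.

```python
ORIGINAL_PNG_KB = 1240  # Nano Banana Pro download from Test 1

# Converted sizes taken from the table above.
converted_kb = {
    "JPEG (quality 85)": 384,
    "WebP (quality 80)": 312,
    "AVIF (quality 65)": 198,
}


def reduction_pct(original_kb: float, new_kb: float) -> int:
    """Percentage saved relative to the original, rounded to a whole percent."""
    return round((1 - new_kb / original_kb) * 100)


reductions = {
    fmt: reduction_pct(ORIGINAL_PNG_KB, kb) for fmt, kb in converted_kb.items()
}

for fmt, pct in reductions.items():
    print(f"{fmt}: -{pct}% vs. original PNG")
```

Swapping in your own file sizes gives the same comparison for any image you convert.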

What Creators Should Do Before May 19

Google I/O 2026 opens in nine days. Omni may launch on stage with full details — or it may stay in staging while Google uses I/O for other announcements. Either way, here’s how to position yourself productively before the keynote:

Get familiar with today’s Gemini image tools before the upgrade hits. If you’ve never generated an image with Nano Banana Pro, do it this week. The free tier allows standard-quality generation. You’ll have a real baseline to compare when Omni arrives, instead of starting from scratch on keynote day.
Set up your image-to-web conversion workflow now, because Omni is unlikely to change it. No matter how good Omni’s output quality becomes, images will most likely still download as PNG or JPEG, and those files will still need to be compressed and converted to WebP or AVIF before publishing to a website. Embedding that conversion step into your daily workflow now means you’ll be ready to handle Omni outputs immediately. A 1.2 MB PNG from Gemini should become a 200–300 KB WebP before it goes anywhere near a web server.
Don’t cancel your existing image tool subscriptions yet. The most likely scenario is that Omni arrives in a limited rollout for AI Pro or Ultra subscribers, with restricted daily generation quotas. For high-volume image needs, Midjourney, Adobe Firefly, and ChatGPT Images 2.0 will still be your production tools for months after I/O. Omni is a long-term platform bet, not an immediate replacement.
Watch the keynote on May 19 at 10:00 AM PT. Livestream is free at io.google/2026. The AI session at 3:30 PM PT is the one to prioritize if you want the technical depth on model capabilities.
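
One way to make the conversion habit from the list above stick is a pre-upload check that flags anything still over budget. The sketch below uses only the Python standard library; the 300 KB budget is an assumption drawn from the 200–300 KB target mentioned above, and the function name is hypothetical. It does not convert anything itself, it only lists what still needs converting.

```python
from pathlib import Path

WEB_BUDGET_KB = 300  # assumed per-image budget, adjust to your site
SOURCE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def files_needing_conversion(
    folder: Path, budget_kb: int = WEB_BUDGET_KB
) -> list[tuple[str, int]]:
    """Return (filename, size in KB) for source images larger than the budget."""
    oversized = []
    for path in sorted(folder.iterdir()):
        # Only look at raster source formats; skip folders and other files.
        if path.is_file() and path.suffix.lower() in SOURCE_SUFFIXES:
            size_kb = path.stat().st_size // 1024
            if size_kb > budget_kb:
                oversized.append((path.name, size_kb))
    return oversized
```

Calling `files_needing_conversion(Path("downloads"))` on a folder of AI image downloads returns the files to run through your converter before upload; an empty list means everything is already within budget.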

FAQ: Gemini Omni — Your Questions Answered

Is Gemini Omni officially released? No. As of May 10, 2026, Google has not officially announced, confirmed, or released Gemini Omni. The evidence is a UI string in the consumer app and leaked video clips from what appears to be an A/B test surface. Treat everything as pre-announcement speculation until Google speaks on May 19.
Will Gemini Omni replace Veo 3.1? Possibly, but not necessarily immediately. The most conservative scenario is that Omni becomes the new consumer brand name for Gemini’s video tab, while Veo continues as the underlying engine. The most ambitious scenario is that Omni is an entirely new model that replaces both Veo and the Nano Banana image models with one unified system. Google’s announcement will clarify which.
Does Gemini Omni also generate still images, or only video? The leaked string surfaced in the video generation tab, but the “Omni” name and a separately leaked visual model codename (“Spark Robin”) suggest image generation may be part of the same unified system. No leaked clips have demonstrated still image output specifically.
Will Gemini Omni be free to use? Unlikely at full quality. The leaked usage data suggests Gemini Omni is significantly more compute-intensive than Veo 3.1: one user consumed 86% of their daily allowance on two prompts. Expect Omni to arrive behind a Google AI Pro or Ultra paywall, with limited free-tier access similar to how Veo 3.1 is currently gated.
How does Omni compare to ChatGPT Images 2.0? They’re solving different parts of the same problem. ChatGPT Images 2.0 uses a reasoning-first architecture specifically for still image generation. It thinks before drawing, which is why its text rendering is so strong. Omni appears to be primarily a video model with potential image capabilities. If both interpretations hold, they remain complementary rather than direct competitors for still image work.
What should I do with AI-generated images before publishing them to my website? Always convert and compress before uploading. AI generators (Gemini, ChatGPT, Midjourney, Firefly) all output large PNG files (typically 1–4 MB). Converting to WebP typically cuts that by 70–80%; AVIF cuts it by 80–85%. The file size test above shows a 1.24 MB Gemini PNG becoming a 198 KB AVIF with no visible quality loss. That difference directly affects your page load time and Core Web Vitals scores.
Where can I watch the Google I/O 2026 keynote? Free livestream at io.google/2026, also on Google’s YouTube channel. Keynote starts May 19, 2026 at 10:00 AM PT. The dedicated AI session begins at 3:30 PM PT.

The Bottom Line

The Gemini Omni leak is the most credible pre-I/O signal Google has put into the wild in years — a consumer-facing UI string, not buried developer code, confirmed by leaked video clips. Whether it’s a Veo rebrand, a new standalone model, or a true unified omni-model will be answered in nine days. What’s already clear is that Google is making a major move on its AI visual media stack at I/O 2026, and the direction is toward unification.

For creators, the practical playbook is straightforward: watch the keynote, get hands-on with Omni as soon as access opens, and keep your image-to-web conversion workflow sharp. The best AI image model in the world still outputs a file that needs to be optimized before it reaches a user’s browser. That part doesn’t change.
