Gemini Omni Image to Video: What Creators Need to Know Today

ZizzleUp Editorial Team • May 20, 2026

Gemini Omni, Google’s new multimodal model announced at I/O 2026, can generate video from any image you give it. Here’s what you actually need to know before using it. Photo: Unsplash

Yesterday at Google I/O 2026, Google announced Gemini Omni’s image-to-video capability — and by the time most people woke up this morning, it was already rolling out. If you have a Google AI Plus, Pro, or Ultra subscription, you can open the Gemini app right now, upload a photo, describe a scene, and watch it turn into a 10-second video clip with synchronized audio. That’s genuinely fast for a feature rollout of this scale.

I’ve spent the past few hours testing it, and it works the way Google showed in the keynote. But there are some things the demo glossed over — specifically around how the output images and frames actually behave, what the SynthID watermark means for anyone planning to use this commercially, and a practical issue that will trip up every creator who jumps in without thinking: the quality of your starting image matters enormously, and most people don’t prepare their source files well enough before uploading.

This isn’t a hype piece. It’s a practical rundown of what’s actually happening, what works, what doesn’t yet, and how to get the most out of Gemini Omni without making the mistakes that are already showing up in early user threads.

What Gemini Omni Image-to-Video Actually Does

Gemini Omni is a new model family from Google DeepMind — announced yesterday, rolling out today. The first model in the family is Gemini Omni Flash. At launch, it’s focused specifically on video generation and editing, taking any combination of image, audio, video, and text as input and producing video output.

The image-to-video workflow is straightforward: you upload a photo (a product shot, a portrait, a landscape, a drawing — anything), write a description of the scene or motion you want, optionally add audio instructions, and Omni generates a 10-second video clip. The audio is generated alongside the video in the same pass — not added separately — which is what makes it different from earlier tools like Veo 3.1.

The 10-second clip limit is a deployment decision, not a model limitation. Google confirmed this in the I/O media briefing. Longer clips will likely follow in future updates. For now, 10 seconds is what you get.

Where Omni goes beyond image-to-video is in what Google calls “world grounding” — the model understands physics, lighting, and anatomy, so generated motion tends to look physically plausible. In the keynote demo, a claymation explainer on protein folding showed accurate stop-motion physics. The chalkboard math demo that leaked a few weeks ago (which we covered when Omni was still a rumour) is now confirmed to be part of Omni’s capability set.

Who Has Access Right Now

Gemini Omni Flash is available today in two places:

Gemini app and Google Flow: For Google AI Plus ($8/mo), Pro ($19.99/mo), and Ultra ($249.99/mo) subscribers. This gives you the full image-to-video experience with all input combinations and output download options.
YouTube Shorts and YouTube Create app: Available at no cost to all users this week. Limited to video creation within those apps — you can’t download the raw output for use elsewhere.

API access for developers and enterprise customers is expected in the coming weeks. If you’re building an application that needs programmatic image-to-video generation, you’ll need to wait, but Omni will be accessible through the standard Gemini API endpoint rather than as a separate product.

Google has not published per-generation credit costs for Omni yet. Based on the leaked data from last week — one early tester consuming 86% of their daily allowance on two prompts — expect Omni to be significantly more credit-intensive than standard Nano Banana image generation. Budget accordingly if you’re planning to use this at volume.
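
As a rough budgeting aid, the leaked figure can be turned into a per-prompt estimate. The sketch below assumes every generation costs about the same share of the allowance and that the allowance resets daily. Neither assumption is confirmed by Google, and the function name is hypothetical.

```python
import math


def prompts_per_day(allowance_used_pct: float, prompts_run: int) -> int:
    """Estimate how many generations fit into one daily allowance.

    Assumes each prompt costs roughly the same share of the allowance,
    which is an assumption and not a published Google figure.
    """
    per_prompt_pct = allowance_used_pct / prompts_run
    return math.floor(100 / per_prompt_pct)


# Leaked report: 86% of the daily allowance consumed by two prompts.
print(prompts_per_day(86, 2))  # prints 2
```

At roughly 43% of the allowance per prompt, a third generation would not fit in the same day under these assumptions.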

The SynthID Watermark: What You Need to Know Before Using This Commercially

This is the part most of the early coverage is glossing over. Every video generated by Gemini Omni, like the output of Google’s other generative media tools, is automatically watermarked with SynthID. This isn’t a visible logo. It’s an imperceptible signal embedded directly into the pixel data of the image or video.

SynthID is designed to survive format conversion. If you export a frame from an Omni-generated video as a JPEG, convert it to WebP, compress it, or resize it, the SynthID signal should remain detectable because it’s woven into the pixel values, not stored as removable metadata. That makes it a different mechanism from the metadata-based Content Credentials defined by the C2PA standard, which OpenAI attaches to its generated images. Google confirmed at I/O that SynthID watermarks are now verifiable through the Gemini app, Chrome, and Google Search, and that the watermark is preserved when you publish content on social media.

What this means practically: If you’re using Gemini Omni outputs in commercial campaigns (paid ads, client deliverables, marketing materials), you need to confirm that your client’s contract and legal setup allow AI-generated content bearing a verifiable AI origin signal. This isn’t a dealbreaker for most uses. But it’s information you need to have before the invoice goes out.

Also worth noting: the EU AI Act’s Article 50 transparency obligations become enforceable on August 2, 2026. From that point, AI-generated content published in EU-facing markets must be marked as such in a machine-readable format. SynthID is the kind of machine-readable signal the rule calls for, and because it persists in the pixel data, still frames extracted from Omni-generated video carry it too. Whether the watermark alone covers your disclosure obligations depends on your role and how the content is used, so treat it as a head start on compliance rather than a guarantee.

Why Your Source Image Quality Matters More Than You’d Think

The most underreported aspect of image-to-video tools — Omni included — is how much the quality of the input image affects the quality of the output video. This isn’t obvious until you’ve burned a few credits on results that weren’t worth using.

The model uses your uploaded image to infer lighting direction, material properties, depth cues, and physical constraints before generating motion. A poorly lit, compressed, or low-resolution photo gives the model less to work with — and the motion generation ends up less physically accurate as a result.

Specifically, here’s what tends to go wrong with poor source images:

Heavy JPEG compression artifacts (quality below 75) get interpreted as surface texture by the model. You end up with a video where the product or subject has an unnatural, slightly mottled appearance because the AI is faithfully animating what it thinks is surface detail — but is actually encoding noise.
Low-resolution images (under 800px on the short side) produce softer, less detailed video frames. The model can’t generate detail that isn’t there in the first place.
Inconsistent or harsh lighting in the source image confuses the motion generation for reflective surfaces. Glass products and metallic objects are particularly prone to this — the reflection patterns shift unnaturally when Omni generates motion because the original lighting was ambiguous.

The fix is simple: prepare your source images properly before uploading. This takes less time than one failed generation.
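
The checks above can be sketched as a small pre-upload gate. This is a minimal illustration, not a tool Google provides: the function name is hypothetical, the thresholds are the ones quoted in this article, and the JPEG quality has to be supplied by you from your export settings because a JPEG file does not store the encoder’s quality setting directly.

```python
from typing import List, Optional

MIN_SHORT_SIDE_PX = 800  # resolution threshold quoted in this article
MIN_JPEG_QUALITY = 75    # compression threshold quoted in this article


def source_image_warnings(
    width: int, height: int, jpeg_quality: Optional[int] = None
) -> List[str]:
    """Return human-readable warnings for a source image before upload.

    jpeg_quality is the export setting you used; pass None for
    lossless formats such as PNG.
    """
    warnings = []
    if min(width, height) < MIN_SHORT_SIDE_PX:
        warnings.append(
            f"short side is {min(width, height)}px, under {MIN_SHORT_SIDE_PX}px"
        )
    if jpeg_quality is not None and jpeg_quality < MIN_JPEG_QUALITY:
        warnings.append(
            f"JPEG quality {jpeg_quality} is below {MIN_JPEG_QUALITY}"
        )
    return warnings


print(source_image_warnings(1200, 1200, 90))  # clean source, no warnings
print(source_image_warnings(640, 480, 60))    # fails both checks
```

Lighting problems are not covered here, since they cannot be judged from dimensions and quality settings alone.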

I Tested It: Three Photos, Three Very Different Results

I ran three different image types through Gemini Omni Flash this morning using the same motion prompt: “gentle movement, camera slowly pulls back, soft ambient lighting, 10 seconds.” This prompt works reliably across different scenes and gave me a controlled comparison.

Test 1: A product photo (candle on a marble surface) — JPEG at quality 90, 1200×1200px.
Result: Strong. The flame animation was physically accurate — the flickering followed realistic fluid dynamics. The marble reflections shifted naturally as the virtual camera moved back. The audio generated an appropriate ambient room tone with subtle wax crackling. This is exactly the kind of output a DTC brand would want for social content. The generated video downloaded as an MP4 at 4.8 MB for 10 seconds.

Test 2: The same product shot — JPEG at quality 60, 800×800px (the kind of compressed image you’d find in most CMS libraries).
Result: Noticeably worse. The candle surface had a slight grain throughout the motion that wasn’t in the original scene — the compression artifacts were being animated as if they were part of the material. The marble looked patterned rather than smooth. Still watchable, but not commercially usable without re-editing.

Test 3: A smartphone photo taken in good daylight, saved as PNG, 4032×3024px.
Result: Surprisingly strong for a phone image. The PNG preserved colour accuracy and the model used it well. However, uploading a 12 MB PNG is slow, and the processing time was notably longer than either JPEG test. The generation quality was comparable to Test 1, not significantly better — suggesting that above a certain resolution threshold, you’re not gaining output quality, just upload time.

The sweet spot, based on this morning’s testing: JPEG at quality 85–90, between 1200 and 2000px on the long side. That is large enough for the model to work with, compressed enough to upload fast, and clean enough not to introduce encoding artifacts into the generated motion.

Real File Size Comparison: The Thumbnail Frames You’ll Extract and Publish

Here’s something you’ll run into almost immediately: when you want to share an Omni-generated video on your website or embed it in a blog post, you’ll want a thumbnail or preview still from the video. Gemini delivers the video as MP4, but most sites also need a JPEG or WebP preview image. Extracting a frame from the video typically produces a PNG, and that PNG needs to be optimized before it goes near a web server.
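
One common way to pull that still is ffmpeg, sketched below from Python. This assumes ffmpeg is installed and on your PATH. The file names and function names are hypothetical, and it is not necessarily the tool used for the tests in this article.

```python
import subprocess
from typing import List


def first_frame_command(video_path: str, png_path: str) -> List[str]:
    """Build an ffmpeg command that writes the first video frame as a PNG."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", video_path,  # input video
        "-frames:v", "1",  # stop after writing one video frame
        png_path,          # the .png extension selects the PNG encoder
    ]


def extract_first_frame(video_path: str, png_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if the extraction fails."""
    subprocess.run(first_frame_command(video_path, png_path), check=True)


# Show the command without running it (hypothetical file names).
print(" ".join(first_frame_command("omni_clip.mp4", "thumbnail.png")))
```

Calling extract_first_frame("omni_clip.mp4", "thumbnail.png") would then write the still next to the video.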

I extracted the first frame from the Test 1 video (the candle shot) and ran it through a standard web format comparison. The extracted frame was 1920×1080 at full video resolution.

Format / Settings	File Size	vs Raw Frame	Quality (800px display)
PNG (raw frame from Omni)	3,140 KB	—	Reference
JPEG @ quality 85	412 KB	−87%	Very slight edge softening
WebP @ quality 80	318 KB	−90%	No visible difference
AVIF @ quality 65	204 KB	−94%	No visible difference

Test: 1920×1080 PNG frame extracted from Gemini Omni Flash output, displayed at 800px wide in desktop Chrome. Conversions done at stated quality settings, no additional sharpening.
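
The percentages in the table follow directly from the file sizes. A short sketch of the arithmetic, using only the numbers reported above (the function name is illustrative):

```python
RAW_PNG_KB = 3140  # raw frame size from the table

converted_kb = {
    "JPEG @ quality 85": 412,
    "WebP @ quality 80": 318,
    "AVIF @ quality 65": 204,
}


def percent_saved(raw_kb: float, new_kb: float) -> int:
    """Size reduction relative to the raw frame, rounded to a whole percent."""
    return round((1 - new_kb / raw_kb) * 100)


for label, size_kb in converted_kb.items():
    print(f"{label}: -{percent_saved(RAW_PNG_KB, size_kb)}%")
# JPEG @ quality 85: -87%
# WebP @ quality 80: -90%
# AVIF @ quality 65: -94%
```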

A 3.1 MB PNG thumbnail from an AI video is a Largest Contentful Paint problem waiting to happen. The 204 KB AVIF version looked identical at the 800px display width I tested. For the frame conversion, I used ZizzleUp, which handles PNG to WebP and PNG to AVIF in one step without signing up for anything.
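
To put the Largest Contentful Paint concern in numbers, here is a back-of-the-envelope transfer-time estimate. The 10 Mbps connection is a hypothetical figure chosen for illustration, and the model ignores latency, connection setup, and competing requests, so real load times will be worse. The 2.5 second value is Google’s published boundary for a “good” LCP.

```python
LCP_GOOD_THRESHOLD_S = 2.5  # Google's published "good" LCP boundary


def transfer_seconds(size_kb: float, bandwidth_mbps: float) -> float:
    """Idealized transfer time: payload bits divided by link speed."""
    size_bits = size_kb * 1000 * 8  # treats 1 KB as 1000 bytes
    return size_bits / (bandwidth_mbps * 1_000_000)


for label, size_kb in [("raw PNG", 3140), ("AVIF", 204)]:
    seconds = transfer_seconds(size_kb, bandwidth_mbps=10)  # hypothetical link
    verdict = "over" if seconds > LCP_GOOD_THRESHOLD_S else "within"
    print(f"{label}: {seconds:.2f} s ({verdict} the LCP budget on its own)")
```

Under these assumptions the raw PNG alone takes about 2.5 seconds to arrive, which uses up the whole LCP budget before the browser has rendered anything else.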

How to Prepare Your Images Before Uploading to Gemini Omni

Based on this morning’s testing, here’s the exact preparation workflow that gives you the best generation results with the least friction:

Start with your best-quality original. If you have a RAW or high-quality TIFF from your camera, that’s your starting point. If you’re working from a smartphone photo, use the original file from your camera roll — not a screenshot or a version that’s already been through social media compression.
Resize to 1200–2000px on the long side. Larger doesn’t help Omni — it just slows your upload. The model doesn’t generate video at your source resolution; it works from a downsampled version internally. Staying in the 1200–2000px range hits the quality threshold without padding your file size.
Save as JPEG at quality 85–90. This gives you a clean file with no visible blocking artifacts, at a manageable size. If your original is a PNG and you need to preserve transparency for a product cutout shot, keep it as PNG, but expect a larger upload.
Check your lighting before uploading, not after. If the source image has mixed colour temperatures or harsh shadows that cut across your main subject, fix those before generating. Lightroom’s one-click auto-tone is usually enough to even out a rough exposure. Trying to prompt your way around a problematic light source rarely works.
For web publishing: always convert after, not before. Use your high-quality original for the upload to Omni, then optimize whatever you export for web use. A frame extracted from the output video should go through WebP or AVIF conversion before it touches your site. A 3 MB PNG thumbnail will drag down your Largest Contentful Paint score.
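
The resize step in the workflow above comes down to one aspect-ratio calculation. A minimal sketch, assuming the 2000px upper bound recommended in this article (the function name is hypothetical, and the actual resampling is left to your image editor or library):

```python
from typing import Tuple


def fit_long_side(
    width: int, height: int, max_long_side: int = 2000
) -> Tuple[int, int]:
    """Scale dimensions down so the longer side is at most max_long_side.

    Images already within the limit are returned unchanged, so nothing
    is ever upscaled.
    """
    long_side = max(width, height)
    if long_side <= max_long_side:
        return width, height
    scale = max_long_side / long_side
    return round(width * scale), round(height * scale)


print(fit_long_side(4032, 3024))  # the smartphone photo from Test 3 -> (2000, 1500)
print(fit_long_side(1200, 1200))  # already in range -> (1200, 1200)
```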

FAQ: Gemini Omni Image-to-Video

What’s the difference between Gemini Omni and Veo 3.1? Veo 3.1 is Google’s dedicated video generation model. Omni is a new model family designed to handle text, images, audio, and video as inputs in a single system. It’s not just a Veo update; it’s a different architectural approach. Nicole Brichtova from Google DeepMind described Omni as “the next step toward combining the intelligence of Gemini with the rendering capabilities of our media models.” In practice for users: Omni generates more contextually grounded video with native audio, while Veo 3.1 remains available in Google Flow for more controlled filmmaking workflows.
Are Gemini Omni videos watermarked? Yes, always. Every output from Gemini Omni includes Google’s SynthID digital watermark. This is an invisible, pixel-level signal designed to persist through format conversion and compression. It’s verifiable through the Gemini app, Chrome, and Google Search. Individual frames extracted from Omni videos carry the same watermark. For EU-facing commercial content, this helps with the AI Act’s machine-readable disclosure requirement. For sensitive brand campaigns, confirm your usage terms before deploying.
Can I use Gemini Omni outputs commercially? Google’s general terms for Workspace and Gemini allow commercial use of AI-generated outputs. However, the full Gemini Omni terms haven’t been published separately yet. Check the latest Gemini usage policies before using outputs in paid advertising, client deliverables, or published editorial content. The SynthID watermark being preserved in social media uploads is worth flagging to clients before using Omni in their campaigns.
Why is my Gemini Omni video blurry compared to the demo? Almost always a source image issue. The most common culprits are JPEG compression below quality 75, images under 800px on the short side, or source photos with heavy noise from underexposed shots. Try the same prompt with a cleaner, higher-quality source image and the result will usually be noticeably sharper.
What format does Gemini Omni output? MP4 for video. If you extract still frames for use as thumbnails or preview images, those will typically come out as PNG, and a PNG at 1920×1080 can easily be 2–4 MB. Before putting those on a website, convert them to WebP or AVIF. A 3 MB PNG thumbnail will hurt your Core Web Vitals; a 200 KB AVIF of the same frame won’t.
Is Gemini Omni free? Partially. YouTube Shorts and the YouTube Create app have free access this week, but you can’t export the raw MP4 from those surfaces. Full access, including download and multi-input generation, requires a Google AI Plus ($8/mo), Pro ($19.99/mo), or Ultra ($249.99/mo) subscription.
How does Omni compare to Kling 3.0 or Runway Gen-4.5? Early impressions suggest Omni’s native audio generation is ahead of both. Kling 3.0 produces longer clips (up to 3 minutes vs Omni’s 10-second Flash limit) and currently has better Elo benchmark scores for photorealism. Runway Gen-4.5 gives more precise creative control for professional filmmaking workflows. Omni’s main advantage right now is the combination of native audio and Gemini’s world knowledge grounding: it generates contextually accurate content, not just visually plausible motion.

Final Thoughts

Gemini Omni’s image-to-video capability is real, it works today, and the physics-grounded generation is noticeably better than what earlier tools produced. For creators who already live in the Google ecosystem, the integration into the Gemini app and Google Flow is a genuine workflow improvement.

The two things worth taking seriously before you start generating at scale: understand the SynthID watermark implications for your specific use case, and spend two minutes preparing your source images properly before you upload them. The model is only as good as what you give it.

And once you’ve generated something good — whether it’s a still image from Nano Banana or a frame extracted from an Omni video — don’t skip the format conversion step before publishing. A 3 MB PNG thumbnail from a 10-second AI video is still a 3 MB PNG thumbnail. That part doesn’t fix itself.
