Google I/O 2025 was never about subtlety. This year, the company abandoned incrementalism, delivering a cascade of generative AI upgrades that aim to redraw the map for search, video, and digital creativity.
The linchpin: Gemini, Google’s next-gen model family, is now powering everything from search results to video synthesis and high-resolution image creation—staking out new territory in a race increasingly defined by how fast, and how natively, AI can generate.
The showstopper is Veo 3, Google’s first AI video generator that creates not just visuals, but complete soundtracks—ambient noise, effects, even dialogue—synchronized directly with the footage. Text and image prompts go in, and fully-produced 4K video comes out.
This marks the first large-scale video model capable of generating audio and visuals simultaneously—a trend that began with Showrunner Alpha, an unreleased model, but Veo3 offers far more versatility, generating various styles beyond simple 2D cartoon animations.
“We’re entering a new era of creation with combined audio and video generation,” Google Labs VP Josh Woodward said during the launch. It’s a direct challenge to current video generation leaders—Kling, Hunyuan, Luma, Wan, and OpenAI’s Sora—positioning Veo as an all-in-one solution rather than requiring multiple tools.
Alongside Veo3, Imagen 4—Google’s latest iteration of its image generator model—arrives with enhanced photorealism, 2K resolution, and perhaps most importantly, text rendering that actually works for signage, products, and digital mockups.
For anyone who’s suffered through the gibberish text created by previous AI image models, Imagen 4 represents a significant improvement.
These tools don’t exist in isolation. Flow AI, a new subscription feature for professional users, combines Veo, Imagen, and Gemini’s language capabilities into a unified filmmaking and scene-editing environment. But this integration comes at a price—$125 per month to access the complete toolkit as part of a promotional period until the full $250 price starts to be charged.

Image: Google
Gemini: Powering search and “text diffusion”
Generative AI isn’t just for content creators. Gemini 2.5 now forms the backbone of the company’s redesigned search engine, which Google wants to evolve from a link aggregator into a dynamic, conversational interface that handles complex queries and delivers synthesized, multi-source answers.
AI overviews—where Google Gemini attempts to provide comprehensive answers to queries without requiring users to click through to other sites—now sit at the top of search pages, with Google reporting over 1.5 billion monthly users.

Image: Google via Youtube
Another interesting development is “Gemini Diffusion,” built with technology pioneered by Inception Labs months ago. Until recently, the AI community generally agreed that autoregressive technology worked best for text generation while diffusion technology excelled for images.
Autoregressive models generate each new token after reading all previous generations to determine the best next token—ideal for crafting coherent text responses by constantly reviewing the prompt and prior output.
Diffusion technology operates differently, starting with filling all the context with random information and refining (diffusing) the output each step to make the final product match the prompt—perfect for images with fixed canvases and aesthetics.
OpenAI first successfully applied autoregressive generation to image models, and now Google has become the first major company to apply diffusion generation to text. This means the model begins with nonsense and refines the entire output with each iteration, producing thousands of tokens per second while maintaining accuracy—for context, Groq (not xAI’s Grok), which is one of the fastest inference providers in the world, generates near 275 tokens per second, and traditional providers like OpenAI or Anthropic cannot come close to those speeds.

The model, however, isn’t publicly available yet—interested users must join a waiting list—but early adopters have shared impressive results showing the model’s speed and precision.
Google Gemini Diffusion is crazy
the handfeel of 2sec responses is jaw dropping
you must try it
realtime video: pic.twitter.com/F06CosXV2v
— Kickiniteasy (@kickiniteasy) May 21, 2025
Hands-on with Google’s AI tools
We got our hands on several of Google’s new AI features, with mixed results depending on the tier.
Deep Research is particularly powerful—even beating ChatGPT’s alternative. This comprehensive research agent evaluates hundreds of sources and delivers reliable information with minimal errors.
What gives it an edge over OpenAI’s research agent is the ability to generate infographics. After producing a complete research text, it can condense that information into visually appealing slides. We fed the model everything about Google’s latest announcement, and it presented accurate information through charts, schemes, graphs, and mind maps.

Veo 3 remains exclusive to Gemini Ultra users, though some third-party providers like Freepik and Fal.ai already offer access via API. Flow isn’t available to try unless you spring for the Ultra plan.
Flow proves to be an intuitive video editor with Veo’s models at its core, allowing users to edit, cut, extend, and modify AI scenes using simple text prompts.
However, even Veo2 got a little love, which is making life easier for Pro users. Generations with the now-accessible Veo2 are significantly faster—we created 8 seconds of video in about 30 seconds. While Veo2 lacks sound and currently only supports text-to-video (with image-to-video coming soon), it understood our prompts and even generated coherent text.
Veo2 already performs comparably to Kling 2.0—widely considered the quality benchmark in the generative video industry. The new generations with Veo3 seem to be even more realistic, coherent, with good background sound and lifelike dialogue and voices.
NO WAY. It did it. And, was that, actually funny?
Prompt:
> a man doing stand up comedy in a small venue tells a joke (include the joke in the dialogue) pic.twitter.com/LrCiVAp1Bl— fofr (@fofrAI) May 20, 2025
For Imagen, it’s difficult to determine at first glance whether Google incorporates version 4 or still uses version 3 on its Gemini chatbot interface, though users can confirm this through Whisk. Our initial tests suggest Imagen 4 prioritizes realism unless specified otherwise, with better prompt adherence and visuals that surpass its predecessor.
We generated an image with different elements that don’t usually fit together in the same scene. Our prompt was “Photo of a woman with a skin made of glass, surrounded by thousands of glitter and ethereal pieces in a baroque room with the word ‘Decrypt’ written in neon, realistic.”
Even though both Imagen 3 and Imagen 4 understood the concept and the elements, Imagen 3 failed to capture the realistic style—which Imagen 4 easily did. Overall, Imagen 4 is comparable to the SOTA image generators, especially considering how easy it is to prompt.

Audio overviews have also improved, with models now easily providing over 20 minutes of full debates on Gemini instead of forcing users to switch to NotebookLM. This makes Gemini a more complete interface, reducing the fragmentation that previously required users to jump between different sites for various services.
The quality is comparable to that of NotebookLM, with slightly longer outputs on average. However, the key feature is not that the model is better, but that it is now embedded into Gemini’s chatbot UI.

Premium AI at a premium price
Google didn’t hide its monetization strategy. The company’s “Ultra” plan costs $250 monthly, bundling priority access to the most powerful models, Flow AI tools, and 30 terabytes of storage—clearly targeting filmmakers, serious creators, and businesses. The $20 “AI Pro” tier unlocks Google’s previous Veo2 model, along with image and productivity features for a broader user base. Basic generative tools—like simple Gemini Live and image creation—remain free, but with limitations like a token cap and only 10 researches per month.
This tiered approach mirrors the broader AI market trend: drive mass adoption with freebies, and then lock in the professionals with features too useful to pass up. Google’s bet is that the real action (and margin) is in high-end creative work and automated enterprise workflows—not just casual prompts and meme generation.
Edited by Andrew Hayward
Leave a Reply